Music thumbnailing

December 6, 2020

A couple of days ago, I read an interesting paper about a new AI algorithm that can summarize long texts. This is an attempt to solve the problem of tl;dr texts, meaning “too long, didn’t read”.

The article reminded me that the same problem exists for music, in which case it would probably be tl;dl: “too long, didn’t listen”. I was interested in this topic back when I wrote my master’s thesis about short-term music recognition. One way to avoid having to listen through full music tracks is to create music “thumbnails”, that is, compact representations of the most salient parts of the music in question. This is not a trivial task, of course, and lots of research has gone into it over the years. Strangely, though, I haven’t seen any of the many suggested algorithms implemented in any commercial service (so far).

Shortening Bolero

I didn’t delve very deep into the topic, but I experimented with modifying a recording of Ravel’s Bolero in my thesis. My first idea was a time compression of the piece that would preserve the overall form and some of the timbral qualities. I tested this by using a phase vocoder to compress Ravel’s Bolero from 15 minutes to 15 seconds (warning: starts soft and ends loud):

The user interface of the Max patch Music Trailer that plays short excerpts from a sound file. — 5 segments, each 3 seconds

Shortening Bolero
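For readers who want to try the time compression described above, here is a minimal sketch of a phase vocoder in Python with NumPy. It is not the implementation used in the thesis, which is not specified here. The function names (`stft`, `istft`, `time_compress`), the frame size of 2048 samples, and the hop size of 512 samples are all assumptions chosen for illustration. Going from 15 minutes (900 seconds) to 15 seconds corresponds to a compression rate of 900 / 15 = 60.

```python
import numpy as np


def stft(y, n_fft=2048, hop=512):
    """Short-time Fourier transform; returns an array of shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(S, n_fft=2048, hop=512):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    window = np.hanning(n_fft)
    n_frames = S.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(S, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Avoid dividing by (near) zero at the very edges of the signal.
    return out / np.maximum(norm, 1e-8)


def time_compress(y, rate, n_fft=2048, hop=512):
    """Shorten y by roughly the factor `rate` while keeping its pitch."""
    S = stft(y, n_fft, hop)
    n_frames, n_bins = S.shape
    # Fractional positions in the input at which output frames are sampled.
    steps = np.arange(0, n_frames - 1, rate)
    # Phase advance per hop expected at each bin's centre frequency.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    phase = np.angle(S[0])
    out = np.empty((len(steps), n_bins), dtype=complex)
    for k, t in enumerate(steps):
        i = int(t)
        frac = t - i
        # Interpolate the magnitude between the two neighbouring input frames.
        mag = (1 - frac) * np.abs(S[i]) + frac * np.abs(S[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Deviation from the expected phase advance, wrapped to [-pi, pi].
        dphi = np.angle(S[i + 1]) - np.angle(S[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate phase so consecutive output frames stay coherent.
        phase = phase + omega + dphi
    return istft(out, n_fft, hop)
```

With a mono recording loaded as a NumPy array `y`, calling `time_compress(y, rate=60.0)` would give the 60-fold compression. At such an extreme rate, each output frame skips 60 input frames, so most of the fine detail is discarded and only the overall shape of the piece, such as Bolero's long crescendo, can be expected to survive.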

Salient-based thumbnailing