Flamenco video analysis

I continue my testing of the new Musical Gestures Toolbox for Python. One thing is to use the toolbox on controlled recordings with stationary cameras and non-moving backgrounds (see examples of visualizations of AIST videos). But it is also interesting to explore “real world” videos (such as the Bergensbanen train journey).

I came across a great video of flamenco dancer Selene Muñoz, and wondered how I could visualize what is going on there:

Videograms and motiongrams

My first idea is always to create a motiongram to get an overview of what goes on in the video file. Here we can clearly see the structure of the recording:

A motiongram of the flamenco dance video recording (6 minutes 23 seconds).

The motiongram shows what changes between frames. The challenge with analyzing such TV production recordings is that there is a lot of camera movement. This can be more clearly seen in a videogram (the same technique as a motiongram, but calculated from the regular video frames rather than from the frame differences).

A videogram of the same recording. Three segments of camera zoom/pan are highlighted.

Sometimes, the videogram can be useful, but the motiongram can more clearly show the motion happening in the file. Since it is based on frame differencing, it effectively “removes” the background material. So by zooming into a motiongram, it is possible to see more details of the motion.
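To make the frame-differencing idea concrete, here is a minimal NumPy sketch on a synthetic clip. It is not the toolbox's implementation; the function name `motion_frames` and the test clip are made up for illustration. It only shows why a static background disappears when consecutive frames are subtracted.

```python
import numpy as np


def motion_frames(frames):
    """Absolute difference between consecutive grayscale frames.

    frames: array of shape (T, H, W). Returns an array of shape (T - 1, H, W).
    (Hypothetical helper for illustration, not an MGT function.)
    """
    frames = np.asarray(frames, dtype=np.float32)
    return np.abs(np.diff(frames, axis=0))


# Synthetic clip: a static, textured background with one bright 4x4 square
# that moves one pixel to the right per frame.
rng = np.random.default_rng(0)
background = rng.uniform(0, 100, size=(32, 48)).astype(np.float32)
frames = np.repeat(background[None, :, :], 10, axis=0)
for t in range(10):
    frames[t, 10:14, 5 + t:9 + t] = 255.0

motion = motion_frames(frames)

# The background is identical in every frame, so it cancels out.
# Only the leading and trailing edges of the moving square remain.
print(motion.shape)
print("max outside the square's rows:", motion[:, :10, :].max())
print("max inside the square's rows:", motion[:, 10:14, :].max())
```

With a moving camera, the background changes from frame to frame as well, which is why pans and zooms show up so strongly in this kind of analysis.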

A horizontal motiongram of the opening sequence. This shows the combined motion of both performers and is therefore not particularly useful.

The above illustration shows a horizontal motiongram, which reflects the vertical motion (yes, it is a bit confusing with this horizontal/vertical thinking…). When there are two performers, that is not particularly useful. In such cases, I prefer to look at the vertical motiongram instead, which shows the horizontal motion. Then it is much easier to see the motion of each performer separately, not least their turn-taking in the performance.
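The horizontal/vertical naming is easier to follow in code. The sketch below is my own NumPy illustration, not the toolbox's code, and the function name `motiongrams` and the two-square "performers" are assumptions. It collapses each motion frame across its width to build a horizontal motiongram (time runs left to right) and across its height to build a vertical motiongram (time runs top to bottom).

```python
import numpy as np


def motiongrams(frames):
    """Return (horizontal, vertical) motiongrams from grayscale frames.

    frames: array of shape (T, H, W).
    horizontal: shape (H, T - 1), each motion frame averaged over its width.
    vertical:   shape (T - 1, W), each motion frame averaged over its height.
    (Hypothetical helper for illustration, not an MGT function.)
    """
    frames = np.asarray(frames, dtype=np.float32)
    motion = np.abs(np.diff(frames, axis=0))  # (T - 1, H, W)
    horizontal = motion.mean(axis=2).T        # image rows are kept
    vertical = motion.mean(axis=1)            # image columns are kept
    return horizontal, vertical


# Two synthetic "performers" as bright squares, one on the left and one on
# the right. The left one bobs up and down first, then the right one.
T, H, W = 21, 32, 48
frames = np.zeros((T, H, W), dtype=np.float32)
for t in range(T):
    left_row = 10 + (t % 2 if t < 10 else 0)
    right_row = 10 + (t % 2 if t >= 10 else 0)
    frames[t, left_row:left_row + 4, 8:12] = 255.0
    frames[t, right_row:right_row + 4, 36:40] = 255.0

horizontal, vertical = motiongrams(frames)

# Horizontal motiongram: both performers land in the same image rows, so the
# trace looks the same no matter who is moving.
print(horizontal.shape, horizontal[10, :3], horizontal[10, -3:])

# Vertical motiongram: each performer keeps their own columns, so the
# turn-taking becomes visible.
print(vertical.shape)
print("left columns, first half: ", vertical[:10, 8:12].max())
print("right columns, first half:", vertical[:10, 36:40].max())
```

In this toy example, the two performers are indistinguishable in the horizontal motiongram, while the vertical one shows activity moving from the left columns to the right columns halfway through.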

The vertical motiongram can be used to investigate the motion of each performer.

The motiongram can also be used together with audio representations, such as the tempogram shown below.

A motiongram (top) and tempogram (bottom) can be used to look at structural similarities and differences between the audio and video streams (the tempo estimation here is not particularly relevant, but it is part of the script).

Grid representations

The above visualizations show information about continuous motion. I often find this to be useful when studying, well, motion. However, when dealing with multi-camera productions, it is common to look at grid-based image displays instead. Using one of the functions from the MGT for Terminal, I created some versions with an increasing number of extracted frames.

There is a trade-off here between getting a general overview and getting into the details. I think that the 3×3 and 4×4 versions manage to capture the main content of the recording fairly well.
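The grid idea itself is simple to sketch. The NumPy example below assumes that frames are sampled evenly across the recording, including the first and last frame; I am not claiming that this is how the MGT for Terminal function picks its frames. The function name `frame_grid` is hypothetical.

```python
import numpy as np


def frame_grid(frames, n):
    """Tile n * n evenly spaced frames into an n-by-n grid image.

    frames: array of shape (T, H, W) or (T, H, W, C).
    Returns (grid, indices), where indices are the sampled frame numbers.
    (Hypothetical helper; even sampling is an assumption.)
    """
    frames = np.asarray(frames)
    total = frames.shape[0]
    indices = np.linspace(0, total - 1, n * n).round().astype(int)
    picked = frames[indices]
    # Join n frames side by side per row, then stack the rows vertically.
    rows = [
        np.concatenate(list(picked[r * n:(r + 1) * n]), axis=1)
        for r in range(n)
    ]
    return np.concatenate(rows, axis=0), indices


# Synthetic clip where every pixel in a frame equals the frame number,
# which makes it easy to check what ended up where in the grid.
frames = np.arange(100, dtype=np.float32)[:, None, None] * np.ones((1, 6, 8))

for n in (2, 3, 4):
    grid, indices = frame_grid(frames, n)
    print(n, grid.shape, indices)
```

The trade-off is visible in the numbers: with a fixed output size, each frame in a 4×4 grid gets a quarter of the area it would get in a 2×2 grid, but the gaps between sampled frames shrink.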

Visualizations always need to be targeted at what one wants to show. Often, it may be the combination of different plots that is most useful. For example, a grid display may be used together with motiongrams and a waveform of the audio.

A grid display (top), motiongram (middle), and audio waveform (bottom) reveal quite a bit of the content of the video recording.

For now, I have put these displays together manually. The aim is to generate such combined plots directly from MGT for Python.

Published by


Alexander Refsum Jensenius is a music researcher and research musician living in Oslo, Norway.