Method chapter freely available

I am a big supporter of Open Access publishing, but for various reasons some of my publications are not openly available by default. This is the case for the chapter Methods for Studying Music-Related Body Motion that I have contributed to the Springer Handbook of Systematic Musicology.

I am very happy to announce that the embargo on the book ran out today, which means that a pre-print version of my chapter is finally freely available in UiO’s digital repository. This chapter is a summary of my experiences with music-related motion analysis, and I often recommend it to students. Therefore it is great that it is finally available to download from everywhere.

Abstract

This chapter presents an overview of some methodological approaches and technologies that can be used in the study of music-related body motion. The aim is not to cover all possible approaches, but rather to highlight some of the ones that are more relevant from a musicological point of view. This includes methods for video-based and sensor-based motion analyses, both qualitative and quantitative. It also includes discussions of the strengths and weaknesses of the different methods, and reflections on how the methods can be used in connection to other data in question, such as physiological or neurological data, symbolic notation, sound recordings and contextual data.

Pixel array images of long videos in FFmpeg

Continuing my explorations of FFmpeg for video visualization, today I came across this very nice blog post on creating “pixel array” images of videos. Here the idea is to reduce every single frame into only one pixel, and to plot this next to each other on a line. Of course, I wanted to try this out myself.

I find that creating motiongrams or videograms is a good way to visualize the content of videos. They are abstract representations, but still reveal some of what is going on. However, for longer videos, motiongrams may be a bit tricky to look at, and they also take a lot of time to generate (hours, or even days). For that reason I was excited to see how pixel array images would work on some of my material.

First I tried on my “standard” dance video:

which gives this pixel array image:

Pixel array image (640 pixels wide) of the dance video above.

Yes, that is mainly a blue line, resulting from the average colour of the video being blue throughout the entire video.

Then I tried with one of the videos from the AIST Dance Video Database:

Which results in this pixel array image:

Pixel array image (640 pixels wide) of the dance video above.

And, yes, that is mainly a gray line… I realize that this method does not work very well with single-shot videos.

To try something very different, I also decided to make a pixel array image of Bergensbanen, a 7-hour TV production of the train between Oslo and Bergen. I made videograms of this recording some years ago, which turned out to be quite nice. So I was excited to see how a pixel array image would work. The end result looks like this (1920 pixels wide):

Pixel array image (1920 pixels wide) of the 7-hour TV production Bergensbanen

As you see, not much is changing, but that also represents the slowness of the train ride. While I originally thought this would be a smart representation, I still think that my videograms were more informative, such as this one:

Bergensbanen
Videogram of Bergensbanen

The big difference between the two visualizations, is that each frame is represented with vertical information in the videogram. The pixel array image, on the other hand, only displays one single pixel per frame. That said, it took only some minutes to generate the pixel array image, and I recall spending several days on generating the videogram.

To sum up, I think that pixel array images are probably more useful for movies and video material in which there are lots of changes throughout. They would be better suited for such a reduction technique. For my videos, in which I always use single-shot stationary cameras, motiongrams and videograms may still be the preferred solution.

Creating different types of keyframe displays with FFmpeg

In some recent posts I have explored the creation of motiongrams and average images, multi-exposure displays, and image masks. In this blog post I will explore different ways of generating keyframe displays using the very handy command line tool FFmpeg.

As in the previous posts, I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first attempt is to create a 3×3 grid image by just sampling frames from the original image. I spent some time exploring different ways of doing this. It is possible to do it with a one-liner:

ffmpeg -ss 00:00:05 -i dance.mp4 -frames 1 -vf "select=not(mod(n\,200)),scale=495:256,tile=3x3" tile.jpg

The problem with this approach, and many similar that I found by googling around, is that it samples frames with a specific interval. In the above code it looks up every 200th frame, which gives this image:

The problem is that the image only contains information about the 1600 first frames, or more specifically frames 0, 200, 400, 600, 800, 1000, 1200, 1400, 1600. I want to include frames that represent the whole video.

I see that many people create such displays by sampling based on scene changes in the video. There are two problems with this. First, it requires that there are scene changes in the video. This is usually not the case in the videos that I study, which are primarily recorded with a stationary camera in which only the “foreground” changes. The second problem with sampling one “salient” frames, is that we loose information about the temporal unfolding of the video file. From an analysis point of view, it is actually quite useful to know more or less when things happened in the video. That is not so easy if the sampling is uneven.

I was therefore happy to find a nice script made by Martin Sikora, which is based on looking up the duration of the file and use this to calculate the frames to export from the file. Running this script on the original video gives this image:

The 9 frames in the display above reveal that there is little dance in the first one third of the video file (can see the arm of the dancer enter in the third image). It also shows how the dancer moved around in the space. It is possible to get some idea about her spatial distribution, but there is little information about her actual motion throughout the sequence. I was therefore curious to try out making such a grid-based display from a history video, which actually shows some more of the actual motion.

It is possible to make (motion) history videos in both the Matlab and Python versions of the Musical Gestures Toolbox, but today I was curious as to whether it could be done simply with FFmpeg. And it turns out to be quite simple using a filter called tmix:

ffmpeg -i dance.mp4 -filter:v tmix=frames=30:weights="10 1 1" dance_tmix30.mp4

I played around for a while with the settings before ending up with these ones. Here I average over 30 frames (which is half a second for this 60fps video). I also use weight feature to give preference to the current frame. This makes it easier to follow the dancer, as the trajectories of past motion become more blurred.

Running the above grid-script on this video results in a keyframe display that shows more of the motion happening in the frames in question. This is useful to see, for example, when she moved more than in other frames.

I am quite happy with the above-mentioned, but it is not particularly fast. Creating the history video is time-consuming, since it has to process all the frames in the entire video. I therefore tested speeding up the video 8 times, using this command (the -an flag is used to remove the audio):

ffmpeg -i dance.mp4 -filter:v "setpts=0.125*PTS" -an output8x.mp4

Running the history video function on this then runs quite a bit faster, and results in this hi-speed history video:

Running this through the grid-script gives a keyframe display that is both similar and different to the one above:

It is quite a lot quicker to generate, and also gives more information about the motion sequence.

The conclusion is that it is, indeed, possible to make a lot of interesting video visualizations using “only” FFmpeg. Several of these scripts are also much faster than the scripts I have previously used in Matlab and Python. So I will definitely continue to explore FFmpeg, and look at how it can be integrated with the other toolboxes.

Creating image masks from video file

As part of my exploration in creating multi-exposure keyframe image displays with FFmpeg and ImageMagick, I tried out a number of things that did not help solve the initial problem but still could be interesting for other things. Most interesting was the automagic creation of image masks from a video file.

I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than doing a new analysis of the file. It could, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also choose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.

Here we are lucky, because the first frame actually contains the background of the scene. So we can use that frame to create a “foreground” image by subtracting the background image like this:

for i in *.tiff; 
do 
name=`echo $i | cut -d'.' -f1`; 
convert t01.tiff $i -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff" 
convert $i "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done

The end result is a series with the foreground masks:

And then the final result is a series of images in which only the foreground is shown. The “glow” around the images is because of the blur effect used when creating the mask:

Adaptive background

There may also be cases in which there is no readily available background image as we used above, such as in this hip-hop AIST dance video:

Then it is possible to create a background image by averaging over all the images, and hope that this could “remove” the foreground. Here is a one-liner that does this (assuming that you have exported the individual keyframes as mentioned in the beginning of this post):

convert *.tiff -background black -compose lighten -flatten background.tiff

This works quite well, although we can see that the camera right behind the dancer is a little more faint the two others:

Background image created by averaging over all the keyframes.

This background image can then be used to subtract from the other images like we did above:

for i in *.tiff; 
do 
name=`echo $i | cut -d'.' -f1`; 
convert background.tiff $i -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff" 
convert $i "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done

It works very well, except for that the camera behind the performer (that wasn’t masked properly) also shows up in the masked foreground images:

This method works quite well and has the benefit of being very fast. It is possible to get a better result by creating an average image from the entire video (and not only the keyframes), but this would also take very much longer.

Creating multi-exposure keyframe image displays with FFmpeg and ImageMagick

While I was testing visualization of some videos from the AIST database earlier today, I wanted to also create some “keyframe image displays”. This can be seen as a way of doing multi-exposure photography, and should be quite straightforward to do. Still it took me quite some time to figure out exactly how to implement it. It may be that I was searching for the wrong things, but in case anyone else is looking for the same, here is a quick write up.

The current procedure is done using a combination of two very handy command line tools: FFmpeg and ImageMagick. I would like to add it to both the Matlab and Python versions of the Musical Gestures Toolbox as well, but will need to figure that out another time.

In this example I will use a hip-hop dance video from the AIST database:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than doing a new analysis of the file. It could, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also choose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.

Automagically exported keyframe images.

In my search for a solution, I tried a lot of complex things. But it turned out to be super-simple to get what I wanted:

convert *.tiff -background white -compose darken -flatten keyframes.jpg

Here we use the convert function of ImageMagick to add all the exported keyframes together to one combined image:

Keyframe image display of hip-hop video.

Since the dancer was moving in more or less the same place all the time, it is quite compact. Running the same functions on another video of a contemporary dancer, on the other hand, shows some of the potential of this visualization method. Here is the video:

Which results in this keyframe display image:

Besides being cool to look at, it is also quite informative when it comes to telling what is going on in the video. You get information about the temporal and spatial movement of the dancer, although it is difficult to understand exactly when she was moving where.

Next is to also include these methods in the Musical Gestures Toolbox.