Creating image masks from video file

As part of my exploration in creating multi-exposure keyframe image displays with FFmpeg and ImageMagick, I tried out a number of things that did not help solve the initial problem but could still be interesting for other purposes. Most interesting was the automagic creation of image masks from a video file.

I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than decoding and analysing every frame of the file. It could, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also chose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.
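If you want to script this step, for example from Python, the same command can be assembled as an argument list and handed to subprocess. The sketch below is hedged: the function names and the placeholder file name are hypothetical, the keyframe flags are copied from the one-liner above, and the regular-interval alternative relies on FFmpeg's fps filter (fps=1/2 meaning one frame every two seconds). Since no shell is involved, a concrete file name replaces the *.mp4 glob. Recent FFmpeg releases may also warn that -vsync is deprecated in favour of -fps_mode.

```python
import shlex
import subprocess


def keyframe_command(video: str, pattern: str = "t%02d.tiff") -> list[str]:
    """Decode only the keyframes (same flags as the one-liner in the post)."""
    return ["ffmpeg", "-skip_frame", "nokey", "-i", video,
            "-vsync", "0", "-r", "30", "-f", "image2", pattern]


def interval_command(video: str, every_s: int, pattern: str = "s%03d.tiff") -> list[str]:
    """Alternative: sample one frame every `every_s` seconds with the fps filter."""
    return ["ffmpeg", "-i", video, "-vf", f"fps=1/{every_s}", pattern]


def run(cmd: list[str]) -> None:
    """Run the command; requires ffmpeg on PATH."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = keyframe_command("dance.mp4")  # "dance.mp4" is a placeholder name
    print(shlex.join(cmd))
    # run(cmd)  # uncomment to actually extract the keyframes
```

Building a list rather than a single string avoids any shell quoting problems with the %02d pattern or file names containing spaces.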

Here we are lucky, because the first frame (t01.tiff) actually contains only the background of the scene. So we can use that frame to create a “foreground” mask for each keyframe by subtracting the background image, and then use the mask to clean up the keyframe, like this:

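# Loop over the extracted keyframes only (not any earlier -mask.tiff output)
for i in t[0-9][0-9].tiff; do
  name=$(basename "$i" .tiff)
  # Mask: difference from the background, thresholded and blurred twice
  convert t01.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
  # Keep only the foreground by multiplying the keyframe with its mask
  convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done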
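To see what each step of the loop does to the pixels, here is a minimal pure-Python sketch on tiny grayscale images stored as lists of rows. It is only an illustration under stated assumptions: the function names and the 10x10 test scene are hypothetical, real keyframes are RGB TIFFs handled by ImageMagick, and a 3x3 mean filter stands in for ImageMagick's Gaussian -blur 0x3.

```python
Image = list[list[int]]  # grayscale image: rows of 0-255 values


def difference(a: Image, b: Image) -> Image:
    """Per-pixel absolute difference (like -compose difference -composite)."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def threshold(img: Image, percent: float) -> Image:
    """Values above percent of full scale become white (255), the rest black (0)."""
    limit = 255 * percent / 100
    return [[255 if p > limit else 0 for p in row] for row in img]


def box_blur(img: Image) -> Image:
    """3x3 mean filter: a crude stand-in for ImageMagick's Gaussian -blur 0x3."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(vals) / len(vals)))
        out.append(row)
    return out


def multiply(img: Image, mask: Image) -> Image:
    """Per-pixel multiply (like -compose multiply): white keeps, black removes."""
    return [[round(p * m / 255) for p, m in zip(ri, rm)] for ri, rm in zip(img, mask)]


def clean_frame(background: Image, frame: Image) -> Image:
    mask = threshold(difference(background, frame), 5)  # anything that changed
    mask = threshold(box_blur(mask), 20)                 # drop isolated specks
    mask = box_blur(mask)                                # soften the edges
    return multiply(frame, mask)


# Hypothetical 10x10 scene: flat background, a 3x3 "dancer" and one noisy pixel.
bg = [[200] * 10 for _ in range(10)]
frame = [row[:] for row in bg]
for y in range(5, 8):
    for x in range(5, 8):
        frame[y][x] = 50
frame[2][2] = 230
clean = clean_frame(bg, frame)
print(clean[6][6], clean[2][2], clean[3][6])  # 50 0 67
```

The dancer is kept (50), the isolated noisy pixel is removed by the blur-then-threshold step (0), and the background pixel next to the dancer keeps part of its value (67): that partial value is the kind of “glow” that the final blur introduces around the foreground.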

The result is a series of foreground masks:

And the final result is a series of images in which only the foreground is shown. The “glow” around the dancer is caused by the blur used when creating the mask:

Adaptive background

There may also be cases in which there is no readily available background image as we used above, such as in this hip-hop AIST dance video:

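Then it is possible to create a background image by combining all the keyframes, in the hope that this will “remove” the foreground. Here is a one-liner that does this by keeping the lightest value of each pixel across all the keyframes, which works as long as the dancer is darker than the background and keeps moving (assuming that you have exported the individual keyframes as mentioned in the beginning of this post):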

convert *.tiff -background black -compose lighten -flatten background.tiff

This works quite well, although we can see that the camera right behind the dancer is a little fainter than the two others:

Background image created by combining all the keyframes.
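Strictly speaking, the lighten composite in the one-liner is not an average: it keeps the lightest value each pixel takes across all the keyframes. The pure-Python sketch below contrasts it with a true per-pixel mean on a tiny made-up grayscale example; the data and function names are hypothetical. If a real average is wanted, ImageMagick's -evaluate-sequence mean operator should produce one (for example convert t*.tiff -evaluate-sequence mean background.tiff), though that variant is untested on these videos.

```python
Image = list[list[int]]  # grayscale image: rows of 0-255 values


def lighten_stack(frames: list[Image]) -> Image:
    """Per-pixel maximum: what -compose lighten -flatten onto black produces."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[max(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def mean_stack(frames: list[Image]) -> Image:
    """Per-pixel arithmetic mean across all frames (a true average)."""
    h, w = len(frames[0]), len(frames[0][0])
    n = len(frames)
    return [[round(sum(f[y][x] for f in frames) / n) for x in range(w)]
            for y in range(h)]


# Hypothetical 1x4-pixel "video": grey background (200), a dark dancer (40) at a
# new position in each frame, and a dark camera (60) in the last pixel that is
# covered by something lighter (230) in a single frame.
frames = [
    [[40, 200, 200, 60]],
    [[200, 40, 200, 60]],
    [[200, 200, 40, 230]],
]
print(lighten_stack(frames))  # [[200, 200, 200, 230]]
print(mean_stack(frames))     # [[147, 147, 147, 117]]
```

With lighten, the dark dancer disappears completely, while the mean leaves a grey “ghost” of her everywhere she has been. The last pixel shows a weakness of lighten: one frame where something lighter passes in front of a dark object is enough to brighten it in the background image. This is one possible explanation for the fainter camera, but it is an assumption rather than something checked on the actual video.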

This background image can then be subtracted from each of the keyframes, as we did above:

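# Loop over the extracted keyframes only (not background.tiff or mask files)
for i in t[0-9][0-9].tiff; do
  name=$(basename "$i" .tiff)
  # Mask: difference from the computed background, thresholded and blurred twice
  convert background.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
  # Keep only the foreground by multiplying the keyframe with its mask
  convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done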

It works very well, except that the camera behind the performer (which was not fully captured in the background image) also shows up in the masked foreground images:

This method works quite well and has the benefit of being very fast. It is possible to get a better result by creating an average image from the entire video (and not only the keyframes), but this would also take much longer.
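Averaging the entire video means touching every frame, but it does not have to mean keeping every frame in memory. A minimal sketch, assuming the frames arrive one at a time as grayscale lists of rows (how they are decoded, for example by piping from FFmpeg, is left out), updates a running per-pixel mean with avg += (x - avg) / n. The function name is hypothetical.

```python
from collections.abc import Iterable

Image = list[list[int]]  # grayscale image: rows of 0-255 values


def running_mean(frames: Iterable[Image]) -> Image:
    """Incremental per-pixel mean; only one accumulator image is kept in memory."""
    avg: list[list[float]] | None = None
    for n, frame in enumerate(frames, start=1):
        if avg is None:
            avg = [[float(p) for p in row] for row in frame]
            continue
        for y, row in enumerate(frame):
            for x, p in enumerate(row):
                avg[y][x] += (p - avg[y][x]) / n
    if avg is None:
        raise ValueError("no frames given")
    return [[round(v) for v in row] for row in avg]


# Frames can come from a generator, so the whole video never sits in memory.
stream = ([[v, v + 10]] for v in (0, 10, 20))
print(running_mean(stream))  # [[10, 20]]
```

The running update gives the same result as summing and dividing at the end, but the memory use stays constant however long the video is; the time cost of decoding every frame remains.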

Creating multi-exposure keyframe image displays with FFmpeg and ImageMagick

While I was testing visualization of some videos from the AIST database earlier today, I also wanted to create some “keyframe image displays”. This can be seen as a way of doing multi-exposure photography, and should be quite straightforward to do. Still, it took me quite some time to figure out exactly how to implement it. It may be that I was searching for the wrong things, but in case anyone else is looking for the same, here is a quick write-up.

The current procedure is done using a combination of two very handy command line tools: FFmpeg and ImageMagick. I would like to add it to both the Matlab and Python versions of the Musical Gestures Toolbox as well, but will need to figure that out another time.

In this example I will use a hip-hop dance video from the AIST database:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than decoding and analysing every frame of the file. It could, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also chose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.

Automagically exported keyframe images.

In my search for a solution, I tried a lot of complex things. But it turned out to be super-simple to get what I wanted:

convert *.tiff -background white -compose darken -flatten keyframes.jpg

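Here we use ImageMagick's convert command to flatten all the exported keyframes into one combined image. With the darken compose method, each pixel keeps its darkest value across the keyframes, so the dancer from every keyframe is kept (as long as she is darker than the background), while the static background stays the same: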

Keyframe image display of hip-hop video.

Since the dancer was moving in more or less the same place all the time, it is quite compact. Running the same commands on another video of a contemporary dancer, on the other hand, shows some of the potential of this visualization method. Here is the video:

Which results in this keyframe display image:

Besides being cool to look at, it is also quite informative when it comes to telling what is going on in the video. You get information about the temporal and spatial movement of the dancer, although it is difficult to understand exactly when she was moving where.
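Pixel by pixel, the darken composite behind these keyframe displays keeps the minimum value seen in any keyframe, starting from the white canvas set with -background white. The pure-Python sketch below uses made-up grayscale values and a hypothetical function name; it also illustrates the limitation just mentioned, since the result records where the dancer was but not in which keyframe.

```python
Image = list[list[int]]  # grayscale image: rows of 0-255 values


def darken_stack(frames: list[Image], canvas: int = 255) -> Image:
    """Flatten frames onto a white canvas with darken compositing:
    each pixel ends up as the darkest value seen in any frame."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[min([canvas] + [f[y][x] for f in frames]) for x in range(w)]
            for y in range(h)]


# Hypothetical 1x5 keyframes: light background (220), dark dancer (30) moving.
keyframes = [
    [[30, 220, 220, 220, 220]],
    [[220, 220, 30, 220, 220]],
    [[220, 220, 220, 220, 30]],
]
print(darken_stack(keyframes))  # [[30, 220, 30, 220, 30]]
```

All three positions of the dancer survive in one image, but nothing in the output says which position came first.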

The next step is to include these methods in the Musical Gestures Toolbox as well.

Visualizing some videos from the AIST Dance Video Database

Researchers from AIST have released an open database of dance videos, and I got very excited to try out some visualization methods on some of the files. This was also a good chance to test out some new functionality in the Musical Gestures Toolbox for Matlab that we are developing at RITMO. The AIST collection contains a number of videos. I selected one hip-hop dance video with a very steady rhythmic pattern, and a contemporary dance video that is more fluid in both motion and music.

Hip-hop dance

I have looked at a couple of different files. Let us start with this one:

We can start by looking at the motion video from this. While a motion video gives less information about context, I often find them interesting to study since they reveal the essentials of what is going on.

And from the motion video we can look at the motiongrams and average image:

The horizontal motiongram reveals the repetitiveness of the dance motion, but also some of the variation throughout the different parts. I also really like the “bump” in the vertical motiongram. This is caused by the couple of side-steps he does midway through the session. The “line” that can be seen throughout the horizontal motiongram is caused by the cable in the back of the video.
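The Musical Gestures Toolbox computes motion videos and motiongrams for you; the sketch below only illustrates the general idea under stated assumptions, and is not the toolbox's implementation. A motion image is taken here as the thresholded absolute difference between consecutive frames, and each motion image is then collapsed along one axis (here by averaging every column over all rows) into a single row; stacking these rows over time gives a motiongram. The function names, the noise threshold, and the tiny test frames are hypothetical, and which collapse the toolbox labels “horizontal” or “vertical” is not reproduced here.

```python
Frame = list[list[int]]  # grayscale frame: rows of 0-255 values


def motion_image(prev: Frame, cur: Frame, thresh: int = 10) -> Frame:
    """Absolute frame difference, with small changes (noise) set to zero."""
    out = []
    for rp, rc in zip(prev, cur):
        diffs = [abs(a - b) for a, b in zip(rp, rc)]
        out.append([d if d > thresh else 0 for d in diffs])
    return out


def motiongram_rows(frames: list[Frame], thresh: int = 10) -> list[list[float]]:
    """One row per pair of consecutive frames: the mean motion in each column,
    so stacking the rows over time shows where along the x-axis motion happened."""
    rows = []
    for prev, cur in zip(frames, frames[1:]):
        m = motion_image(prev, cur, thresh)
        height = len(m)
        rows.append([sum(m[y][x] for y in range(height)) / height
                     for x in range(len(m[0]))])
    return rows


# Hypothetical 2x4-pixel frames: a bright object moving one column right per frame.
frames = [
    [[200, 0, 0, 0], [200, 0, 0, 0]],
    [[0, 200, 0, 0], [0, 200, 0, 0]],
    [[0, 0, 200, 0], [0, 0, 200, 0]],
]
for row in motiongram_rows(frames):
    print(row)
# [200.0, 200.0, 0.0, 0.0]
# [0.0, 200.0, 200.0, 0.0]
```

The diagonal band in the output is the moving object; a static object, like a cable in the background, would only show up if compression noise or lighting changes pushed its frame differences above the threshold.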

Contemporary dance

And then I looked at another video, with a very different character:

From this we get the following motion video (wait a few seconds, since there is no dance in the beginning…):

The average image and motiongrams from this video reveal the spatial distribution of the dancer’s motion on stage. Here it is also possible to see an artifact of the compression algorithm of the video file in the beginning of the motiongrams.

I really look forward to continuing the exploration of this wonderful new and open database. Thanks to the AIST researchers for sharing!

Testing simple camera and microphone setups for quick interviews

We just started a new run of our free online course Music Moves. Here we have a tradition of recording wrap-up videos every Friday, in which some of the course educators answer questions from the learners. We have recorded these in many different ways over the years, from using high-end cameras and microphones to just using a handheld phone. We have found that using multiple cameras and microphones is just too time-consuming, both in terms of setup and editing. Using only a mobile phone is extremely easy to set up, but we have had challenges with the audibility of the speech. Before recording this semester’s wrap-up videos I therefore decided to test out some solutions based on equipment I had lying around:

  • GoPro Hero 7 w/o audio connector
  • Sony RX100 V
  • Zoom Q8
  • Samsung Galaxy Note 8
  • Røde Smartlav+ lavalier microphone
  • DPA Core 4060 lavalier microphone

In the following I will show some of the results of the testing. I decided to skip the Sony camera in this write-up, because it doesn’t have the option of connecting a separate microphone.

Testing various devices in my office.

GoPro Hero 7

The first example is of a GoPro Hero 7 with just the built-in microphone. This worked much better than expected. The audio is quite clear and it is easy to hear what I am saying. The colours of the video are vivid, but the image is compressed quite a bit. The video is very wide-angled, which is super-practical for such an interview setting, although it looks a bit skewed on the edges. But overall this was a positive surprise.

Connecting a Røde Smartlav+ to the GoPro results in a very clean sound. In fact, this could have been a very nice setup, had it not been for some challenges with placing the camera. The audio dongle for the GoPro (1) is bent downwards and (2) makes it impossible to use the housing needed to put the camera on a tripod (as can be seen in the picture to the right). This makes the setup super-clumsy to use in a real-life situation. I hear rumours about a new audio add-on for new GoPro cameras, and that may be very interesting to check out.

Zoom Q8

My next device is the Zoom Q8. This is actually a sound recorder with a built-in camera, so one would expect that the audio is the main priority. This is also the case. The video is quite noisy, but the sound quality is much better than with the GoPro. Still I find that the microphone picks up quite a bit of the room. This is good for music recordings, but not so good when the focus is on speech quality.

Hooking up a DPA 4060 lavalier microphone to the Zoom Q8 definitely helps. This is a high-quality microphone, and it needs phantom power (which the Zoom Q8 can deliver). As expected, this gives great sound, very loud and clear. The downside is that it requires bringing an extra XLR cable together with the microphone and camera, since the cable of the DPA is too short for such an interview setup. I like the wide-angle of the video, but the quality of the video is not very good.

Samsung Galaxy Note 8

Mobile phones are becoming increasingly powerful, and I also had to try the camera of my Samsung Galaxy Note 8. I have a small Manfrotto mobile phone stand which makes it possible to place it on a tripod at a suitable distance. After recording I realized how much less wide-angle the phone image is than the GoPro and Zoom cameras, leaving my head cut off in the shots. This doesn’t matter for the testing here, however. The first video is using the built-in microphone of the mobile phone. I am very positively surprised by how crisp and clear my voice comes through here. In fact, it is quite similar to the GoPro. The video quality is also very good, and clearly the best of the three devices being compared here (the Sony camera has much better video, but it was discarded due to the lack of a microphone input).

And, finally, I connected the SmartLav+ lavalier microphone to the Samsung phone. Here the sound is, of course, very similar to the GoPro recordings.


It is not entirely straightforward to draw conclusions from this testing, but here are some of my thoughts after this very rapid and not very systematic round of tests:

  • Using on-body microphones (lavalier) greatly improves the audibility as compared to using built-in microphones.
  • The DPA 4060 is great, but the Smartlav+ is more than good enough for interviews.
  • The GoPro could have been a great device for such interviews, had it not been for the skewed image and the clumsiness of the audio adaptor.
  • The Zoom Q8 is the best audio device (as it should be!), but its video quality is, unfortunately, too poor.
  • All in all, I think that the easiest and best solution is the Samsung phone with Smartlav+.

Teaching with a document camera

How does an “old-school” document camera work for modern-day teaching? Remarkably well, I think. Here are some thoughts on my experience over the last few years.

The reason I got started with a document camera was because I felt the need for a more flexible setup for my classroom teaching. Conference presentations with limited time are better done with linear presentation tools, I think, since the slides help with the flow. But for classroom teaching, in which dialogue with students is at the forefront, such linear presentation tools do not give me the flexibility that I need.

Writing on a black/whiteboard could have been an option, but in many modern classrooms these have been replaced by projector screens. I also find that writing on a board is much more tricky than writing with pen on paper. So a document camera, which is essentially a modernized “overhead projector”, is a good solution.

After a little bit of research some years back, I ended up buying a Lumens Ladibug DC193. I went for this one because it had the features I needed, combined with being the only nice-looking document camera I could find (aesthetics is important!). A nice feature is that it has a built-in light, which helps create a better image even when the room lighting is not very bright.

My Lumens Ladibug DC193 document camera is red and has a built-in light.

One very useful feature of the document camera is the ability to connect my laptop to the HDMI input on the Ladibug, and then connect the Ladibug HDMI output to the screen. The built-in “video mixer” makes it possible to switch between the document camera and the computer screen. This is a feature I have been using much more than I expected, since it allows me to change between slides shown on the PC, some handwriting on paper, and parts of web pages.

When I first got the document camera, I thought that I was going to use the built-in recording functionality a lot. It is possible to connect a USB drive directly to the camera, and make recordings. Unfortunately, the video quality is not very good, and the audio quality from the built-in mono microphone is horrible.

One of the best things about a document camera is that it can be used for other things than just showing text on paper. This is particularly useful when I teach with small devices (instruments and electronics) that are difficult to see at a distance. Placing them on the table below the camera makes them appear large and clear on the screen. One challenge, however, is that the document camera is optimized for text on white paper. So I find that it is best to place a white paper sheet under what I want to show.

Things became a little more complicated when I started to teach in the MCT programme. Here all teaching happens in the Portal, which connects the two campuses in Oslo and Trondheim. We use Zoom for the basic video communication, with a number of different computers connected to make it all work together. I was very happy to find that the Ladibug showed up as a regular “web camera” when I connected it to my PC with a USB cable. This makes it possible to send it as a video source to one of the Zoom screens in our setup.

When teaching in the MCT Portal, I connect the Ladibug with USB to my PC, and then send the video to Zoom from my laptop.

The solution presented above works well in the Portal, where we already have a bunch of other cameras and computers that handle the rest of the communication. For streaming setups outside of the Portal I have previously shown how it is possible to connect the document camera to the Blackmagic web presenter, which allows for also connecting a regular video camera to the SDI input.

More recently I have also explored the use of a video mixer (Sony MCX-500), which allows for connecting more video cameras and microphones at once. Since the video mixer cannot be connected directly to a PC, it is necessary to also add in the Blackmagic web presenter in the mix. This makes for a quite large and complex setup. I used it for one remote lecture once, and even though it worked, it was not as streamlined as I hoped for. So I will need to find an easier solution in the future.

Exploring a more complex remote teaching setup, including a video mixer in addition to document camera and web presenter.

What is clear, however, is that a document camera is very useful for my teaching style. The Ladibug has served me well for some time, but I will soon start to look for a replacement. I particularly miss having full HD, better calibration of the image, as well as better recording functionality. I hope manufacturers are still developing this type of niche product, ideally also nice-looking ones!