Creating image masks from a video file

As part of my exploration of creating multi-exposure keyframe image displays with FFmpeg and ImageMagick, I tried out several things that did not solve the initial problem but could still be useful for other purposes. The most interesting was the automagic creation of image masks from a video file.

I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This extracts only the keyframes already stored in the MP4 file (-skip_frame nokey makes the decoder skip all other frames), which is faster than decoding every frame of the video. Note that the *.mp4 glob assumes there is only one video file in the folder. It would, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also chose to save the exported keyframes as TIFF files, a lossless format, to avoid running multiple rounds of compression on the images. The end result is a set of keyframe images that can be used for further processing.
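For the regular-interval alternative, a minimal sketch could look like this. It assumes ffmpeg's fps filter, which decodes the whole video and keeps a fixed number of frames per second. The helper name sample_frames and the s%03d.tiff naming are hypothetical, chosen so the samples do not collide with the t%02d.tiff keyframes; -pix_fmt rgb24 forces plain RGB TIFFs.

```shell
# sample_frames VIDEO RATE  (hypothetical helper, not part of the original workflow)
# Decodes every frame and keeps RATE frames per second via ffmpeg's fps filter,
# writing lossless RGB TIFFs named s001.tiff, s002.tiff, ... in the current folder.
sample_frames() {
  local video=$1 rate=$2
  ffmpeg -hide_banner -loglevel error -y -i "$video" \
    -vf "fps=$rate" -pix_fmt rgb24 -f image2 s%03d.tiff
}

# Example: two frames per second
# sample_frames input.mp4 2
```

Unlike the keyframe approach, this spaces the images evenly in time, at the cost of decoding the entire file.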

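Here we are lucky, because the first frame (t01.tiff) actually contains the background of the scene. So we can use that frame to create a “foreground” image: the difference between each keyframe and the background is thresholded and blurred into a mask, and each keyframe is then multiplied by its mask: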

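# t*[0-9].tiff matches the keyframes but not any *-mask.tiff files from earlier runs
for i in t*[0-9].tiff; do
  name="${i%.*}"  # strip the .tiff extension
  # Difference from the background, then threshold + blur twice to get a soft mask
  convert t01.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
  # Multiply the keyframe by its mask: background goes black, the foreground stays
  convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done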

The first result is a series of foreground masks:

The final result is a series of images in which only the foreground is shown. The “glow” around the dancer comes from the blur applied when creating the mask:
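Multiplying by the mask leaves the background black. If a transparent background is preferred instead, the mask can be copied into the alpha channel. The sketch below uses ImageMagick's CopyOpacity compose method, which takes the grayscale intensity of a mask without alpha as the new opacity. The helper name cutout is hypothetical, and the output has to be PNG (or another format with alpha), since JPEG cannot store transparency.

```shell
# cutout FRAME MASK OUT  (hypothetical helper)
# Uses the grayscale MASK as the alpha channel of FRAME, so masked-out pixels
# become transparent instead of black. OUT should be a PNG file.
cutout() {
  convert "$1" "$2" -alpha off -compose CopyOpacity -composite "$3"
}

# Example, reusing the *-mask.tiff files written by the loop:
# for i in t*[0-9].tiff; do cutout "$i" "${i%.*}-mask.tiff" "${i%.*}-cutout.png"; done
```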

Adaptive background

There may also be cases in which there is no readily available background image as we used above, such as in this hip-hop AIST dance video:

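Then it is possible to create a background image by combining all the keyframes and hoping that this will “remove” the foreground. Here is a one-liner that does this, assuming that you have exported the individual keyframes as mentioned in the beginning of this post. Strictly speaking, the lighten compose method does not average the images: it keeps the brightest value of each pixel across all the keyframes: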

convert t*[0-9].tiff -background black -compose lighten -flatten background.tiff

This works quite well, although we can see that the camera right behind the dancer is a little fainter than the other two:

Background image created by combining all the keyframes with the lighten compose method.
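A per-pixel median is a common alternative for estimating a static background: as long as the dancer covers a given pixel in fewer than half of the keyframes, the median returns the background value there. This is a sketch using ImageMagick's -evaluate-sequence operator, which also accepts mean for a true average. The helper name median_background is hypothetical, and this is not how the image above was made.

```shell
# median_background OUT FRAME...  (hypothetical helper)
# Per-pixel median across all FRAMEs: a pixel keeps its background value as
# long as the foreground covers it in fewer than half of the frames.
median_background() {
  local out=$1
  shift
  convert "$@" -evaluate-sequence median "$out"
}

# Example, using the same keyframes as the lighten version:
# median_background background-median.tiff t*[0-9].tiff
# For a true average instead:
# convert t*[0-9].tiff -evaluate-sequence mean background-mean.tiff
```

To try the median result in the subtraction loop that follows, write it to background.tiff instead.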

This background image can then be subtracted from the keyframes in the same way as above:

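# t*[0-9].tiff matches the keyframes but not background.tiff or any *-mask.tiff files
for i in t*[0-9].tiff; do
  name="${i%.*}"  # strip the .tiff extension
  # Same mask pipeline as before, but against the estimated background
  convert background.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
  convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done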

This works very well, except that the camera behind the performer, which was not fully captured in the background image, also shows up in the masked foreground images:

The method has the benefit of being very fast. A better result could probably be obtained by creating an average image from the entire video (and not only the keyframes), but this would take much longer.
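One way to try the whole-video approach is to sample the video at a fixed rate into a temporary folder and combine those frames per pixel. The sketch below chains ffmpeg's fps filter with ImageMagick's -evaluate-sequence. The helper name video_background and the default of 5 frames per second are assumptions, and memory use grows with the number of frames, since ImageMagick loads them all at once.

```shell
# video_background VIDEO OUT [RATE] [METHOD]  (hypothetical helper)
# Samples RATE frames per second (default 5) from the whole video into a
# temporary folder, combines them per pixel with METHOD (mean or median,
# default mean), then deletes the temporary frames.
video_background() {
  local video=$1 out=$2 rate=${3:-5} method=${4:-mean}
  local tmp
  tmp=$(mktemp -d) || return 1
  ffmpeg -hide_banner -loglevel error -i "$video" -vf "fps=$rate" \
    -pix_fmt rgb24 -f image2 "$tmp/f%05d.tiff" &&
    convert "$tmp"/f*.tiff -evaluate-sequence "$method" "$out"
  local status=$?   # status of the ffmpeg/convert chain
  rm -rf "$tmp"
  return $status
}

# Example: median of two frames per second
# video_background input.mp4 background-full.tiff 2 median
```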

Published by

alexarje

Alexander Refsum Jensenius is a music researcher and research musician living in Oslo, Norway.