Frame differencing with FFmpeg

I often want to create motion videos, that is, videos that only show what changed between frames. Such videos are nice to look at, and so-called “frame differencing” is also the starting point for many computer vision algorithms.

We have made several tools for creating motion videos (and more) at the University of Oslo: the standalone VideoAnalysis app (Win/Mac) and the different versions of the Musical Gestures Toolbox. These are all great tools, but sometimes it would also be nice to create motion videos directly in the terminal using FFmpeg.

I have previously written about the tblend filter in FFmpeg, which I thought would be a good starting point. However, it turned out to be slightly more challenging than I had expected. Hence, this blog post is to help others looking to do the same.

Here is the source video I used for testing:

Source video of dance improvisation.

First, I tried this one-liner:

ffmpeg -i input.mp4 -filter_complex "tblend=all_mode=difference" output.mp4

It does the frame differencing, but I end up with a green image:

The result of the tblend filter.

I spent quite some time looking for a solution. Several people report a similar problem, but there are few answers. Finally, I found this explanation suggesting that the source video is in YUV while the filter expects RGB; when the difference is computed directly on the YUV planes, the near-zero chroma values render as a green tint. To get the correct result, we need to add format=gbrp to the filter chain:

ffmpeg -i input.mp4 -filter_complex "format=gbrp,tblend=all_mode=difference" output.mp4

The final motion video after RGB conversion.

I have now also added this to the function mgmotion in the Musical Gestures Toolbox for Terminal.
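
For scripting, the same call is easy to wrap. Here is a minimal Python sketch of such a wrapper (the function name and arguments are my own, not the actual mgmotion implementation; it assumes ffmpeg is on the PATH):

import subprocess

def motion_video(infile, outfile):
    # Convert to RGB first so tblend's difference mode behaves correctly,
    # then take the per-frame difference. Mirrors the one-liner above.
    subprocess.run([
        "ffmpeg", "-y",  # -y: overwrite the output file if it exists
        "-i", infile,
        "-filter_complex", "format=gbrp,tblend=all_mode=difference",
        outfile,
    ], check=True)

motion_video("input.mp4", "motion.mp4")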

Preparing video for Matlab analysis

Typical video files, such as MP4 files with H.264 compression, usually combine small file size with high visual quality. Such files are suitable for visual inspection but do not work well for video analysis. In most cases, computer vision software prefers to work with raw data or with intra-frame compression formats, in which each frame can be decoded independently.

The Musical Gestures Toolbox for Matlab works best with these file types:

  • Video: use MJPEG (Motion JPEG) as the compression format. This compresses each frame individually. Use .AVI as the container, since this is the one that works best on all platforms.
  • Audio: use uncompressed audio (16-bit PCM), saved as .WAV files (.AIFF usually also works fine). If you need to use compression, MP3 compression (MPEG-1, Layer 3) is still more versatile than AAC (used in .MP4 files). If you use a bitrate of 192 kbps or higher, you should not get too many artefacts.

Many people ask me how to convert from typical MP4 files (with H.264 video compression and AAC audio compression). The easiest solution (I think) is to use FFmpeg, the versatile command-line utility. Here is a one-liner that will convert an .MP4 file into an .AVI file with MJPEG video and PCM audio:

ffmpeg -i input.mp4 -c:a pcm_s16le -c:v mjpeg -q:v 3 -huffman optimal output.avi

The resulting file should work well in Matlab and other video analysis tools. We have included this conversion by default in the new Musical Gestures Toolbox for Python, so there you can load an MP4 file directly, and it will be converted to an AVI file using a script similar to the one above.
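
Since the toolbox essentially wraps a call like this, a batch conversion script is also straightforward. Here is a small Python sketch (the folder name is a placeholder and this is not the toolbox's actual code; it assumes ffmpeg is on the PATH):

import pathlib
import subprocess

# Convert every MP4 file in a folder to an MJPEG/PCM AVI file,
# mirroring the one-liner above.
for mp4 in pathlib.Path("videos").glob("*.mp4"):
    subprocess.run([
        "ffmpeg", "-y", "-i", str(mp4),
        "-c:a", "pcm_s16le",           # uncompressed 16-bit PCM audio
        "-c:v", "mjpeg", "-q:v", "3",  # compress each frame individually
        "-huffman", "optimal",
        str(mp4.with_suffix(".avi")),
    ], check=True)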

Visualizing some videos from the AIST Dance Video Database

Researchers from AIST have released an open database of dance videos, and I was excited to try out some visualization methods on the files. This was also a good chance to test some new functionality in the Musical Gestures Toolbox for Matlab that we are developing at RITMO. The AIST collection contains a large number of videos. I selected one hip-hop dance video based on a very steady rhythmic pattern, and a contemporary dance video that is more fluid in both motion and music.

Hip-hop dance

I have looked at a couple of different files. Let us start with this one:

We can start by looking at the motion video from this. While motion videos give less information about context, I often find them interesting to study, since they reveal the essentials of what is going on.

From the motion video, we can then look at the motiongrams and the average image:

The horizontal motiongram reveals the repetitiveness of the dance motion, but also some of the variation throughout the different parts. I also really like the “bump” in the vertical motiongram, which is caused by the couple of side-steps he does midway through the session. The “line” that can be seen throughout the horizontal motiongram is caused by the cable at the back of the video.

Contemporary dance

And then I looked at another video, with a very different character:

From this we get the following motion video (wait a few seconds, since there is no dance in the beginning…):

The average image and motiongrams from this video reveal the spatial distribution of the dancer’s motion on stage. Here it is also possible to see an artefact of the video file’s compression algorithm at the beginning of the motiongrams.

I really look forward to continuing to explore this wonderful new and open database. Thanks to the AIST researchers for sharing!

Calculating duration of QuickTime movie files

I have been doing video analysis on QuickTime (.mov) files for several years, but have never really needed the time information in the files. For a project, I now needed to get the timecode in seconds out of the files, and this turned out to be a little trickier than expected. Hence this little summary for other people who may be in the same situation.

It turns out that QuickTime uses something called time units for the internal representation of time, and this is also what is output in Jitter when I run my analysis software on the files. The time unit is not very meaningful for humans on its own, since it only makes sense in combination with a timescale, which defines how many time units make up one second. Apple has posted some more technical information about Timecode Media Handler Functions, but it didn’t really help me solve the problem easily.

Fortunately, there are a few threads on this topic on the Cycling ’74 forum, including one on relating sfplay time to quicktime time, and another on converting quicktime units into time code. These threads helped me realize that calculating the duration of a movie file in seconds is as easy as dividing the duration in time units by the timescale. And, knowing the total number of frames, it is then also possible to calculate the frames per second of the file. Now that I know this, it is obvious, but I post the patch here in case there are others looking for this information.

----------begin_max5_patcher----------
1334.3oc6Zt0iiZCE.94jeEV7TqzTJ9B25Sspu0+BUqF4.NY7rDHEblY1tZ+
uWegbcGLPVvyLRUq1DgCfOmOet4imutbg2ppWXMdfeC72fEK95xEKzCoFXQ6
0K71ReIqf1nuMurpsaYkBu6L+lf8hPO9eRKx1WPELv1pm3LP99ZpfWUBnk4f
06Z.7RvewEBV8gGccUong+uL0iCQ9AsCWtea0dQASnmuCittdyJ80GuucTQ1
C7xM2WyxDFM.GI+U.BY9LV+k7A.ep8Q34ZQsZ0i+BJ5bwnjtUKFd+QMmV3cR
R3kGDDnZrusbo5i69AYUVAO6y.QEH6HzDboDLefAgUvHQSFHhXkLn2PxbvpY
9PAJIxOINTp+QZZDCsACX7XgwgYtl0HUPsxb9rmF5qmbrdAIn8CvmlPFttVJ
lYU6O8Sy.EgQ9CGhQSLDgo5oy3tOKLT4N1H8NmQeRHQaBzRvH6DjLsDDFFXH
X7rRv4CdwomwNiuTmrC6f3YxWvpQlYC0s1EYRckn0qHhOJzh5A8N9hTN9xDr
2yJoqJXW.0IwrIE4C0xR3UQuZeLi1I9xNl4983JCCf2JZ4Fuax5ZZ4JD2F8c
PjM0cfEJKVQyUxGRthBH9CFqINDqRCV85MI8iIWiwFTZ.KL.Ykrwy.YmdDsZ
uPbp.uKYAzBKRwl51RBNDsuaRDMNc5GX8l8rb999jefUl+MJKAx.zdXfYONQ
V0ej2DsZqIycc78jWu.3mZjEtVl27yyXYWlReHw5ufPqbjj5fZGVWTIeGSVL
CoFoBRPL6+Mzt9k3hPFREjJlGql06ZlwM4DE5ehj16m.wE8SninM+J.5OJJ.
627AmpSYhQ9VR3PFabFjcSjI0z3XCnbPPEbPa3YbT5PBq.+3EVAg8OSAiisV
IBN5CRbEW3Qcfb3j98nvouC7n1xZZnaXeGU1HmfapuHVnSHlzVXioWDwZHAS
5.OjoBO2FYlVeJy17aqDoOOJz+6QcE2vCHCEF+NvepCjTKmCSi+AGcq.mZdK
3l5EdXucUpsaYo1ODffQxsvNczt6p+O0gjtw1caw9BSRuHlTRrsXRvz21XRV
PyMYAYALDR7kAqgVAS36VvL5lSOjFSzBEyNt5BJuwIv5HTztZdGtO3dJO901
gsoDQTf15n8LYgcXi3fBg2UP+xJZs2XSOm.8O0uIn8CnAebizOvyY0Oud84V
MuZxW0MVTUsyZBYUDS0Ryj5.Tyn4iZwF12QtaXjw9Wc3bu5Rcv6RS+G4B++Q
3a9iV32o6EUMBZs.DLJBg5iPPS1WXHxl6P3T02NoVc+Vpnl+xsmyUQlcOyKy
qd161T5Pj4bARM6FFR55HrN145UzrOuQVXTYdu9OudkFWno5IyqfWd8ehKZI
VM9kpeS095rCut1i.BbRjyYMBdoIi5o6QUhI.d7ljt04rxyCQlyaT0oq0nfW
ccXnhi5r9Fl7D3D4QUkXuxyUB8rKOvdjmTmINCx5I0YVOICPbTczykKVndjm
DmINwCcwxgzA2i7D6LwIb.NVWEMXNEmnArXgbJb5OJ36qEK2ET9pPJcD2wcN
5jAHNg8HMa446pj0k2bp8+X8V.izcUgnO2H8EmlnISAvCJuRjy.JNYBH5DJN
QCMOmaR6hmACtPBwkFb3gXv4vJGFxBrSkl9RTicq3zav+TmJN8UjGzcAGfoy
Pz+vTG5LBCmdMfDF6RMHXFyWX1xOc2tmX0MsuRsj3sk9XUs5xn6zWxKMWp6m
fWM6I9g6GqGgVm8.WvxD6qMsg4kjHukp44aK+Ob4vDqA
-----------end_max5_patcher-----------
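
To make the relationship concrete, here is a small worked example in Python (the numbers are invented; 600 is QuickTime's classic default timescale):

# Duration in seconds is the duration in time units divided by the timescale.
duration_units = 3000  # movie duration in QuickTime time units
timescale = 600        # time units per second
total_frames = 125     # total number of frames in the movie

duration_seconds = duration_units / timescale  # 3000 / 600 = 5.0 s
fps = total_frames / duration_seconds          # 125 / 5.0 = 25.0 fps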

My main problem, though, was that I already had a lot of analysis files (hundreds) with only the QuickTime time unit as the time reference. It was not an option to rerun these analyses (which had taken weeks to finish), so I had to figure out a way to retroactively calculate a more meaningful timecode (in seconds).

After fiddling around with the time series data for a little while, I found that the difference between two consecutive time samples, together with the original fps of the movies (checked in QuickTime Player), can be used to calculate the correct duration in seconds. For my dataset, there turned out to be only five different time unit durations, so it was fairly easy to write a small Matlab script to calculate the durations. This is the part of the script that handles the time calculation, in which a is a Matlab structure with my data, so a.time(1) is the timecode of the first sample in the dataset.

% Convert QuickTime time units to seconds. The interval between two
% consecutive time samples identifies the file type, and dividing the
% elapsed time units by (fps * interval) gives the time in seconds.
time_diff = a.time(2) - a.time(1);
switch time_diff
    case 1001
        t = (a.time - a.time(1)) / (29.97 * time_diff); % 29.97 fps
    case 2000
        t = (a.time - a.time(1)) / (29.97 * time_diff); % 59.94 fps
    case {3731, 3733, 3750}
        t = (a.time - a.time(1)) / (24 * time_diff);    % 24 fps
    otherwise
        t = (a.time - a.time(1)) / (29.97 * time_diff); % assume 29.97 fps
        disp('!!! Unknown timecode.')
end

For new analyses, I will calculate the correct duration in seconds right away, but this hack has at least helped solve the problem for my current dataset.

New publication: Performing the Electric Violin in a Sonic Space

I am happy to announce that a paper I wrote together with Victoria Johnson has just been published in Computer Music Journal. The paper is based on the experiences that Victoria and I gained while working on the piece Transformation for electric violin and live electronics (see the video below).

Citation
A. R. Jensenius and V. Johnson. Performing the electric violin in a sonic space. Computer Music Journal, 36(4):28–39, 2012.

Abstract
This article presents the development of the improvisation piece Transformation for electric violin and live electronics. The aim of the project was to develop an “invisible” technological setup that would allow the performer to move freely on stage while still being in full control of the electronics. The developed system consists of a video-based motion-tracking system, with a camera hanging in the ceiling above the stage. The performer’s motion and position on stage is used to control the playback of sonic fragments from a database of violin sounds, using concatenative synthesis as the sound engine. The setup allows the performer to improvise freely together with the electronic sounds being played back as she moves around the “sonic space.” The system has been stable in rehearsal and performance, and the simplicity of the approach has been inspiring to both the performer and the audience.

PDF
The PDF will be available in the University of Oslo public repository after the six-month embargo. Until then, it is available through either MIT Press or Project MUSE.

BibTeX entry
@article{Jensenius:2012,
  Author  = {Jensenius, Alexander Refsum and Johnson, Victoria},
  Journal = {Computer Music Journal},
  Number  = {4},
  Pages   = {28--39},
  Title   = {Performing the Electric Violin in a Sonic Space},
  Volume  = {36},
  Year    = {2012}}

Video
Video of the piece Transformation.