Frame differencing with FFmpeg

I often want to create motion videos, that is, videos that only show what changed between frames. Such videos are nice to look at, and so-called “frame differencing” is also the starting point for many computer vision algorithms.

We have made several tools for creating motion videos (and more) at the University of Oslo: the standalone VideoAnalysis app (Win/Mac) and the different versions of the Musical Gestures Toolbox. These are all great tools, but sometimes it would also be nice to create motion videos in the terminal using FFmpeg.

I have previously written about the tblend function in FFmpeg, which I thought would be a good starting point. However, it turned out to be slightly more challenging than I had expected. Hence, this blog post is to help others looking to do the same.

Here is the source video I used for testing:

Source video of dance improvisation.

First, I tried this one-liner:

ffmpeg -i input.mp4 -filter_complex "tblend=all_mode=difference" output.mp4

It does the frame differencing, but I end up with a green image:

The result of the tblend filter.

I spent quite some time looking for a solution. Several people report a similar problem, but there are few answers. Finally, I found this explanation suggesting that the source video is in YUV while the filter expects RGB. To get the correct result, we need to add format=gbrp to the filter chain:

ffmpeg -i input.mp4 -filter_complex "format=gbrp,tblend=all_mode=difference" output.mp4
The final motion video after RGB conversion.
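
A variation I find useful: since the motion image essentially shows the magnitude of change, a grayscale version can be easier to read. Assuming a reasonably recent FFmpeg build, this can be done by appending the hue filter to the same chain to drop the saturation:

ffmpeg -i input.mp4 -filter_complex "format=gbrp,tblend=all_mode=difference,hue=s=0" output.mp4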

I have now also added this to the function mgmotion in the Musical Gestures Toolbox for Terminal.

Pre-processing Garmin VIRB 360 recordings with FFmpeg

I have previously written about how it is possible to “flatten” a Ricoh Theta recording using FFmpeg. Now, I have spent some time exploring how to process some recordings from a Garmin VIRB camera.

Some hours of recordings

The starting point was a bunch of recordings from our recent MusicLab Copenhagen featuring the amazing Danish String Quartet. A team of RITMO researchers went to Copenhagen and captured the quartet in both rehearsal and performance. We have data and media from motion capture, eye tracking, physiological sensing, audio, video, and more. The plan is to make it all available on OSF.

When it comes to video, we have many different recordings, ranging from small GoPro cameras hanging around the space to professional streaming cameras operated by a camera crew. In addition, we have one recording from a Garmin VIRB 360 camera hanging in the chandelier close to the musicians. Those recordings are what I will explore in this post.

An upside-down 360 recording

The Garmin VIRB camera records a 360-degree video using two 180-degree lenses. Unlike Ricoh Theta’s stereo-spherical videos, the Garmin stores the recording with an equirectangular projection. Here is a screenshot from the original recording:

An image from the original video recorded from a Garmin VIRB camera.

There are some obvious problems with this recording. First, the image is upside down, since the camera was hanging upside down from a chandelier above the musicians. The panning and tilting of the camera are also slightly off relative to the placement of the musicians. So it is necessary to do some pre-processing before analysing the files.

Most 360-degree cameras come with software for adjusting the image. The Garmin app can do it, but I already have all the files on a computer. It could also be done in video editing software, although I haven’t explored that. In any case, I was looking for an option that allows me to batch process a bunch of videos (yes, we have hours of recordings, and they are split up into different files).

Since working on the Ricoh files last year, I have learned that FFmpeg’s new 360 filter is now part of the regular release. So I wanted to give it a spin. Along the way, I learned more about different image projection types, which I will outline in the following.

Equirectangular projection

The starting point was the equirectangular projection coming out of the Garmin VIRB. The first step in making it more useful is to flip the video around and place the musicians in the centre of the image.

Rotating, flipping, panning, and tilting the image to place the musicians in the centre.

The different functions of the v360 filter in FFmpeg are documented but not explained very well. So it took me quite some time to figure out how to make the adjustments. This is the one-liner I ended up with to create the image above:

ffmpeg -i input.mp4 -vf "v360=input=e:output=e:yaw=100:pitch=-50:v_flip=1:h_flip=1" output.mp4

There are some tricks I had to figure out to make this work. First, I use the v360 filter with equirectangular (shortened to e) as both the input and output of the filter. The rotation was done using the v_flip and h_flip options, which flip the image around the horizontal and vertical axes (together equivalent to a 180-degree rotation). In the original image, the cellist was on the edge, so I also had to pan the whole image horizontally using yaw and move it down a bit using pitch. It took some manual testing to figure out the correct numbers here.
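
A tip that would have saved me some time: you can print the full list of v360 options for your local FFmpeg build directly in the terminal, which is handy since the available options vary somewhat between versions:

ffmpeg -h filter=v360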

Since the analysis will be focused on the musicians, I have also cropped the image using the general crop filter (note that you can chain multiple filters with commas in FFmpeg; if you instead add a second -vf option, only the last one will be used):

ffmpeg -i input.mp4 -vf "v360=input=e:output=e:yaw=100:pitch=-50:v_flip=1:h_flip=1, crop=1700:1000:1000:550" output_crop.mp4

This gives us a nicely cropped video of the musicians:

Cropped equirectangular image.

This video already looks quite good and could be used for analysis (for example, in one of the versions of the Musical Gestures Toolbox). But I wanted to explore whether other projections might work better.

Gnomonic projection

An alternative projection is called gnomonic in fancy terminology and “flat” in more plain language. It looks like this:

A gnomonic projection of the video.

ffmpeg -i input.mp4 -vf "v360=input=e:output=flat:v_flip=1:h_flip=1:yaw=90:pitch=-30:h_fov=150:v_fov=150" output_flat.mp4

Here I used the flat output type in FFmpeg and did the same flipping, panning, and tilting as above, though I had to use slightly different numbers for yaw and pitch to make it work. Here, too, I added some cropping to focus on the musicians:

ffmpeg -i input.mp4 -vf "v360=input=e:output=flat:v_flip=1:h_flip=1:yaw=90:pitch=-30:h_fov=150:v_fov=150, crop=3800:1100:0:800" output_flat_crop.mp4

This left me with the final video:

Cropped gnomonic projection.

There are many problems with this projection, and the most obvious is the vast size difference between the musicians. So I won’t use this version for anything, but it was still interesting to explore.

Cube map

A different projection is the cube map. Here is an illustration of how it relates to the equirectangular projection:

Overview of different projection types (from Sizima).

The v360 filter also allows for creating such projections, and there are multiple variants of this idea. I found a nice blog post by Anders Jirås that helped me understand how this function works.

First, I tested the c6x1 output function:

ffmpeg -i input.mp4 -vf "v360=input=e:output=c6x1:out_forder=frblud:yaw=50:pitch=-30:roll=50:v_flip=1:h_flip=1" output_c6x1.mp4

I changed the order of the cube faces using out_forder (as documented here) and (again) played around with the yaw, pitch, and roll to make something that worked well. This resulted in an image like this:

A cube map projection of the video. Here with a 6×1 cube layout.

There is also a function called c3x2, which will generate an image like this:

A 3×2 cube projection.

Adding some cropping to the 3×2 projection:

ffmpeg -i input.mp4 -vf "v360=input=e:output=c3x2:out_forder=frblud:yaw=50:pitch=-30:roll=50:v_flip=1:h_flip=1, crop=1500:1080:150:0" output_c3x2_crop.mp4

Then we end up with an image like this:

A cropped 3×2 projection.

This looks quite weird, mainly because the cellist wraps into a different cube face than the others.

Equi-angular cubemap

Finally, I wanted to test a newer projection developed by Google a couple of years ago: the Equi-Angular Cubemap. The idea is to create a projection with fewer artefacts at the edges:

Equirectangular Projection (left), Standard Cubemap (middle), Equi-Angular Cubemap (right) (from a Google blog post).

In FFmpeg, this can be achieved with the eac function:

ffmpeg -i input.mp4 -vf "v360=input=e:output=eac:yaw=100:pitch=-50:roll=0:v_flip=1:h_flip=1" output_eac.mp4

The resultant image looks like this:

An equi-angular cubemap projection.

Only the top part of the image is useful for my analysis, which can be cropped out like this:

ffmpeg -i input.mp4 -vf "v360=input=e:output=eac:yaw=100:pitch=-50:roll=0:v_flip=1:h_flip=1, crop=2200:1200:750:0" output_eac_crop.mp4

The final image looks like this:

A cropped equi-angular projection.

The equi-angular cubemap should give a better projection overall because it avoids heavy distortion at the edges. However, that comes at the cost of more artefacts in the central parts of the image. So when cropping into the image as I did above, the equirectangular projection may actually work best.

Summing up

After quite some time fiddling around with FFmpeg and trying to understand the various parts of the new v360 filter, I can conclude that the original equirectangular projection is probably the best one to use for my analysis. The other projections probably work better for various types of 3D viewing. Still, it was useful to learn how to run these processes using FFmpeg. This will surely come in handy when I am going to process a bunch of these files in the near future.
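
As a sketch of how that batch processing could look, here is a small Bash loop that applies the equirectangular reframing and cropping from above to all MP4 files in a folder (the filter values are the ones I found for this particular setup and would need adjusting for other recordings; the processed_ prefix is just a naming choice):

for f in *.mp4; do
  ffmpeg -i "$f" -vf "v360=input=e:output=e:yaw=100:pitch=-50:v_flip=1:h_flip=1, crop=1700:1000:1000:550" "processed_$f"
done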

Preparing video for Matlab analysis

Typical video files, such as MP4 files with H.264 compression, are usually small in size and have high visual quality. Such files are suitable for visual inspection but do not work well for video analysis. In most cases, computer vision software prefers to work with raw data or other compression formats.

The Musical Gestures Toolbox for Matlab works best with these file types:

  • Video: use MJPEG (Motion JPEG) as the compression format. This compresses each frame individually. Use .AVI as the container, since this is the one that works best on all platforms.
  • Audio: use uncompressed audio (16-bit PCM), saved as .WAV files (.AIFF usually also works fine); see the extraction one-liner below. If you need to use compression, MP3 compression (MPEG-1, Layer 3) is still more versatile than AAC (used in .MP4 files). If you use a bitrate of 192 kbps or higher, you should not get too many artefacts.
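
If you already have an MP4 recording and only need the audio, a one-liner like this should extract it as 16-bit PCM (the -vn option drops the video stream):

ffmpeg -i input.mp4 -vn -c:a pcm_s16le output.wav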

Many people ask me how to convert from typical MP4 files (with H.264 video compression and AAC audio compression). The easiest solution, I think, is to use FFmpeg, the versatile command-line utility. Here is a one-liner that will convert an .MP4 file into an .AVI file with MJPEG video and PCM audio:

ffmpeg -i input.mp4 -c:a pcm_s16le -c:v mjpeg -q:v 3 -huffman optimal output.avi

The resulting file should work well in Matlab and other video analysis tools. We have included this conversion by default in the new Musical Gestures Toolbox for Python, so there you can directly load an MP4 file, and it will be converted to an AVI file using a script similar to the one above.
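
If you want to verify that a converted file ended up with the expected codecs, you can inspect it with ffprobe, which ships with FFmpeg. It should report mjpeg for the video stream and pcm_s16le for the audio stream:

ffprobe -v error -show_entries stream=codec_type,codec_name -of default=nw=1 output.avi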

Rotate video using FFmpeg

Here is another FFmpeg-related blog post, this time to explain how to rotate a video using the command-line tool FFmpeg. There are two ways of doing this, and I will explain both in the following.

Rotation in metadata

The best first attempt is to rotate the video by only modifying the metadata in the file. This does not work for all file types, but should work for some (including .mp4 files).

ffmpeg -i input.mp4 -metadata:s:v rotate="-90" -codec copy output.mp4

The nice thing here is that it is superfast and also non-destructive.
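
To check whether the rotation tag was actually written, you can read it back with ffprobe (note that newer FFmpeg versions may store rotation as display-matrix side data rather than a stream tag, in which case this prints nothing):

ffprobe -v error -select_streams v:0 -show_entries stream_tags=rotate -of default=nw=1:nk=1 output.mp4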

Rotation with compression

If the above does not work, you will need to recompress the file. That is not ideal, but it will do the trick. To rotate a movie by 90 degrees clockwise, you can do:

ffmpeg -i input.mp4 -vf "transpose=1" -c:a copy output.mp4

The trick here is choosing the right transpose value:

  • 0 – Rotate by 90 degrees counter-clockwise and flip vertically. This is the default.
  • 1 – Rotate by 90 degrees clockwise.
  • 2 – Rotate by 90 degrees counter-clockwise.
  • 3 – Rotate by 90 degrees clockwise and flip vertically.

I often record with cameras hanging upside down. Then I want to rotate 180 degrees, which can be done like this:

ffmpeg -i input.mp4 -vf "transpose=2,transpose=2" output.mp4

Even though the video is re-compressed using this method, you can still copy the audio without re-compression by adding the -c:a copy flag:

ffmpeg -i input.mp4 -vf "transpose=1" -c:a copy output.mp4

Of course, these commands can also be included in a chain of other FFmpeg commands.
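
For example, the 180-degree rotation can be chained with a scaling filter in a single pass (scale=1280:-2 scales the width to 1280 pixels while keeping the aspect ratio):

ffmpeg -i input.mp4 -vf "transpose=2,transpose=2,scale=1280:-2" -c:a copy output.mp4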

Crop video files with FFmpeg

I have previously written about how to trim video files with FFmpeg. It is also easy to crop a video file. Here is a short how-to guide for myself and others.

Cropping is not the same as trimming

This may be basic, but I often see the concepts of cropping and trimming used interchangeably. So, to clarify: trimming a video file means making it shorter in time by removing frames at the beginning and/or end. That is not the same as cropping a video file, which means selecting a particular spatial part of the image for export.
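
For comparison, here is a minimal trimming one-liner that cuts out one minute starting 10 seconds into the file, copying both streams without re-compression:

ffmpeg -ss 10 -i input.mp4 -t 60 -c copy output.mp4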

The one-liner

If you want to get it done, here is the one-liner:

ffmpeg -i input.avi -vf crop=1920:1080:0:200 output.avi

There is more information about this command in many places (for example, here). The most critical parts of the above command are:

  • -vf is the command for “video filter”. It can also be spelt out as -filter:v.
  • crop=1920:1080:0:200 means to make a 1920×1080 video starting from the leftmost side of the source image (=0 pixels horizontally) and 200 pixels down from the top.
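
A detail worth knowing: if you leave out the two position values, the crop filter centres the cut-out by default, so this will take a 1920×1080 region from the middle of the image:

ffmpeg -i input.avi -vf crop=1920:1080 output.avi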

Of course, this command can also be combined with other FFmpeg options.