Merge multiple MP4 files

I have been doing several long recordings with GoPro cameras recently. The cameras automatically split the recordings into 4GB files, which leaves me with a myriad of files to work with. I have therefore made a script to help with the pre-processing of the files.

This is somewhat similar to the script I made to convert MXF files to MP4, but with better handling of the temp file for storing information about the files to merge:

Save the script above as, put it in the folder of your files, make it executable, with a command like:

chmod u+x

run the file:


and watch the magic.

The script above can be remixed in various ways. For example, if you want a smaller output file (the original GoPro files are quite large), you can use FFmpeg’s default MP4 compression settings by removing the “-c copy” part in the last line above. That will also make the script take much longer, since it will recompress the output file.

Preparing video for Matlab analysis

Typical video files, such as MP4 files with H.264 compression, are usually small in size and with high visual quality. Such files are suitable for visual inspection but do not work well for video analysis. In most cases, computer vision software prefers to work with raw data or other compression formats.

The Musical Gestures Toolbox for Matlab works best with these file types:

  • Video: use MJPEG (Motion JPEG) as the compression format. This compresses each frame individually. Use .AVI as the container, since this is the one that works best on all platforms.
  • Audio: use uncompressed audio (16-bit PCM), saved as .WAV files (.AIFF usually also works fine). If you need to use compression, MP3 compression (MPEG-1, Layer 3) is still more versatile than AAC (used in .MP4 files). If you use a bitrate of 192 Kbs or higher, you should not get too many artefacts.

Many people ask me how to convert from typical MP4 files (with H.264 video compression and AAC audio compression). The easiest solution (I think) is to use FFMPEG, the versatile command-line utility. Here is a oneliner that will convert from an .MP4 file into a .AVI file with MJPEG and PCM audio:

FFmpeg -i input.mp4 -c:a pcm_s16le -c:v mjpeg -q:v 3 -huffman optimal output.avi

The resultant file should work well in Matlab and other video analysis tools. We have included this conversion by default in the new Musical Gestures Toolbox for Python. So there, you can directly load an MP4 file, which will be converted to an AVI file using a script similar to the one above.

Normalize audio in video files

We are organizing the Rhythm Production and Perception Workshop at RITMO next week. As mentioned in another blog post, we have asked presenters to send us pre-recorded videos. They are all available on the workshop page.

During the workshop, we will play sets of videos in sequence. When doing a test run today, we discovered that the sound levels differed wildly between files. There is clearly the need for normalizing the sound levels to create a good listener experience.

Batch normalization

How does one normalize around 100 video files without too much pain and effort? As always, I turn to my go-to video companion, FFmpeg. Here is a small script I made to do the job:


shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do 
   name=`echo $i | cut -d'.' -f1`; 
   ffmpeg -i "$i" -c:v copy -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"; 

This was the result of some searching around for a smart solution (in Qwant, btw, my new preferred search engine). For example, I use the “nullglob” trick to list multiple file types in the for loop.

The most important part of the script is the normalization, which I found in this blog post. The settings are described as:

  • loudnorm: the name of the normalization filter
  • I: the integrated loudness (from -70 to -5.0)
  • LRA: the loudness range (from 1.0 to 20.0)
  • TP: Indicates the max true peak (from -9.0 to 0.0)

The settings in the script normalize to a high but not maximum signal, which leaves some headroom.

To compress or not

To save processing time and avoid recompressing the video, I have included “-c:v copy” in the script above. Then FFmpeg copies over the video content directly. This is fine for videos with “normal” H.264 compression, which is the case for most .MP4 files. However, when getting 100 files made on all sorts of platforms, there are surely some oddities. There were a couple of cases with weird compression formats, that for some reason failed with the above script. One also had interlacing issues. For them, I modified the script to recompress the files.


shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do 
    name=`echo $i | cut -d'.' -f1`; 
    ffmpeg -i "$i" -vf yadif -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"; 

In this script, the copy part is removed. I have also added “-vf yadif”, which is a de-interlacing video filter.

Summing up

With the first script, I managed to normalize all 100 files in only a few minutes. Some of the files turned up with 0 bytes due to issues with copying the video data. So I ran through these with the second script. That took longer, of course, due to the need for compressing the video.

All in all, the processing took around half an hour. I cannot even imagine how long it would have taken to do this manually in a video editor. I haven’t really thought about the need for normalizing the audio in videos like this before. Next time I will do it right away!

Making 100 video poster images programmatically

We are organizing the Rhythm Production and Perception Workshop 2021 at RITMO a week from now. Like many other conferences these days, this one will also be run online. Presentations have been pre-recorded (10 minutes each) and we also have short poster blitz videos (1 minute each).

Pre-recorded videos

People have sent us their videos in advance, but they all have different first “slides”. So, to create some consistency among the videos, we decided to make an introduction slide for each of them. This would then also serve as the “thumbnail” of the video when presented in a grid format.

One solution could be to add some frames at the beginning of each video file. This could probably be done with FFmpeg without recompressing the files. However, given that we are talking about approximately 100 video files, I am sure there would have been some hiccups.

A quicker and better option is to add “poster images” when uploading the files to YouTube. We also support this on UiO’s web pages, which serves as the long-term archive of the material. The question, then, is how to create these 100 poster images without too much work. Here is how I did it on my Ubuntu machine.

Mail Merge in LibreOffice Writer

My initial thought was to start with Impress, the free presentation software in LibreOffice. I quickly searched to see if there is any tool to create slides programmatically but didn’t find anything that seemed to be straightforward.

Instead, I remembered the good old “mail merge” functionality of Writer. This was made for creating envelope labels back in the days when people still sent physical mail. However, it can be tweaked for other things. After all, I have the material I wanted to include in the poster image in a simple spreadsheet, so it was quick and easy to import the spreadsheet in Writer and select the two columns I wanted to include (“author name” and “title”).

A spreadsheet with the source information about authors and paper titles.

I wanted the final image to be in Full-HD format (1920 x 1080 pixels), which is not a standard format in Writer. However, there is the option of choosing a custom page size, so I set up a page size of 192 x 108 mm in Writer. Then I added some fixed elements on the page, including a RITMO emblem and the conference title.

Setting up the template in LibreOffice Writer.

Finally, I saved a file with the merged content and exported as a PDF.

From PDF to PNG

The output of Writer was a multi-page PDF. However, what we need is a single image file per video. So I turned to the terminal and used this oneliner based on pdfseparate to split up the PDF into multiple one-page PDF files:

pdfseparate rppw2021-papers-merged.pdf posters%d.pdf

The trick here is to use the %d command to get a sequential number for each PDF.

Next, I wanted to convert these individual PDF files to PNG files. Here I turned to the convert function of ImageMagick, and wrote a short one-liner that does the trick:

for i in *.pdf; do name=`echo $i | cut -d'.' -f1`; convert -density 300 -resize 1920x1080 -background white -flatten "$i" "$name.png"; done

It looks for all the PDFs in a directory and converts them to a PNG file with a Full-HD resolution. I found that it was necessary to include the “-density 300” to get a nice-looking image. For some reason, the default seems to be a fairly low-quality resolution. To avoid any transparency issues in later stages, I also included the “-background white” and “-flatten” functions.

The end result was a folder of PNG files.

Putting it all together

The last step is to match the video files with the right PNG image in the video playback solution. Here it is shown using the video player we have at UiO:

Once I figured out the workflow, the whole process was very rapid. Hopefully, this post can save someone many hours of manual work!

Creating image masks from video file

As part of my exploration in creating multi-exposure keyframe image displays with FFmpeg and ImageMagick, I tried out a number of things that did not help solve the initial problem but still could be interesting for other things. Most interesting was the automagic creation of image masks from a video file.

I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than doing a new analysis of the file. It could, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also choose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.

Here we are lucky, because the first frame actually contains the background of the scene. So we can use that frame to create a “foreground” image by subtracting the background image like this:

for i in *.tiff; 
name=`echo $i | cut -d'.' -f1`; 
convert t01.tiff $i -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff" 
convert $i "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"

The end result is a series with the foreground masks:

And then the final result is a series of images in which only the foreground is shown. The “glow” around the images is because of the blur effect used when creating the mask:

Adaptive background

There may also be cases in which there is no readily available background image as we used above, such as in this hip-hop AIST dance video:

Then it is possible to create a background image by averaging over all the images, and hope that this could “remove” the foreground. Here is a one-liner that does this (assuming that you have exported the individual keyframes as mentioned in the beginning of this post):

convert *.tiff -background black -compose lighten -flatten background.tiff

This works quite well, although we can see that the camera right behind the dancer is a little more faint the two others:

Background image created by averaging over all the keyframes.

This background image can then be used to subtract from the other images like we did above:

for i in *.tiff; 
name=`echo $i | cut -d'.' -f1`; 
convert background.tiff $i -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff" 
convert $i "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"

It works very well, except for that the camera behind the performer (that wasn’t masked properly) also shows up in the masked foreground images:

This method works quite well and has the benefit of being very fast. It is possible to get a better result by creating an average image from the entire video (and not only the keyframes), but this would also take very much longer.