Normalize audio in video files

We are organizing the Rhythm Production and Perception Workshop at RITMO next week. As mentioned in another blog post, we have asked presenters to send us pre-recorded videos. They are all available on the workshop page.

During the workshop, we will play sets of videos in sequence. When doing a test run today, we discovered that the sound levels differed wildly between files. There is clearly a need to normalize the sound levels to create a good listening experience.

Batch normalization

How does one normalize around 100 video files without too much pain and effort? As always, I turn to my go-to video companion, FFmpeg. Here is a small script I made to do the job:

#!/bin/bash

shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do 
   name="${i%.*}"   # strip the extension (handles names with dots, unlike cut)
   ffmpeg -i "$i" -c:v copy -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"
done

This was the result of some searching around for a smart solution (in Qwant, btw, my new preferred search engine). For example, I use the “nullglob” option so that I can list multiple file types in the for loop: any pattern that does not match a file simply expands to nothing instead of being passed on literally.
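To illustrate, here is a minimal sketch, assuming a directory that contains no .flv files. Without nullglob, the unmatched pattern is kept as a literal string, so the bogus filename “*.flv” would have been passed on to ffmpeg:

$ shopt -u nullglob; for i in *.flv; do echo "$i"; done
*.flv
$ shopt -s nullglob; for i in *.flv; do echo "$i"; done

With nullglob set, the loop body never runs, so nothing is printed.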

The most important part of the script is the normalization, which I found in this blog post. The settings are described as:

  • loudnorm: the name of the normalization filter
  • I: the integrated loudness target in LUFS (from -70.0 to -5.0)
  • LRA: the loudness range target in LU (from 1.0 to 20.0)
  • TP: the maximum true peak in dBTP (from -9.0 to 0.0)

The settings in the script normalize to a high but not maximum signal, which leaves some headroom.
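If you are curious about what loudnorm measures for a particular file, the filter can also run in a measurement-only pass that prints its statistics, with the null muxer discarding the output. A small sketch, where input.mp4 is a placeholder name:

ffmpeg -i input.mp4 -af loudnorm=I=-16:LRA=11:TP=-1.5:print_format=summary -f null -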

To compress or not

To save processing time and avoid recompressing the video, I have included “-c:v copy” in the script above. FFmpeg then copies over the video stream directly. This is fine for videos with “normal” H.264 compression, which is the case for most .MP4 files. However, when you receive 100 files made on all sorts of platforms, there are bound to be some oddities. There were a couple of cases with unusual compression formats that, for some reason, failed with the above script. One also had interlacing issues. For those, I modified the script to recompress the files.

#!/bin/bash

shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do 
    name="${i%.*}"   # strip the extension, as in the first script
    ffmpeg -i "$i" -vf yadif -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"
done

In this script, the “-c:v copy” part is removed, so the video is re-encoded. I have also added “-vf yadif”, a de-interlacing video filter.

Summing up

With the first script, I managed to normalize all 100 files in only a few minutes. Some of the files ended up at 0 bytes due to issues with copying the video stream, so I reran those with the second script. That took longer, of course, because the video had to be recompressed.

All in all, the processing took around half an hour. I cannot even imagine how long it would have taken to do this manually in a video editor. I haven’t really thought about the need for normalizing the audio in videos like this before. Next time I will do it right away!

Making 100 video poster images programmatically

We are organizing the Rhythm Production and Perception Workshop 2021 at RITMO a week from now. Like many other conferences these days, this one will also be run online. Presentations have been pre-recorded (10 minutes each) and we also have short poster blitz videos (1 minute each).

Pre-recorded videos

People have sent us their videos in advance, but they all have different first “slides”. So, to create some consistency among the videos, we decided to make an introduction slide for each of them. This would then also serve as the “thumbnail” of the video when presented in a grid format.

One solution could be to add some frames at the beginning of each video file. This could probably be done with FFmpeg without recompressing the files. However, given that we are talking about approximately 100 video files, I am sure there would have been some hiccups.

A quicker and better option is to add “poster images” when uploading the files to YouTube. We also support this on UiO’s web pages, which serve as the long-term archive of the material. The question, then, is how to create these 100 poster images without too much work. Here is how I did it on my Ubuntu machine.

Mail Merge in LibreOffice Writer

My initial thought was to start with Impress, the free presentation software in LibreOffice. I quickly searched to see if there is any tool to create slides programmatically but didn’t find anything that seemed straightforward.

Instead, I remembered the good old “mail merge” functionality of Writer. It was made for creating envelope labels back in the days when people still sent physical mail, but it can be tweaked for other things. After all, I had the material I wanted to include in the poster image in a simple spreadsheet, so it was quick and easy to import the spreadsheet into Writer and select the two columns I wanted to include (“author name” and “title”).

A spreadsheet with the source information about authors and paper titles.

I wanted the final image to be in Full-HD format (1920 x 1080 pixels), which is not a standard page format in Writer. However, there is the option of choosing a custom page size, so I set up a page size of 192 x 108 mm, which preserves the 16:9 aspect ratio. Then I added some fixed elements on the page, including a RITMO emblem and the conference title.

Setting up the template in LibreOffice Writer.

Finally, I saved a file with the merged content and exported it as a PDF.

From PDF to PNG

The output of Writer was a multi-page PDF. However, what we need is a single image file per video. So I turned to the terminal and used this one-liner based on pdfseparate to split the PDF into multiple one-page PDF files:

pdfseparate rppw2021-papers-merged.pdf posters%d.pdf

The trick here is the %d placeholder, which pdfseparate replaces with a sequential page number for each output PDF.
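For a three-page input, the result would look like this (an illustrative listing; the merged PDF from Writer of course has many more pages):

$ pdfseparate rppw2021-papers-merged.pdf posters%d.pdf
$ ls posters*.pdf
posters1.pdf  posters2.pdf  posters3.pdf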

Next, I wanted to convert these individual PDF files to PNG files. Here I turned to the convert function of ImageMagick, and wrote a short one-liner that does the trick:

for i in *.pdf; do name="${i%.*}"; convert -density 300 -resize 1920x1080 -background white -flatten "$i" "$name.png"; done

It looks for all the PDFs in a directory and converts each to a PNG file at Full-HD resolution. I found it necessary to include “-density 300” to get a nice-looking image, since ImageMagick renders PDFs at a fairly low density (72 DPI) by default. To avoid any transparency issues in later stages, I also included the “-background white” and “-flatten” options.

The end result was a folder of PNG files.

Putting it all together

The last step is to match the video files with the right PNG image in the video playback solution. Here it is shown using the video player we have at UiO:

Once I figured out the workflow, the whole process was very rapid. Hopefully, this post can save someone many hours of manual work!

Creating image masks from a video file

As part of my exploration in creating multi-exposure keyframe image displays with FFmpeg and ImageMagick, I tried out a number of things that did not help solve the initial problem but still could be interesting for other things. Most interesting was the automagic creation of image masks from a video file.

I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than doing a new analysis of the file. It would, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also chose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.
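For reference, the regular-interval alternative mentioned above could look something like this, sampling one frame per second (a sketch; “dance.mp4” is a placeholder file name):

ffmpeg -i dance.mp4 -vf fps=1 -f image2 t%02d.tiff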

Here we are lucky, because the first frame actually contains the background of the scene. So we can use that frame to create a “foreground” image by subtracting the background image like this:

for i in *.tiff; do
    name="${i%.*}"
    # difference against the background frame, then threshold and blur into a clean mask
    convert t01.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
    # multiply the original frame with its mask to keep only the foreground
    convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done

The end result is a series with the foreground masks:

And then the final result is a series of images in which only the foreground is shown. The “glow” around the images is because of the blur effect used when creating the mask:

Adaptive background

There may also be cases in which there is no readily available background image as we used above, such as in this hip-hop AIST dance video:

Then it is possible to create a background image by combining all the keyframes, hoping that this will “remove” the foreground. Here is a one-liner that does this by keeping the lightest pixel at each position, which works here because the dancer is darker than the background (assuming that you have exported the individual keyframes as mentioned at the beginning of this post):

convert *.tiff -background black -compose lighten -flatten background.tiff

This works quite well, although we can see that the camera right behind the dancer is a little fainter than the two others:

Background image created by combining all the keyframes.

This background image can then be used to subtract from the other images like we did above:

for i in *.tiff; do
    name="${i%.*}"
    # difference against the combined background image, then threshold and blur into a mask
    convert background.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
    # multiply the original frame with its mask to keep only the foreground
    convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done

It works very well, except that the camera behind the performer (which wasn’t masked properly) also shows up in the masked foreground images:

This method works quite well and has the benefit of being very fast. It is possible to get a better result by creating an average image from the entire video (and not only the keyframes), but that would also take much longer.
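A sketch of what such averaging could look like, assuming all frames are first exported (the file names here are placeholders, and ImageMagick may need a lot of memory for a long video). The -evaluate-sequence operator computes a true per-pixel mean:

mkdir -p frames
ffmpeg -i dance.mp4 frames/f%05d.tiff
convert frames/*.tiff -evaluate-sequence mean background-mean.tiff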

Creating multi-exposure keyframe image displays with FFmpeg and ImageMagick

While I was testing visualization of some videos from the AIST database earlier today, I also wanted to create some “keyframe image displays”. This can be seen as a way of doing multi-exposure photography and should be quite straightforward. Still, it took me quite some time to figure out exactly how to implement it. It may be that I was searching for the wrong things, but in case anyone else is looking for the same, here is a quick write-up.

The current procedure is done using a combination of two very handy command line tools: FFmpeg and ImageMagick. I would like to add it to both the Matlab and Python versions of the Musical Gestures Toolbox as well, but will need to figure that out another time.

In this example I will use a hip-hop dance video from the AIST database:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than doing a new analysis of the file. It would, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also chose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.

Automagically exported keyframe images.

In my search for a solution, I tried a lot of complex things. But it turned out to be super-simple to get what I wanted:

convert *.tiff -background white -compose darken -flatten keyframes.jpg

Here we use the convert function of ImageMagick to combine all the exported keyframes into one image. The “-compose darken” setting keeps the darkest pixel at each position, which works because the dancer is darker than the background:

Keyframe image display of hip-hop video.

Since the dancer was moving in more or less the same place all the time, it is quite compact. Running the same functions on another video of a contemporary dancer, on the other hand, shows some of the potential of this visualization method. Here is the video:

Which results in this keyframe display image:

Besides being cool to look at, it is also quite informative when it comes to telling what is going on in the video. You get information about the temporal and spatial movement of the dancer, although it is difficult to understand exactly when she was moving where.

The next step is to also include these methods in the Musical Gestures Toolbox.

Converting MXF files to MP4 with FFmpeg

We have a bunch of Canon XF105 cameras at RITMO, which record MXF files. This is not a particularly useful file format except for further processing. Since many of our recordings are just for documentation purposes, we often need to convert them to MP4. Here I present two solutions for converting MXF files to MP4, both as individual files and as one combined file from a folder. These are shell scripts based on the handy FFmpeg.

Convert individual MXF files to individual MP4 files

The first solution converts a bunch of MXF files to individual MP4 files. This is practical when there are multiple individual shots.
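A minimal sketch of such a script, following the pattern of the normalization scripts above (the H.264/AAC codec settings are assumptions, not necessarily what I used):

#!/bin/bash

# Sketch: convert each MXF file in the folder to an individual MP4 file.
shopt -s nullglob
for i in *.mxf *.MXF; do
    name="${i%.*}"
    ffmpeg -i "$i" -c:v libx264 -c:a aac "${name}.mp4"
done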

Save the script above as mxf2mp4.sh and make it executable with a command like:

chmod u+x mxf2mp4.sh

and run the file:

./mxf2mp4.sh

Convert a folder of MXF files to one MP4 file

The second solution is for when we have made one long recording, which the camera splits into individual MXF files of 1.9 GB each (due to a file size limit of the FAT32-formatted recording media). The aim is then to merge all of these into one MP4 file. This script will do the trick:
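Here is a minimal sketch of such a script, using FFmpeg’s concat demuxer (the list file name, output name, and codec settings are assumptions):

#!/bin/bash

# Sketch: list the MXF files in sorted order for FFmpeg's concat demuxer,
# then merge and re-encode them into a single MP4 file.
shopt -s nullglob
rm -f mxflist.txt
for i in *.mxf *.MXF; do
    echo "file '$PWD/$i'" >> mxflist.txt
done
ffmpeg -f concat -safe 0 -i mxflist.txt -c:v libx264 -c:a aac combined.mp4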

Do the same as above to run the script.