Converting a .WAV file to .AVI

Sometimes, there is a need to convert an audio file into a blank video file with an audio track. This can be useful if you are on a system that does not have a dedicated audio player but a video player (yes, rare, but I work with odd technologies…). Here is a quick recipe

FFmpeg to the rescue

When it comes to converting from one media format to another, I always turn to FFmpeg. It requires “coding” in the terminal, but usually, it is only necessary to write a oneliner. When it comes to converting an audio file (say in .WAV format) to a blank video file (for example, a .AVI file), this is how I would do it:

ffmpeg -i infile.wav -c copy outfile.avi

The “-c copy” part of this command is to preserve the original audio content. The new black video file will have a copy of the original .WAV file content. If you are okay with compressing the audio, you can instead run this command:

ffmpeg -i infile.wav outfile.avi

Then FFmpeg will (by default) compress the audio using the mp3 algorithm. This may or may not be what you are after, but it will at least create a substantially smaller output file.

Of course, you can easily vary the above conversion. For example, if you want to go from .AIFF to .MP4, you would just do:

ffmpeg -i infile.aiff outfile.mp4

Happy converting!

Normalize audio in video files

We are organizing the Rhythm Production and Perception Workshop at RITMO next week. As mentioned in another blog post, we have asked presenters to send us pre-recorded videos. They are all available on the workshop page.

During the workshop, we will play sets of videos in sequence. When doing a test run today, we discovered that the sound levels differed wildly between files. There is clearly the need for normalizing the sound levels to create a good listener experience.

Batch normalization

How does one normalize around 100 video files without too much pain and effort? As always, I turn to my go-to video companion, FFmpeg. Here is a small script I made to do the job:

#!/bin/bash

shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do 
   name=`echo $i | cut -d'.' -f1`; 
   ffmpeg -i "$i" -c:v copy -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"; 
done

This was the result of some searching around for a smart solution (in Qwant, btw, my new preferred search engine). For example, I use the “nullglob” trick to list multiple file types in the for loop.

The most important part of the script is the normalization, which I found in this blog post. The settings are described as:

  • loudnorm: the name of the normalization filter
  • I: the integrated loudness (from -70 to -5.0)
  • LRA: the loudness range (from 1.0 to 20.0)
  • TP: Indicates the max true peak (from -9.0 to 0.0)

The settings in the script normalize to a high but not maximum signal, which leaves some headroom.

To compress or not

To save processing time and avoid recompressing the video, I have included “-c:v copy” in the script above. Then FFmpeg copies over the video content directly. This is fine for videos with “normal” H.264 compression, which is the case for most .MP4 files. However, when getting 100 files made on all sorts of platforms, there are surely some oddities. There were a couple of cases with weird compression formats, that for some reason failed with the above script. One also had interlacing issues. For them, I modified the script to recompress the files.

#!/bin/bash

shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do 
    name=`echo $i | cut -d'.' -f1`; 
    ffmpeg -i "$i" -vf yadif -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"; 
done

In this script, the copy part is removed. I have also added “-vf yadif”, which is a de-interlacing video filter.

Summing up

With the first script, I managed to normalize all 100 files in only a few minutes. Some of the files turned up with 0 bytes due to issues with copying the video data. So I ran through these with the second script. That took longer, of course, due to the need for compressing the video.

All in all, the processing took around half an hour. I cannot even imagine how long it would have taken to do this manually in a video editor. I haven’t really thought about the need for normalizing the audio in videos like this before. Next time I will do it right away!

Combining audio and video files with FFmpeg

When working with various types of video analysis, I often end up with video files without audio. So I need to add the audio track by copying either from the source video file or from a separate audio file. There are many ways of doing this. Many people would probably reach for a video editor, but the problem is that you would most likely end up recompressing both the audio and video file. A better solution is to use FFmpeg, the swizz-army knife of video processing.

As long as you know that the audio and video files you want to combine are the same duration, this is an easy task. Say that you have two video files:

  • input1.mp4 = original video with audio
  • input2.avi = analysis video without audio

Then you can use this one-liner to copy the audio from one file to the other:

ffmpeg -i input1.mp4 -i input2.avi -c copy -map 1:v:0 -map 0:a:0 -shortest output.avi

The output.avi file will have the same video content as input2.avi, but with audio from input1.mp4. Note that this is a lossless (and fast) procedure, it will just copy the content from the source files.

If you want to convert (and compress) the file in one operation, you can use this one-liner to export an MP4 file with .h264 video and aac audio compression:

ffmpeg -i input1.mp4 -i input2.avi -c copy -map 1:v:0 -map 0:a:0 -shortest -c:v mpeg4 -c:a aac output.mp4

Since this involves compressing the file, it will take (much longer) than the first method.

Splitting audio files in the terminal

I have recently played with AudioStellar, a great tool for “sound object”-based exploration and musicking. It reminds me of CataRT, a great tool for concatenative synthesis. I used CataRT quite a lot previously, for example, in the piece Transformation. However, after I switched to Ubuntu and PD instead of OSX and Max, CataRT was no longer an option. So I got very excited when I discovered AudioStellar some weeks ago. It is lightweight and cross-platform and has some novel features that I would like to explore more in the coming weeks.

Samples and sound objects

In today’s post, I will describe how to prepare short audio files to load into AudioStellar. The software is based on loading a collection of “samples”. I always find the term “sample” to be confusing. In digital signal processing terms, a sample is literally one sample, a number describing the signal’s amplitude in that specific moment in time. However, in music production, a “sample” is used to describe a fairly short sound file, often in the range of 0.5 to 5 seconds. This is what in the tradition of the composer-researcher Pierre Schaeffer would be called a sound object. So I prefer to use that term to refer to coherent, short snippets of sound.

AudioStellar relies on loading short sound files. They suggest that for the best experience, one should load files that are shorter than 3 seconds. I have some folders with such short sound files, but I have many more folders with longer recordings that contain multiple sound objects in one file. The beauty of CataRT was that it would analyse such long files and identify all the sound objects within the files. That is not possible in AudioStellar (yet, I hope). So I have to chop up the files myself. This can be done manually, of course, and I am sure some expensive software also does the job. But this was a good excuse to dive into SoX (Sound eXchange).

SoX for sound file processing

SoX is branded as “the Swiss Army knife of audio manipulation”. I have tried it a couple of times, but I usually rely on FFmpeg for basic conversion tasks. FFmpeg is mainly targeted at video applications, but it handles many audio-related tasks well. Converting from .AIFF to .WAV or compressing to .MP3 or .AAC can easily be handled in FFmpeg. There are even some basic audio visualization tools available in FFmpeg.

However, for some more specialized audio jobs, SoX come in handy. I find that the man pages are not very intuitive. There are also relatively few examples of its usage online, at least compared to the numerous FFmpeg examples. Then I was happy to find the nice blog of Mads Kjelgaard, who has written a short set of SoX tutorials. And it was the tutorial on how to remove silence from sound files that caught my attention.

Splitting sound files based on silence

The task is to chop up long sound files containing multiple sound objects. The description of SoX’s silence function is somewhat cryptic. In addition to the above mentioned blog post, I also came across another blog post with some more examples of how the SoX silence function works. And lo and behold, one of the example scripts managed to very nicely chop up one of my long sound files of bird sounds:

sox birds_in.aif birds_out.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

The result is a folder of short sound files, each containing a sound object. Note that I started with an .AIFF file but converted it to .WAV along the way since that is the preferred format of AudioStellar.

SoX managed to quickly split up a long sound file of bird chirps into individual files, each containing one sound object.

To scale this up a bit, I made a small script that will do the same thing on a folder of files:

#!/bin/bash

for i in *.aif;
do
name=`echo $i | cut -d'.' -f1`;
sox "$i" "${name}.wav" silence 1 0.1 1% 1 0.1 1% : newfile : restart
done

And this managed to chop up 20 long sound files into approximately 2000 individual sound files.

The batch script split up 20 long sound files into approximately 2000 short sound files in just a few seconds.

There were some very short sound files and some very long. I could have tweaked the script a little to remove these. However, it was quicker to sort the files by file size and delete the smallest and largest files. That left me with around 1500 sound files to load into AudioStellar. More on that exploration later.

Loading 1500 animal sound objects into AudioStellar.

All in all, I was happy to (re)discover SoX and will explore it more in the future. I was happy to see that the above settings worked well for sound recordings with clear silence parts. Some initial testing of more complex sound recordings were not equally successful. So understanding more about how to tweak the settings will be important for future usage.

Some thoughts on microphones for streaming and recording

Many people have asked me about what types of microphones to use for streaming and recording. This is really a jungle, with lots of devices and things to think about. I have written some blog posts about such things previously, such as tips for doing Skype job interviews, testing simple camera/mic solutions, running a Hybrid Disputation, and how to work with plug-in-power microphones.

Earlier today I held a short presentation about microphones at RITMO. This was during our informal Food & Paper lunch seminar, where people eat their lunch while listening to presentations about different topics (usually something academic, but sometimes also other things). Here is a cut-down version of the presentation:

The presentation starts by drawing up the main things to think about: microphones and speakers and the environments that people use these devices within. When we stream or record, we don’t really control other people’s speakers and environments. So the two things we should think about are (1) the microphone we use and (2) the environment we are in.

A very brief summary of microphones, speakers, and room acoustics.

The make a long story short, here are my general advice:

  • Place yourself in a “dry” and quiet space, if possible. A small room with carpets and curtains is much better than a big and empty space.
  • A headset with a boom microphone will usually give the best sound overall, without feedback, and allow you to move your head around. I have many USB headsets from Logitech, Jabra, and Poly, and all of them are fine. The more expensive ones are more comfortable to wear, but the sound quality doesn’t really differ that much. I generally try to avoid Bluetooth headsets since they need to be charged and paired to function. If you can live with a cable, you will get better sound for a lower price.
  • A “podcast-style” condenser microphone will give a more pleasant and radio-like sound. You can also avoid sitting with headphones on all the time, which is very tiresome after some hours. However, condenser microphones are usually relatively large, need a stand, and you may get into feedback problems. There are many options here, but I have been very positively surprised by this cheap Marantz USB microphone.
  • A lavalier microphone is the best choice for making video recordings. They are small, pick up sounds nicely, and some (like the Røde Smartlav+) can be connected directly to a mobile phone or laptop.

There are always better, more expensive, and more complicated solutions out there. However, I am very impressed by some of the newest products that have arrived on the market. The products highlighted above are reasonably priced and will greatly improve the audio of both streaming and recording.