Splitting audio files in the terminal

I have recently played with AudioStellar, a great tool for “sound object”-based exploration and musicking. It reminds me of CataRT, a great tool for concatenative synthesis. I used CataRT quite a lot previously, for example, in the piece Transformation. However, after I switched to Ubuntu and PD instead of OSX and Max, CataRT was no longer an option. So I got very excited when I discovered AudioStellar some weeks ago. It is lightweight and cross-platform and has some novel features that I would like to explore more in the coming weeks.

Samples and sound objects

In today’s post, I will describe how to prepare short audio files to load into AudioStellar. The software is based on loading a collection of “samples”. I always find the term “sample” to be confusing. In digital signal processing terms, a sample is literally one sample, a number describing the signal’s amplitude in that specific moment in time. However, in music production, a “sample” is used to describe a fairly short sound file, often in the range of 0.5 to 5 seconds. This is what in the tradition of the composer-researcher Pierre Schaeffer would be called a sound object. So I prefer to use that term to refer to coherent, short snippets of sound.

AudioStellar relies on loading short sound files. They suggest that for the best experience, one should load files that are shorter than 3 seconds. I have some folders with such short sound files, but I have many more folders with longer recordings that contain multiple sound objects in one file. The beauty of CataRT was that it would analyse such long files and identify all the sound objects within the files. That is not possible in AudioStellar (yet, I hope). So I have to chop up the files myself. This can be done manually, of course, and I am sure some expensive software also does the job. But this was a good excuse to dive into SoX (Sound eXchange).

SoX for sound file processing

SoX is branded as “the Swiss Army knife of audio manipulation”. I have tried it a couple of times, but I usually rely on FFmpeg for basic conversion tasks. FFmpeg is mainly targeted at video applications, but it handles many audio-related tasks well. Converting from .AIFF to .WAV or compressing to .MP3 or .AAC can easily be handled in FFmpeg. There are even some basic audio visualization tools available in FFmpeg.

However, for some more specialized audio jobs, SoX come in handy. I find that the man pages are not very intuitive. There are also relatively few examples of its usage online, at least compared to the numerous FFmpeg examples. Then I was happy to find the nice blog of Mads Kjelgaard, who has written a short set of SoX tutorials. And it was the tutorial on how to remove silence from sound files that caught my attention.

Splitting sound files based on silence

The task is to chop up long sound files containing multiple sound objects. The description of SoX’s silence function is somewhat cryptic. In addition to the above mentioned blog post, I also came across another blog post with some more examples of how the SoX silence function works. And lo and behold, one of the example scripts managed to very nicely chop up one of my long sound files of bird sounds:

sox birds_in.aif birds_out.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

The result is a folder of short sound files, each containing a sound object. Note that I started with an .AIFF file but converted it to .WAV along the way since that is the preferred format of AudioStellar.

SoX managed to quickly split up a long sound file of bird chirps into individual files, each containing one sound object.

To scale this up a bit, I made a small script that will do the same thing on a folder of files:


for i in *.aif;
name=`echo $i | cut -d'.' -f1`;
sox "$i" "${name}.wav" silence 1 0.1 1% 1 0.1 1% : newfile : restart

And this managed to chop up 20 long sound files into approximately 2000 individual sound files.

The batch script split up 20 long sound files into approximately 2000 short sound files in just a few seconds.

There were some very short sound files and some very long. I could have tweaked the script a little to remove these. However, it was quicker to sort the files by file size and delete the smallest and largest files. That left me with around 1500 sound files to load into AudioStellar. More on that exploration later.

Loading 1500 animal sound objects into AudioStellar.

All in all, I was happy to (re)discover SoX and will explore it more in the future. I was happy to see that the above settings worked well for sound recordings with clear silence parts. Some initial testing of more complex sound recordings were not equally successful. So understanding more about how to tweak the settings will be important for future usage.

Some thoughts on microphones for streaming and recording

Many people have asked me about what types of microphones to use for streaming and recording. This is really a jungle, with lots of devices and things to think about. I have written some blog posts about such things previously, such as tips for doing Skype job interviews, testing simple camera/mic solutions, running a Hybrid Disputation, and how to work with plug-in-power microphones.

Earlier today I held a short presentation about microphones at RITMO. This was during our informal Food & Paper lunch seminar, where people eat their lunch while listening to presentations about different topics (usually something academic, but sometimes also other things). Here is a cut-down version of the presentation:

The presentation starts by drawing up the main things to think about: microphones and speakers and the environments that people use these devices within. When we stream or record, we don’t really control other people’s speakers and environments. So the two things we should think about are (1) the microphone we use and (2) the environment we are in.

A very brief summary of microphones, speakers, and room acoustics.

The make a long story short, here are my general advice:

  • Place yourself in a “dry” and quiet space, if possible. A small room with carpets and curtains is much better than a big and empty space.
  • A headset with a boom microphone will usually give the best sound overall, without feedback, and allow you to move your head around. I have many USB headsets from Logitech, Jabra, and Poly, and all of them are fine. The more expensive ones are more comfortable to wear, but the sound quality doesn’t really differ that much. I generally try to avoid Bluetooth headsets since they need to be charged and paired to function. If you can live with a cable, you will get better sound for a lower price.
  • A “podcast-style” condenser microphone will give a more pleasant and radio-like sound. You can also avoid sitting with headphones on all the time, which is very tiresome after some hours. However, condenser microphones are usually relatively large, need a stand, and you may get into feedback problems. There are many options here, but I have been very positively surprised by this cheap Marantz USB microphone.
  • A lavalier microphone is the best choice for making video recordings. They are small, pick up sounds nicely, and some (like the Røde Smartlav+) can be connected directly to a mobile phone or laptop.

There are always better, more expensive, and more complicated solutions out there. However, I am very impressed by some of the newest products that have arrived on the market. The products highlighted above are reasonably priced and will greatly improve the audio of both streaming and recording.

Convert between video containers with FFmpeg

In my ever-growing collection of smart FFmpeg tricks, here is a way of converting from one container format to another. Here I will convert from a QuickTime (.mov) file to a standard MPEG-4 (.mp4), but the recipe should work between other formats too.

If you came here to just see the solution, here you go:

ffmpeg -i infile.mov -acodec copy -vcodec copy outfile.mp4

In the following I will explain everything in a little more detail.

Container formats

One of the confusing things about video files is that they have both a container and a compression format. The container is often what denotes the file suffix. Apple introduced the .mov format for QuickTime files and Microsoft used to use .avi files.

Nowadays, there seems to a converge towards using MPEG containers and .mp4 files. However, both Apple and Microsoft software (and others) still output other formats. This is confusing and can also lead to various playback issues. For example, many web browsers are not able to play these formats natively.

Compression formats

The compression format denotes how the video data is organized on the inside of a container. Also, here there are many different formats. The most common today is to use the H.264 format for video and AAC for audio. These are both parts of the MPEG-4 standard and can be embedded in .mp4 containers. However, both H.264 and AAC can also be embedded in other containers, such as .mov and .avi files.

The important thing to notice is that both .mov and .avi files may contain H.264 video and AAC audio. In those cases, the inside of such files is identical to the content of a .mp4 file. But since the container is different, it may still be unplayable in certain software. That is why I would like to convert from one container format to another. In practice that means converting from .mov or .avi to .mp4 files.

Lossless conversion

There are many ways of converting video files. In most cases, you would end up with a lossy conversion. That means that the video content will be altered. The file size may be smaller, but the quality may also be worse. The general rule is that you want to compress a file as few times as possible.

For all sorts of video conversion/compression jobs, I have ended up turning to FFmpeg. If you haven’t tried it already, FFmpeg is a collection of tools for doing all sorts of audio/video manipulations in the terminal. Working in the terminal may be intimidating at first, but you will never look back once you get the hang of it.

Converting a file from .mov to .mp4 is as simple as typing this little command in a terminal:

ffmpeg -i infile.mov outfile.mp4

This will change from a .mov container to a .mp4 container, which is what we want. But it will also (probably) re-compress the video. That is why it is always smart to look at the content of your original file before converting it. You can do this by typing:

ffmpeg -i infile.mov

For my example file, this returns the following metadata:

    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    creation_time   : 2016-08-10T10:47:30.000000Z
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: MacBookPro11,1
    com.apple.quicktime.software: Mac OS X 10.11.6 (15G31)
    com.apple.quicktime.creationdate: 2016-08-10T12:45:43+0200
  Duration: 00:00:12.76, start: 0.000000, bitrate: 5780 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1844x1160 [SAR 1:1 DAR 461:290], 5243 kb/s, 58.66 fps, 60 tbr, 6k tbn, 50 tbc (default)
      creation_time   : 2016-08-10T10:47:30.000000Z
      handler_name    : Core Media Video
      encoder         : H.264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 269 kb/s (default)
      creation_time   : 2016-08-10T10:47:30.000000Z
      handler_name    : Core Media Audio

There is quite a lot of information there, so we need to look for the important stuff. The first line we want to look for is the one with information about the video content:

Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1844x1160 [SAR 1:1 DAR 461:290], 5243 kb/s, 58.66 fps, 60 tbr, 6k     

Here we can see that this .mov file contains a video that is already compressed with H.264. Another thing we can see here is that it is using a weird pixel format (1844×1160). The bit rate of the file is 5243 kb/s, which tells something about how large the file will be in the end. And it is also interesting to see that it is using a framerate of 58.66 fps, which is also a bit odd.

Similarly, we can look at the content of the audio stream of the file:

Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 269 kb/s (default)

Here we can see that the audio is already compressed with AAC at a standard sampling rate of 44.1 kHz and at a more nonstandard bit rate of 269 kb/s.

The main point of investigating the file before we do the conversion is to avoid re-compressing the content of the file. After all, the content is already in the right formats (H.264 and AAC) even though it is in an unwanted container (.mov).

Today’s little trick is how to convert from one format to another without modifying the content of the file, only the container. That can be achieved with the code shown on top:

ffmpeg -i original.mov -acodec copy -vcodec copy outfile.mp4

There are several benefits of doing it this way:

  1. Quality. Avoiding an unnecessary re-compression of the content, which would only degrade the content.
  2. Preserve the pixel size, sampling rates, etc. of the originals. Most video software will use standard settings for these. I often work with various types of non-standard video files, so it is nice to preserve this information.
  3. Save time. Since no re-compression is needed, we only copy content from one container to another. This is much, much faster than re-compressing the content.

All in all, this long explanation of a short command may help to improve your workflows and save some time.

Convert MPEG-2 files to MPEG-4

Image result for Canon XF-105
Canon XF105

This is a note to self, and could potentially also be useful to others in need of converting “old-school” MPEG-2 files into more modern MPEG-4 files using FFmpeg.

In the fourMs lab we have a bunch of Canon XF105 video cameras that record .MXF files with MPEG-2 compression. This is not a very useful format for other things we are doing, so I often have to recompress them to something else.

Inspecting one of the files, I just also discovered that they record the audio onto two mono channels:

Stream #0:0: Video: mpeg2video (4:2:2), yuv422p(tv, bt709, top first), 1920x1080 [SAR 1:1 DAR 16:9], 50000 kb/s, 25 fps, 25 tbr, 25 tbn, 50 tbc

Stream #0:1: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s

Stream #0:2: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s

So I also want to merge these two mono tracks (which are the left and right inputs of the camera) to a stereo track. FFmpeg comes in handy (as always), and I figured out that this little one-liner will do the trick:

ffmpeg -i input.mxf -vf yadif -vcodec libx264 -q:v 3 -filter_complex "[0:a:0][0:a:1]amerge,channelmap=channel_layout=stereo[st]" -map 0:v -map "[st]" output.mp4

An explanation of some of these settings:

  • yadif: this is for deinterlacing the video
  • libx264: this is probably unnecessary, but forces to use the better MPEG-4 compressor
  • q:v 3: I find this to be a good setting for the video compressor
  • filter_complex: this complex string (courtesy of reddit) does the merging of the two mono sources

Will probably try to add it to MGT-terminal at some point, but this blog post will suffice for now.

Simple tips for better video conferencing

Image result for video meeting

Very many people are currently moving to video-based meetings. For that reason I have written up some quick advise on how to improve your setup. This is based on my interview advise, but grouped differently.


Image result for network clipart

The first important thing is to have as good a network as you can. Video conferencing requires a lot of bandwidth, so even though your e-mail and regular browsing works fine, it may still not be sufficient for good video transmission.

  • Cabled network: If you are able to connect with an Ethernet cable to your router, that would usually always be the best and most solid solution.
  • Wireless network: If cable won’t work for you (it is also difficult logistically in my own apartment), try to get as close as possible to your wi-fi router.


Image result for headset clipart

I would argue that improving the audio is more important than the video for video conferencing. Most video conferencing systems (Skype, Zoom, etc.) will prioritize the audio channel, which means that the video may stutter while the audio is passing through fine.

The main trick is to aim for separating the “foreground” as much as possible from the “background”. There are some very basic audio principles to follow:

  • Use a headset: The best way to get decent sound for video conferencing, is to move the microphone as close as possible to your mouth. Headsets with a microphone boom in front of your face are the best, but a regular mobile phone headset (the one that came with your mobile phone, for example) would still be better than nothing.
  • Use headphones: If you for some reason do not have a headset with built-in microphone, using a regular pair of headphones is still better than using the speakers on your computer. With this setup you use the microphone on the computer, which may not be ideal, but at least you won’t get feedback problems.
  • Avoid reverberant rooms: If you aim for clarity in conversation, it is typically better to sit in a smaller and more damped room than a large one. That means that a bedroom is typically better than a larger living room. If you use a headset this is less important, but particularly if you only use the built-in microphone and speakers on a laptop, this could make a huge difference in how your voice gets through.
  • Mute yourself: In most system there is a button to mute yourself. If you are not talking all the time, it helps to mute yourself from the discussion. Just remember to unmute when you want to say something!


Image result for webcam clipart

The same principle of separating “foreground” from “background” applies to the video.

  • Lighting: To obtain the best possible video image, think about your placement with respect to lighting. It is, for example, not ideal to sit in front of a window, since a bright light in the background will make it difficult to see your face.
  • Background: The best is to sit in front of a plain wall. If that is not possible, consider whether the background of your image is what you want to show to your fellow students/colleagues.
  • Video angle: If you are using the built-in camera on your computer you may not have too many options for how to place the camera. But you may still consider shifting the camera position so that you and your surroundings look as good as possible.

Summing up

There are, of course, many ways to improve your video conferencing setup. Many people believe that you need to invest in expensive equipment to get good results. But even cheap consumer products are very capable of producing decent results these days. So it is more a matter of optimizing what you have. Good luck!