Splitting audio files in the terminal

I have recently played with AudioStellar, a great tool for “sound object”-based exploration and musicking. It reminds me of CataRT, a great tool for concatenative synthesis. I used CataRT quite a lot previously, for example, in the piece Transformation. However, after I switched to Ubuntu and PD instead of OSX and Max, CataRT was no longer an option. So I got very excited when I discovered AudioStellar some weeks ago. It is lightweight and cross-platform and has some novel features that I would like to explore more in the coming weeks.

Samples and sound objects

In today’s post, I will describe how to prepare short audio files to load into AudioStellar. The software is based on loading a collection of “samples”. I always find the term “sample” to be confusing. In digital signal processing terms, a sample is literally one sample, a number describing the signal’s amplitude in that specific moment in time. However, in music production, a “sample” is used to describe a fairly short sound file, often in the range of 0.5 to 5 seconds. This is what in the tradition of the composer-researcher Pierre Schaeffer would be called a sound object. So I prefer to use that term to refer to coherent, short snippets of sound.

AudioStellar relies on loading short sound files. They suggest that for the best experience, one should load files that are shorter than 3 seconds. I have some folders with such short sound files, but I have many more folders with longer recordings that contain multiple sound objects in one file. The beauty of CataRT was that it would analyse such long files and identify all the sound objects within the files. That is not possible in AudioStellar (yet, I hope). So I have to chop up the files myself. This can be done manually, of course, and I am sure some expensive software also does the job. But this was a good excuse to dive into SoX (Sound eXchange).

SoX for sound file processing

SoX is branded as “the Swiss Army knife of audio manipulation”. I have tried it a couple of times, but I usually rely on FFmpeg for basic conversion tasks. FFmpeg is mainly targeted at video applications, but it handles many audio-related tasks well. Converting from .AIFF to .WAV or compressing to .MP3 or .AAC can easily be handled in FFmpeg. There are even some basic audio visualization tools available in FFmpeg.

However, for some more specialized audio jobs, SoX come in handy. I find that the man pages are not very intuitive. There are also relatively few examples of its usage online, at least compared to the numerous FFmpeg examples. Then I was happy to find the nice blog of Mads Kjelgaard, who has written a short set of SoX tutorials. And it was the tutorial on how to remove silence from sound files that caught my attention.

Splitting sound files based on silence

The task is to chop up long sound files containing multiple sound objects. The description of SoX’s silence function is somewhat cryptic. In addition to the above mentioned blog post, I also came across another blog post with some more examples of how the SoX silence function works. And lo and behold, one of the example scripts managed to very nicely chop up one of my long sound files of bird sounds:

sox birds_in.aif birds_out.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

The result is a folder of short sound files, each containing a sound object. Note that I started with an .AIFF file but converted it to .WAV along the way since that is the preferred format of AudioStellar.

SoX managed to quickly split up a long sound file of bird chirps into individual files, each containing one sound object.

To scale this up a bit, I made a small script that will do the same thing on a folder of files:

#!/bin/bash

for i in *.aif;
do
name=`echo $i | cut -d'.' -f1`;
sox "$i" "${name}.wav" silence 1 0.1 1% 1 0.1 1% : newfile : restart
done

And this managed to chop up 20 long sound files into approximately 2000 individual sound files.

The batch script split up 20 long sound files into approximately 2000 short sound files in just a few seconds.

There were some very short sound files and some very long. I could have tweaked the script a little to remove these. However, it was quicker to sort the files by file size and delete the smallest and largest files. That left me with around 1500 sound files to load into AudioStellar. More on that exploration later.

Loading 1500 animal sound objects into AudioStellar.

All in all, I was happy to (re)discover SoX and will explore it more in the future. I was happy to see that the above settings worked well for sound recordings with clear silence parts. Some initial testing of more complex sound recordings were not equally successful. So understanding more about how to tweak the settings will be important for future usage.

“Flatten” file names in the terminal

I am often dealing with folders with lots of files with weird file names. Spaces, capital letters, and so on, often cause problems. Instead of manually fixing such file names, here is a quick one-liner (found here) that can be run in the terminal (at least on Ubuntu) to solve the problem:

rename 'tr/ A-Z/-a-z/' -- *

It is based on a simple regular expression, replacing any spaces with hyphens, and changing any capital letters to lower case.

Create timelapse video from images with FFmpeg

I take a lot of timelapse shots with a GoPro camera. Usually, I do this with the camera’s photo setting instead of the video setting. That is because I find it easier to delete unwanted pictures from the series that way. It also simplifies selecting individual photos when I want that. But then I need a way to create a timelapse video from the photos easily.

Here is an FFmpeg one-liner that does the job:

ffmpeg -r 10 -pattern_type glob -i "*.JPG" -s 1920x1440 -vcodec libx264 output.mp4

To break down the different parameters a little:

  • -r 10″: the framerate (fps)
  • “-pattern_type glob”: to allow for selecting all JPGs using “*.JPG”
  • “-s 1920×1440”: downscales the images to a pseudo-like HD format
  • “-vcodec libx264”: force to use this codec

Convert MPEG-2 files to MPEG-4

Image result for Canon XF-105
Canon XF105

This is a note to self, and could potentially also be useful to others in need of converting “old-school” MPEG-2 files into more modern MPEG-4 files using FFmpeg.

In the fourMs lab we have a bunch of Canon XF105 video cameras that record .MXF files with MPEG-2 compression. This is not a very useful format for other things we are doing, so I often have to recompress them to something else.

Inspecting one of the files, I just also discovered that they record the audio onto two mono channels:

Stream #0:0: Video: mpeg2video (4:2:2), yuv422p(tv, bt709, top first), 1920x1080 [SAR 1:1 DAR 16:9], 50000 kb/s, 25 fps, 25 tbr, 25 tbn, 50 tbc

Stream #0:1: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s

Stream #0:2: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s

So I also want to merge these two mono tracks (which are the left and right inputs of the camera) to a stereo track. FFmpeg comes in handy (as always), and I figured out that this little one-liner will do the trick:

ffmpeg -i input.mxf -vf yadif -vcodec libx264 -q:v 3 -filter_complex "[0:a:0][0:a:1]amerge,channelmap=channel_layout=stereo[st]" -map 0:v -map "[st]" output.mp4

An explanation of some of these settings:

  • yadif: this is for deinterlacing the video
  • libx264: this is probably unnecessary, but forces to use the better MPEG-4 compressor
  • q:v 3: I find this to be a good setting for the video compressor
  • filter_complex: this complex string (courtesy of reddit) does the merging of the two mono sources

Will probably try to add it to MGT-terminal at some point, but this blog post will suffice for now.

Creating individual image files from presentation slides

How do you create full-screen images from each of the slides of a Google Docs presentation without too much manual work? For the previous blog post on my Munin keynote, I wanted to include some pictures from my 90-slide presentation. There is probably a point and click solution to this problem, but it is even more fun to use some command line tools to help out. These commands have been tested on Ubuntu 19.10, but should probably work on many other systems as well, as long as you have installed pdfseparate and convert.

After exporting a PDF from the Google Presentation, I made a separate PDF file of each slide using this command:

pdfseparate input.pdf output%d.pdf

This creates a bunch of PDF files with a running number. Then I ran this little for loop:

for i in *.pdf; do name=`echo $i | cut -d'.' -f1`; convert -density 200 "$i" "$name.png"; done

And voila, then I had nice PNG files of all my slides. I found that the trick is to use the “-density 200” setting (choose the density that suit your needs), since the default resolution and quality is too low.