Making 100 video poster images programmatically

We are organizing the Rhythm Production and Perception Workshop 2021 at RITMO in a week's time. Like many other conferences these days, this one will also run online. Presentations have been pre-recorded (10 minutes each), and we also have short poster blitz videos (1 minute each).

Pre-recorded videos

People have sent us their videos in advance, but they all have different first “slides”. So, to create some consistency among the videos, we decided to make an introduction slide for each of them. This would then also serve as the “thumbnail” of the video when presented in a grid format.

One solution could be to add some frames at the beginning of each video file. This could probably be done with FFmpeg without recompressing the files. However, given that we are talking about approximately 100 video files, I am sure there would have been some hiccups.
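For the record, here is a minimal sketch of how that could look with FFmpeg, assuming hypothetical filenames poster.png and talk.mp4. The poster image is first rendered as a short clip, which is then concatenated with the talk using the concat demuxer:

# Render the poster image as a 3-second video clip
ffmpeg -loop 1 -i poster.png -t 3 -r 25 -c:v libx264 -pix_fmt yuv420p intro.mp4

# Concatenate without re-encoding
printf "file 'intro.mp4'\nfile 'talk.mp4'\n" > list.txt
ffmpeg -f concat -i list.txt -c copy talk_with_intro.mp4

The catch is that stream copying only works if the intro clip matches each talk's codec, resolution, frame rate, and audio layout exactly, and with around 100 differently encoded submissions, that is precisely where the hiccups would start.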

A quicker and better option is to add “poster images” when uploading the files to YouTube. We also support this on UiO’s web pages, which serves as the long-term archive of the material. The question, then, is how to create these 100 poster images without too much work. Here is how I did it on my Ubuntu machine.

Mail Merge in LibreOffice Writer

My initial thought was to start with Impress, the free presentation software in LibreOffice. I quickly searched for tools to create slides programmatically but didn't find anything that seemed straightforward.

Instead, I remembered the good old “mail merge” functionality of Writer. It was made for creating envelope labels back in the days when people still sent physical mail, but it can be tweaked for other things. After all, I had the material I wanted to include in the poster image in a simple spreadsheet, so it was quick and easy to import the spreadsheet into Writer and select the two columns I wanted to include (“author name” and “title”).

A spreadsheet with the source information about authors and paper titles.

I wanted the final image to be in Full-HD format (1920 x 1080 pixels), which is not a standard page format in Writer. However, there is the option of choosing a custom page size, so I set up a page of 192 x 108 mm, which preserves the 16:9 aspect ratio. Then I added some fixed elements on the page, including a RITMO emblem and the conference title.

Setting up the template in LibreOffice Writer.

Finally, I saved a file with the merged content and exported it as a PDF.

From PDF to PNG

The output of Writer was a multi-page PDF. However, what we need is a single image file per video. So I turned to the terminal and used this one-liner based on pdfseparate to split the PDF into multiple one-page PDF files:

pdfseparate rppw2021-papers-merged.pdf posters%d.pdf

The trick here is the %d placeholder, which pdfseparate replaces with the page number, producing sequentially numbered files (posters1.pdf, posters2.pdf, and so on).

Next, I wanted to convert these individual PDF files to PNG files. Here I turned to the convert tool of ImageMagick and wrote a short one-liner that does the trick:

for i in *.pdf; do name="${i%.pdf}"; convert -density 300 -resize 1920x1080 -background white -flatten "$i" "$name.png"; done

It looks for all the PDFs in a directory and converts each of them to a PNG file with Full-HD resolution. I found it necessary to include “-density 300” to get a nice-looking image: ImageMagick rasterizes PDFs at only 72 DPI by default, which looks blurry. At 300 DPI, the 192 x 108 mm page renders to roughly 2268 x 1276 pixels, which “-resize” then scales down to 1920 x 1080. To avoid any transparency issues in later stages, I also included the “-background white” and “-flatten” options.

The end result was a folder of PNG files.

Putting it all together

The last step is to match each video file with the right PNG image in the video playback solution. Here is how it looks in the video player we have at UiO:

Once I figured out the workflow, the whole process was very quick. Hopefully, this post can save someone many hours of manual work!

Strings On-Line installation

We presented the installation Strings On-Line at NIME 2020. It was originally planned as a physical installation at the conference, which was to be held in Birmingham, UK.

Due to the corona crisis, the conference went online, and we decided to redesign the proposed physical installation into an online installation instead. The installation ran continuously from 21 to 25 July last year, and hundreds of people “came by” to interact with it.

I finally got around to editing a short (1-minute) video promo of the installation:

I have also made a short (10-minute) “behind the scenes” mini-documentary about the installation. Here, researchers from RITMO, University of Oslo, talk about the setup, featuring 6 self-playing guitars, 3 remote-controlled robots, and a 24/7 high-quality, low-latency audiovisual stream.

We are planning a new installation for the RPPW conference this year. So if you are interested in exploring such an online installation live, please stay tuned.

Splitting audio files in the terminal

I have recently been playing with AudioStellar, a great tool for “sound object”-based exploration and musicking. It reminds me of CataRT, a great tool for concatenative synthesis. I used CataRT quite a lot previously, for example in the piece Transformation. However, after I switched from OSX and Max to Ubuntu and PD, CataRT was no longer an option. So I got very excited when I discovered AudioStellar some weeks ago. It is lightweight and cross-platform, and it has some novel features that I would like to explore more in the coming weeks.

Samples and sound objects

In today’s post, I will describe how to prepare short audio files to load into AudioStellar. The software is based on loading a collection of “samples”. I always find the term “sample” confusing. In digital signal processing terms, a sample is literally one sample: a number describing the signal’s amplitude at a specific moment in time. In music production, however, a “sample” describes a fairly short sound file, often in the range of 0.5 to 5 seconds. This is what, in the tradition of the composer-researcher Pierre Schaeffer, would be called a sound object, so I prefer that term when referring to coherent, short snippets of sound.

AudioStellar relies on loading short sound files. They suggest that for the best experience, one should load files that are shorter than 3 seconds. I have some folders with such short sound files, but I have many more folders with longer recordings that contain multiple sound objects in one file. The beauty of CataRT was that it would analyse such long files and identify all the sound objects within the files. That is not possible in AudioStellar (yet, I hope). So I have to chop up the files myself. This can be done manually, of course, and I am sure some expensive software also does the job. But this was a good excuse to dive into SoX (Sound eXchange).

SoX for sound file processing

SoX is branded as “the Swiss Army knife of audio manipulation”. I have tried it a couple of times, but I usually rely on FFmpeg for basic conversion tasks. FFmpeg is mainly targeted at video applications, but it handles many audio-related tasks well. Converting from .AIFF to .WAV or compressing to .MP3 or .AAC can easily be handled in FFmpeg. There are even some basic audio visualization tools available in FFmpeg.
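A few typical examples, with hypothetical filenames:

# Convert from AIFF to WAV
ffmpeg -i input.aiff output.wav

# Compress to MP3 at 192 kbit/s
ffmpeg -i input.wav -b:a 192k output.mp3

# Render a simple waveform image using the showwavespic filter
ffmpeg -i input.wav -filter_complex showwavespic -frames:v 1 waveform.png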

However, for some more specialized audio jobs, SoX comes in handy. I find that the man pages are not very intuitive, and there are relatively few examples of its usage online, at least compared to the numerous FFmpeg examples. So I was happy to find the nice blog of Mads Kjelgaard, who has written a short set of SoX tutorials. It was the tutorial on how to remove silence from sound files that caught my attention.

Splitting sound files based on silence

The task is to chop up long sound files containing multiple sound objects. The description of SoX’s silence function is somewhat cryptic, so in addition to the above-mentioned blog post, I also came across another blog post with some more examples of how the silence function works. And lo and behold, one of the example scripts managed to very nicely chop up one of my long sound files of bird sounds:

sox birds_in.aif birds_out.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

In short, each “1 0.1 1%” triplet tells SoX what counts as sound: the first trims silence from the start of a segment until it finds at least 0.1 seconds of audio above a 1% amplitude threshold, and the second ends the segment once 0.1 seconds fall below that threshold. The “: newfile : restart” part then writes the next segment to a new file and starts over. The result is a folder of short sound files, each containing a sound object. Note that I started with an .AIFF file but converted it to .WAV along the way since that is the preferred format of AudioStellar.

SoX managed to quickly split up a long sound file of bird chirps into individual files, each containing one sound object.

To scale this up a bit, I made a small script that will do the same thing on a folder of files:

#!/bin/bash

# Split every AIFF file in the current folder at silences,
# writing the segments to numbered WAV files.
for i in *.aif; do
  name="${i%.aif}"
  sox "$i" "${name}.wav" silence 1 0.1 1% 1 0.1 1% : newfile : restart
done

And this managed to chop up 20 long sound files into approximately 2000 individual sound files.

The batch script split up 20 long sound files into approximately 2000 short sound files in just a few seconds.

There were some very short sound files and some very long ones. I could have tweaked the script to filter these out, but it was quicker to sort the files by size and delete the smallest and largest ones. That left me with around 1500 sound files to load into AudioStellar. More on that exploration later.
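That cleanup could also be scripted. Here is a minimal sketch using soxi, SoX’s file-information tool, which reports a file’s duration in seconds; the 0.2 and 5-second cut-offs are arbitrary and the comparison assumes GNU bc is installed:

for f in *.wav; do
  d=$(soxi -D "$f")
  # Delete files shorter than 0.2 s or longer than 5 s
  if (( $(echo "$d < 0.2 || $d > 5" | bc -l) )); then
    rm "$f"
  fi
done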

Loading 1500 animal sound objects into AudioStellar.

All in all, I was happy to (re)discover SoX and will explore it more in the future. The settings above worked well for sound recordings with clear silences, but some initial testing on more complex sound recordings was not equally successful. So understanding how to tweak the settings will be important for future usage.

INTIMAL Documentary

It is fitting to share this new short documentary about the project INTIMAL: Interfaces for Relational Listening – Body, Memory, Migration, Telematics today, on 8 March. The project ran from 2017 to 2019 at RITMO under the direction of sound artist Ximena Alarcon. It was funded by a Marie Sklodowska-Curie grant from the EU, and I was fortunate enough to mentor the project.

The main aim of INTIMAL was to explore the body as an “interface” for keeping and transforming the memory of place in migratory contexts. This was done through the development and exploration of a physical-virtual “embodied system” for relational listening. The short documentary excellently describes the project and explains its outcomes.

Some more information about the project from the INTIMAL web page:

The project invites people to listen to their migrations, in order to expand their sense of place and sense of presence. In its first stage, Ximena proposes two questions: 1) how the body becomes an interface that keeps memory of place, and 2) how to improvise and transmit the experience of an embodied migratory journey using non-screen based interfaces. As a case study, she involved Colombian women who have migrated to European countries, in the exploration of their migratory journeys, via listening, improvised voice and body movement, in co-located and telematic settings. She used Deep Listening practice and Embodied Music Cognition methods to develop the INTIMAL System: a physical-virtual “embodied” system for Relational Listening, to be used in telematic sonic performance, in the context of human migration.

Visualising a Bach prelude played on Boomwhackers

I came across a fantastic performance of a Bach prelude played on Boomwhackers by Les Objets Volants.

It is really incredible how they manage to coordinate the sticks and make it into a beautiful performance. Given my interest in the visual aspects of music performance, I reached for the Musical Gestures Toolbox to create some video visualisations.

I started by creating an average image of the video:

Average image of the video.
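The Musical Gestures Toolbox produces such images directly. Just to illustrate the idea, a rough command-line approximation would be to sample frames with FFmpeg and average them with ImageMagick (hypothetical filenames):

mkdir frames

# Sample one frame per second from the video
ffmpeg -i performance.mp4 -vf fps=1 frames/frame%04d.png

# Average all sampled frames into a single image
convert frames/*.png -evaluate-sequence mean average.png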

The average image is not particularly interesting. The performers moved around quite a bit, so it mainly shows the stage. An alternative spatial summary is a keyframe history image, created by extracting the keyframes of the video (approximately 50 frames) and combining them into one image:

Keyframe history image.
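Again, the toolbox does this for you; a rough command-line equivalent would be to pull out only the keyframes (I-frames) with FFmpeg and then blend them, here by simple averaging (hypothetical filenames):

mkdir keyframes

# Extract only the keyframes (I-frames) from the video
ffmpeg -i performance.mp4 -vf "select='eq(pict_type,I)'" -vsync vfr keyframes/kf%03d.png

# Combine the keyframes into one image by averaging
convert keyframes/*.png -evaluate-sequence mean keyframe-history.png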

The keyframe history image summarizes how the performers moved around on stage and explains the spatial distribution of activity over time. But to get more into the temporal distribution of motion, we need a spatiotemporal visualization. This is where motiongrams are useful:

Motiongram of vertical motion (time from left to right)
Motiongram of horizontal motion (time from top to bottom)

If you click on the images above, you can zoom in to look at the visual beauty of the performance.
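As a side note, a motiongram of vertical motion can also be approximated on the command line: subtract consecutive frames to keep only the motion, collapse each difference frame to a one-pixel-wide column, and lay the columns out along the time axis. A sketch, with hypothetical filenames:

mkdir columns

# Frame-difference the video and collapse each frame to a 1-pixel-wide column
ffmpeg -i performance.mp4 -vf "tblend=all_mode=difference,scale=1:270:flags=area" columns/col%05d.png

# Append the columns left to right into a motiongram
convert columns/*.png +append motiongram.png

For a long video this produces thousands of tiny images, so it is only a sketch; the Musical Gestures Toolbox handles this far more efficiently.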