Rigorous Empirical Evaluation of Sound and Music Computing Research

At the NordicSMC conference last week, I was part of a panel discussing the topic Rigorous Empirical Evaluation of SMC Research. This was the original description of the session:

The goal of this session is to share, discuss, and appraise the topic of evaluation in the context of SMC research and development. Evaluation is a cornerstone of every scientific research domain, but is a complex subject in our context due to the interdisciplinary nature of SMC coupled with the subjectivity involved in assessing creative endeavours. As SMC research proliferates across the world, the relevance of robust, rigorous empirical evaluation is ever-increasing in the academic and industrial realms. The session will begin with presentations from representatives of NordicSMC member universities, followed by a more free-flowing discussion among these panel members, followed by audience involvement.

The discussion was moderated by Sofia Dahl (Aalborg University), and the panel consisted of Nashmin Yeganeh (University of Iceland), Razvan Paisa (Aalborg University), and Roberto Bresin (KTH).

The challenge of interdisciplinarity

Everyone on the panel agreed that rigorous evaluation is important. The challenge is to figure out what type(s) of evaluation are useful and feasible within sound and music computing research. This was effectively illustrated by a list of the different methods employed by the researchers at KTH.

A list of methods in use by the sound and music computing researchers at KTH.

Roberto Bresin had divided the KTH list into methods that they have been working with for decades (in red) and newer methods that they are currently exploring. The challenge is that each of these methods requires different knowledge and skills, and they all have different types of evaluation.

Although we have a slightly different research profile at UiO than at KTH, we also have a breadth of methodological approaches in SMC-related research. I pointed to a model I often use to explain what we are doing:

A simplified model of explaining my research approach.

The model has two axes. One shows a continuum between artistic and scientific research methods and outputs. The other shows a continuum between research on natural and cultural phenomena. In addition, we develop and use various types of technologies across all of these.

The reason I like to bring up this model is to explain that things are connected. I often hear that artistic and scientific research are completely different things. Sure, they are different, but there are also commonalities. Similarly, there is an often unnecessary divide between the humanities and the social and natural sciences. True, they have different foci, but when studying music, we need to take all of these into account. Music involves everything from “low-level” sensory phenomena to “high-level” emotional responses. One can focus on one or the other, but if we really want to understand musical experiences – or make new ones for that matter – we need to see the whole picture. Thus, evaluations of whatever we do also need to take a holistic approach.

Open Research as a Tool for Rigorous Evaluation

My entry into the panel discussion was that we should use the ongoing transition to Open Research practices as an opportunity to also perform more rigorous evaluations. I have previously argued why I believe open research is better research. The main argument is that sharing things (methods, code, data, publications, etc.) openly forces researchers to document everything better. Nobody wants to make sloppy things publicly available. So the process of making all the different parts of the open research puzzle openly available is a critical component of a rigorous evaluation.

In the process of making everything open, we realize, for example, that we need better tools and systems. We also experience that we need to think more carefully about privacy and copyright. That is also part of the evaluation process and lays the groundwork for other researchers to scrutinize what we are doing.

Summing up

One of the challenges of discussing rigorous evaluation in the “field” of sound and music computing is that we are not talking about one discipline with one method. Instead, we are talking about a set of approaches to developing and using computational methods for sound and music signals and experiences. If you need to read that sentence a couple of times, it is understandable. Yes, we are combining a lot of different things. And, yes, we are coming from different backgrounds: the arts and humanities, the social and natural sciences, and engineering. That is exactly what is cool about this community. But it is also why it is challenging to agree on what a rigorous evaluation should be!

Sound and Music Computing at the University of Oslo

This year’s Sound and Music Computing (SMC) Conference has opened up for virtual lab tours. When we cannot travel to visit each other, this is a great way to showcase how things look and what we are working on.

Stefano Fasciani and I teamed up a couple of weeks ago to walk around some of the labs and studios at the Department of Musicology and RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion. We started in the Portal used for the Music, Communication & Technology master’s programme and ended up in the fourMs Lab.

Needless to say, this video only scratches the surface of everything going on in the field of sound and music computing at the University of Oslo. It focuses primarily on our infrastructures. We have several ongoing projects that use these studios and labs, as well as some non-lab-based projects. These include:

And I should not forget to mention our exciting collaboration with partners in Copenhagen, Stockholm, Helsinki, and Reykjavik in the Nordic Sound and Music Computing network.

And, as we end the video, please don’t hesitate to get in touch if you want to visit us or collaborate on projects.


Splitting audio files in the terminal

I have recently played with AudioStellar, a great tool for “sound object”-based exploration and musicking. It reminds me of CataRT, a great tool for concatenative synthesis. I used CataRT quite a lot previously, for example, in the piece Transformation. However, after I switched to Ubuntu and PD instead of OSX and Max, CataRT was no longer an option. So I got very excited when I discovered AudioStellar some weeks ago. It is lightweight and cross-platform and has some novel features that I would like to explore more in the coming weeks.

Samples and sound objects

In today’s post, I will describe how to prepare short audio files to load into AudioStellar. The software is based on loading a collection of “samples”. I always find the term “sample” confusing. In digital signal processing terms, a sample is literally one sample: a number describing the signal’s amplitude at a specific moment in time. In music production, however, a “sample” describes a fairly short sound file, often in the range of 0.5 to 5 seconds. This is what, in the tradition of the composer-researcher Pierre Schaeffer, would be called a sound object. So I prefer that term to refer to coherent, short snippets of sound.
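The numbers make the terminological gap concrete. A quick shell calculation (the clip duration here is just an example):

```shell
# A 2-second stereo clip (a "sample" in the music-production sense) at
# 44.1 kHz contains 2 * 44100 * 2 = 176400 samples in the DSP sense.
echo $((2 * 44100 * 2))
```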

AudioStellar relies on loading short sound files. They suggest that for the best experience, one should load files that are shorter than 3 seconds. I have some folders with such short sound files, but I have many more folders with longer recordings that contain multiple sound objects in one file. The beauty of CataRT was that it would analyse such long files and identify all the sound objects within the files. That is not possible in AudioStellar (yet, I hope). So I have to chop up the files myself. This can be done manually, of course, and I am sure some expensive software also does the job. But this was a good excuse to dive into SoX (Sound eXchange).

SoX for sound file processing

SoX is branded as “the Swiss Army knife of audio manipulation”. I have tried it a couple of times, but I usually rely on FFmpeg for basic conversion tasks. FFmpeg is mainly targeted at video applications, but it handles many audio-related tasks well. Converting from .AIFF to .WAV or compressing to .MP3 or .AAC can easily be handled in FFmpeg. There are even some basic audio visualization tools available in FFmpeg.
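For reference, these are the kinds of FFmpeg one-liners I have in mind. The conversion commands themselves are commented out so the snippet is safe to paste (FFmpeg must be installed, and the filenames are placeholders):

```shell
# Typical FFmpeg conversions (filenames are placeholders):
#   ffmpeg -i input.aiff output.wav             # AIFF to WAV
#   ffmpeg -i input.wav -b:a 192k output.mp3    # compress to MP3
#   ffmpeg -i input.wav -c:a aac output.m4a     # compress to AAC
# Deriving the output name from the input is plain shell parameter expansion:
in="recording.aiff"
out="${in%.aiff}.wav"
echo "$out"
```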

However, for some more specialized audio jobs, SoX comes in handy. I find that the man pages are not very intuitive, and there are relatively few examples of its usage online, at least compared to the numerous FFmpeg examples. So I was happy to find the nice blog of Mads Kjelgaard, who has written a short set of SoX tutorials. It was the tutorial on how to remove silence from sound files that caught my attention.

Splitting sound files based on silence

The task is to chop up long sound files containing multiple sound objects. The description of SoX’s silence function is somewhat cryptic. In addition to the above-mentioned blog post, I also came across another blog post with some more examples of how the SoX silence function works. And lo and behold, one of the example scripts managed to very nicely chop up one of my long sound files of bird sounds:

sox birds_in.aif birds_out.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart
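As far as I can tell from the SoX manual (so treat this reading as tentative), the arguments mean the following:

```shell
# silence 1 0.1 1%     -> start a segment once 0.1 s of audio rises above the 1% level
#         1 0.1 1%     -> end the segment once 0.1 s of audio falls below the 1% level
# : newfile : restart  -> write each segment to its own numbered file and keep scanning
args="silence 1 0.1 1% 1 0.1 1% : newfile : restart"
# Guarded so the snippet only runs when sox and the input file are present:
if command -v sox >/dev/null 2>&1 && [ -f birds_in.aif ]; then
    sox birds_in.aif birds_out.wav $args   # unquoted on purpose: sox needs separate words
fi
```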

The result is a folder of short sound files, each containing a sound object. Note that I started with an .AIFF file but converted it to .WAV along the way since that is the preferred format of AudioStellar.

SoX managed to quickly split up a long sound file of bird chirps into individual files, each containing one sound object.

To scale this up a bit, I made a small script that will do the same thing on a folder of files:


# Chop every .aif file in the folder into separate sound objects:
for i in *.aif; do
    name="${i%.aif}"
    sox "$i" "${name}.wav" silence 1 0.1 1% 1 0.1 1% : newfile : restart
done

And this managed to chop up 20 long sound files into approximately 2000 individual sound files.

The batch script split up 20 long sound files into approximately 2000 short sound files in just a few seconds.

There were some very short sound files and some very long ones. I could have tweaked the script a little to remove these automatically, but it was quicker to sort the files by file size and delete the smallest and largest ones. That left me with around 1500 sound files to load into AudioStellar. More on that exploration later.
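That manual pruning could also be scripted. Here is a sketch of the idea with dummy files standing in for real recordings; the -5k and +1M thresholds are illustrative guesses, not the values I actually used:

```shell
# Create a demo folder with dummy files of different sizes:
mkdir -p clips_demo
dd if=/dev/zero of=clips_demo/tiny.wav bs=1k count=1  2>/dev/null
dd if=/dev/zero of=clips_demo/ok.wav   bs=1k count=50 2>/dev/null
dd if=/dev/zero of=clips_demo/huge.wav bs=1M count=2  2>/dev/null
# Delete clips that are suspiciously small or large:
find clips_demo -name '*.wav' -size -5k -delete   # probably just a click or noise
find clips_demo -name '*.wav' -size +1M -delete   # probably several objects in one file
ls clips_demo
```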

Loading 1500 animal sound objects into AudioStellar.

All in all, I was happy to (re)discover SoX and will explore it more in the future. I was happy to see that the above settings worked well for sound recordings with clearly silent parts. Some initial testing on more complex sound recordings was not equally successful. So understanding more about how to tweak the settings will be important for future usage.
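One more SoX feature worth mentioning before moving on: it can also render spectrograms straight from the command line. The snippet is guarded so it is safe to paste even without SoX installed, and the filename is a placeholder:

```shell
# Render a PNG spectrogram of a sound file with SoX's spectrogram effect:
if command -v sox >/dev/null 2>&1 && [ -f birds_out001.wav ]; then
    sox birds_out001.wav -n spectrogram -o birds_spectrogram.png
fi
```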

Analyzing a double stroke drum roll

Yesterday, PhD fellow Mojtaba Karbassi presented his research on impedance control in robotic drumming at RITMO. I will surely get back to discussing more of his research later. Today, I wanted to share the analysis of one of the videos he showed. Mojtaba is working on developing a robot that can play a double stroke drum roll. To explain what this is, he showed this video he had found online, made by John Wooton:

The double stroke roll is a standard technique for drummers, but not everyone manages to perform it as evenly as in this example. I was eager to have a look at the actions in a little more detail. We are currently beta-testing the next release of the Musical Gestures Toolbox for Python, so I thought this video would be a nice test case.

Motion video

I started the analysis by extracting the part of the video where he is showing the complete drum roll. Next, I generated a motion video of this segment:

This is already fascinating to look at. Since the background is removed, only the motion is visible. Obviously, the framerate of the video cannot capture the speed at which he plays. I was therefore curious about the level of detail I could achieve in the further analysis.

Audio visualization

Before delving into the visualization of the video file, I made a spectrogram of the sound:

If you are used to looking at spectrograms, you can quite clearly see the change in frequency as the drummer is speeding up and then slowing down again. However, a tempogram of the audio is even clearer:

Here you can really see the change in both the frequency and the onset strength. The audio is sampled at a much higher frequency (44.1 kHz) than the video (25 fps). Is it possible to see some of the same effects in the motion?


I then moved on to create a motiongram of the video:

There are two problems with this motiongram. First, the recording is composed of alternating shots from two different camera angles. These changes between shots can clearly be seen in the motiongram (marked with Camera 1 and 2). Second, this horizontal motiongram only reveals the vertical motion in the video image. Since we are averaging over each row in the image, the motiongram shows both the left- and right-hand motion. For such a recording, it is, therefore, more relevant to look at the vertical motiongram, which shows the horizontal motion:

In this motiongram, we can more clearly see the patterns of each hand. Still, we have the problem of the alternating shots. If we “zoom” in on the part called Camera 2b, it is possible to see the evenness of the motion in the most rapid part:

I also find it fascinating to “zoom” in on the part called Camera 2c, which shows the gradual slow-down of motion:

Finally, let us consider the slowest part of the drum roll (Camera 1d):

Here it is possible to see the beauty of the double strokes very clearly.

How to work with plug-in-power microphones

I have never thought about how so-called plug-in-power microphones actually work. Over the years, I have used several of them for various applications, including small lavalier microphones for cameras and mobile phones. The nice thing about plug-and-play devices is that they are, well, plug and play. The challenge, however, is when they don’t work. Then it is time to figure out what is going on. This is the story of how I managed to use a Røde SmartLav+ lavalier microphone with a Zoom Q8 recorder.

Powered microphones

The Shure SM58 is a classic dynamic microphone, which doesn’t require any power to function.

When speaking about large (normal) microphones, we typically differentiate between dynamic and condenser microphones. Dynamic microphones are generally used for singing and talking and don’t require any power. You can plug them into a mixer or sound card, and they will work. Dynamic microphones are very versatile and great in that they often don’t lead to much feedback. The downside is that they don’t pick up soft sounds very well, so you need to speak/sing directly into them to get a good signal.

AKG condenser microphone with XLR cable.

Condenser microphones are much more sensitive and allow for picking up more details than dynamic ones. However, to make them work, condenser microphones need to be supplied with 48-volt power, often called phantom power. Most mixers and sound cards can serve phantom power over an XLR connection, so it is usually no problem to get a good signal from a condenser microphone. Since there is only one connection type (XLR) and one type of power (48 volts), things are relatively straightforward (yes, I know, there are some exceptions here, but it holds for most cases).

Lavalier microphones

As I have been doing more video recording and streaming over the last years, I have also gotten used to working with lavalier microphones. These are the tiny microphones you can place on your shirt to get a good sound quality when speaking on video. Over the years, I have been working with various microphones that come bundled as part of wireless packages. You have a transmitter to which you attach the microphone and a receiver to plug into your video camera. The transmitter and receiver run on batteries, but I have always thought that the power was used for the wireless transmission. Now I have learned that these microphones receive power from the transmitter. That is quite obvious when you think about it. After all, they pick up sound like a large condenser microphone. But I never really thought about them as powered microphones before.

I nowadays often use my mobile phone for quick and dirty video recordings. This works well for many things, and, as they say, the best camera is the one you bring with you. The sound, however, is less than optimal. I, therefore, wanted to use a lavalier microphone with my phone. Then the problems started.

It turns out that the world of lavalier microphones is much more complex than I would have imagined. To start with, there are numerous connectors for such microphones, including minijack plugs of different sizes (2.5 and 3.5 mm) and with different numbers of rings (TS, TRS, and TRRS), and mini-XLR plugs with different numbers of pins (TA-, TA-4, TA-5), in addition to Hirose, LEMO, and so on.

The Røde SmartLav+ lavalier microphone.

As I looked around my collection of lavalier microphones and the ones we have in the lab, none of them had a 3.5 mm minijack connector that I could plug straight into my phone (yes, I still have a minijack plug on my phone!). So I quickly gave up and looked around on the web. Many people recommended the Røde SmartLav+, so I decided to get one to try out.

I liked the SmartLav+ so much (a comparison with some other devices) that I bought another one, some extender cables, and a small adapter to connect two of them to my phone simultaneously. Voilà, I have a nice small kit for recording two people at a time. I have been using this to record many educational videos this last year, and it has worked very well. So if you want a small, simple, and (comparatively) cheap setup to improve audio on your mobile phone recordings, you should get something like this. I should say that I have no particular reason for recommending the Røde SmartLav+ over others. Now I see that many people also recommend the Shure MVL, which is probably equally good.

Connecting the SmartLav+ to a GoPro camera

I had been using the SmartLav+ with my phone for a while when I decided to try it with a GoPro 8. With the MediaMod accessory, it is possible to connect a microphone with a minijack plug. But plugging in the SmartLav+ does not work. This was when I started thinking more about the fact that the SmartLav+ has a so-called TRRS plug (as opposed to TRS and TS plugs).

Differences between TS, TRS, and TRRS connectors (illustration from CableChick).

In many consumer products, these three types are used for mono signals, stereo signals, and headsets (mono microphone + stereo output), respectively (although things are not always that easy).

A common way of thinking about how the different plugs are used in consumer devices (illustration from CableChick).
The Røde SC3 TRRS-TRS adaptor is designed with a grey side. Practical!

To work with regular audio input (on the GoPro), the SmartLav+ signal needs to be converted from TRRS to TRS. Fortunately, there are adaptors for this, and it turned out that I had a few lying around in my office. I still decided to buy a Røde SC3 because it has a grey colour on the TRRS side, making it easier to see the connector type.

When I plugged in the microphone (with adaptor) to the GoPro, it worked nicely right out of the box. I, therefore, didn’t think much about the need to power the microphone. I have later learned from DC Rainmaker that the GoPro has a setting for choosing between different types of microphone inputs:

The settings available on the GoPro with a MediaMod.

The list above says that the GoPro defaults to non-powered mics, but my camera defaults to plug-in-power. They might have changed things along the way.

Connecting the SmartLav+ to a Zoom Q8 recorder

When I tried to connect the SmartLav+ to a Zoom Q8 recorder, I started having problems. First, I connected it with a minijack-to-jack adaptor (with the TRRS-TRS adaptor in between). This resulted in no sound input on the Q8. I then switched to an XLR adaptor, but there was still no sound. Finally, I took out a dynamic microphone to check that the Q8 input worked at all.

This was when I realized that the SmartLav+ is actually a powered microphone. After reading up more on this and other lavalier microphones, I understand that I have had a big gap in my microphone knowledge. This is slightly embarrassing; after all, a professor of music technology should know about such things. In my defence, lavalier microphones are not something that music technologists typically deal with. Most of the time, we work with large microphones and XLR cables. Lavalier microphones are typically used more for video recording and media production.

Embarrassments aside, I am primarily interested in finding a solution to my problem. How do I connect the SmartLav+, or any other powered minijack-microphone, with a sound recorder?

Solution 1

It turned out that Røde has a solution to the problem in the form of the minijack-to-XLR adapter VXLR+. This is not just a passive device converting from one connector to another (I already had some of those lying around). No, this one actually converts the 48-volt phantom power coming from the XLR cable to the 2.7 volts required by the SmartLav+. To complicate things, the adapter takes a TRS minijack as input, so it is also necessary to add the TRRS-TRS adapter in between. After hooking it all up and turning on phantom power, I finally have a loud and clear sound on the Q8. The sound is not as good as with microphones like the DPA 4060, of course, but not bad for voice recordings.

One of the reasons I wanted to connect the SmartLav+ to the Zoom recorder in the first place was to have a simple and portable setup for recording conversations with multiple people (4-8). Of course, I could set up an omnidirectional microphone or a stereo pair, but that wouldn’t give the type of intimate sound that I am looking for. I could always set up many large microphones on stands in the lab, but that is not a very portable solution. So I was thinking about connecting multiple lavalier microphones to a multichannel sound recorder instead, and I have found that this could work well. For example, a Zoom H8 with many lavalier microphones could be a portable setup. However, while searching for such a setup, a different solution came to my attention.

Solution 2

Given that more and more people are using lavalier microphones these days, I was curious about the market for minijack-based mixers. Strangely enough, there aren’t many around, and none from the big manufacturers. But one mixer kept popping up in various webshops: the Just MIC IV by Maker Hart. It features four minijack inputs and, most importantly, it can supply power to the microphones, at both 48 V and 1.5 V.

The Just MIC IV is a small mini mixer for minijack-based microphones.

This mixer looked like the perfect solution for my needs, so I decided to try it. After playing with it for a little while, I have found it to be almost exactly what I need. The functionality is excellent. It supplies power to the microphones; the SmartLav+ should ideally get 2.7 V, but the 1.5 V provided by the mixer seems to work fine. The panning is a rudimentary left-middle-right switch, which is not ideal but can place people in a stereo image. It only has a 2-channel output, so there is no multichannel recording here. But it will suffice for quick recordings of four people.

The biggest problem with the Just MIC IV is that it picks up electrical interference very easily. I often get an annoying buzz when connecting it to a wall socket, so I have ended up running it from a USB battery pack instead. Not ideal, but better than nothing.


After searching and testing, I now know a lot more about lavalier microphones, different minijack configurations, and interfacing possibilities. I still do not have an optimal solution for my needs, but I am getting closer. Given that so many people are getting into sound recording these days, from podcasts to teaching, I think there is a potential market for easy-to-use solutions. Products like the SmartLav+ have made it much easier to make good audio recordings on a mobile phone. I wish there were a decent small and simple mixer for such microphones. The Just MIC IV is almost there but is too noisy. Any company out there that can make a small, solid, high-quality 8-channel mini-mixer?

Update 22 February 2022: A blog reader told me that the Griffin iMic has plug-in power. Other such “dongle soundcards” may do the same, although I cannot point to any other specific brands. This is interesting for those who want to use such microphones with laptops that do not provide plug-in power through the minijack port.