Analyzing a double stroke drum roll

Yesterday, PhD fellow Mojtaba Karbassi presented his research on impedance control in robotic drumming at RITMO. I will surely get back to discussing more of his research later. Today, I wanted to share the analysis of one of the videos he showed. Mojtaba is working on developing a robot that can play a double stroke drum roll. To explain what this is, he showed this video he had found online, made by John Wooton:

The double stroke roll is a standard technique for drummers, but not everyone manages to perform it as evenly as in this example. I was eager to have a look at the actions in a little more detail. We are currently beta-testing the next release of the Musical Gestures Toolbox for Python, so I thought this video would be a nice test case.

Motion video

I started the analysis by extracting the part of the video where he is showing the complete drum roll. Next, I generated a motion video of this segment:

This is already fascinating to look at. Since the background is removed, only the motion is visible. Obviously, the frame rate of the video is too low to capture the speed at which he plays. I was therefore curious about the level of detail I could achieve in the further analysis.

Audio visualization

Before delving into the visualization of the video file, I made a spectrogram of the sound:

If you are used to looking at spectrograms, you can quite clearly see the change in frequency as the drummer is speeding up and then slowing down again. However, a tempogram of the audio is even clearer:

Here you can really see the change in both the frequency and the onset strength. The audio is sampled at a much higher frequency (44.1 kHz) than the video (25 fps). Is it possible to see some of the same effects in the motion?
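To get a feel for the gap between the two sampling rates, here is a small sketch of the arithmetic. The stroke rate of 20 strokes per second is a hypothetical value chosen for illustration; it is not measured from the recording.

```python
AUDIO_RATE_HZ = 44_100  # audio sampling rate mentioned above
VIDEO_FPS = 25          # video frame rate mentioned above


def audio_samples_per_frame(audio_rate_hz: int, fps: int) -> float:
    """Number of audio samples recorded during one video frame."""
    return audio_rate_hz / fps


def frames_per_stroke(strokes_per_second: float, fps: int) -> float:
    """Number of video frames available to depict a single stroke."""
    return fps / strokes_per_second


if __name__ == "__main__":
    print(audio_samples_per_frame(AUDIO_RATE_HZ, VIDEO_FPS))  # 1764.0
    # Hypothetical stroke rate, for illustration only.
    print(frames_per_stroke(20, VIDEO_FPS))  # 1.25
```

With barely more than one frame per stroke in such a case, individual strokes would blur together in the video, while the audio would still have well over a thousand samples for each of them.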


I then moved on to create a motiongram of the video:

There are two problems with this motiongram. First, the recording is composed of alternating shots from two different camera angles. These changes between shots can clearly be seen in the motiongram (marked with Camera 1 and 2). Second, this horizontal motiongram only reveals the vertical motion in the video image. Since we are here averaging over each row in the image, the left-hand and right-hand motion end up superimposed in the motiongram. For such a recording, it is therefore more relevant to look at the vertical motiongram, which shows the horizontal motion:

In this motiongram, we can more clearly see the patterns of each hand. Still, we have the problem of the alternating shots. If we “zoom” in on the part called Camera 2b, it is possible to see the evenness of the motion in the most rapid part:

I also find it fascinating to “zoom” in on the part called Camera 2c, which shows the gradual slow-down of motion:

Finally, let us consider the slowest part of the drum roll (Camera 1d):

Here it is possible to see the beauty of the double strokes very clearly.
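For readers curious about the difference between the horizontal and vertical motiongrams discussed above, here is a minimal sketch of the idea in plain Python. It is not the implementation used in the Musical Gestures Toolbox; the function names are my own, frames are assumed to be grayscale images stored as lists of rows, and the motion image is assumed to be the simple absolute difference between consecutive frames.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale image: a list of rows of pixel values


def motion_image(prev: Frame, curr: Frame) -> Frame:
    """Absolute pixel-wise difference between two consecutive frames."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]


def row_means(image: Frame) -> List[float]:
    """Average each row: keeps vertical position, discards horizontal position."""
    return [sum(row) / len(row) for row in image]


def column_means(image: Frame) -> List[float]:
    """Average each column: keeps horizontal position, discards vertical position."""
    return [sum(col) / len(col) for col in zip(*image)]


def motiongrams(frames: List[Frame]) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (horizontal, vertical) motiongrams, one slice per frame pair."""
    horizontal, vertical = [], []
    for prev, curr in zip(frames, frames[1:]):
        motion = motion_image(prev, curr)
        horizontal.append(row_means(motion))     # reveals vertical motion
        vertical.append(column_means(motion))    # reveals horizontal motion
    return horizontal, vertical
```

In a toy example where a single bright pixel moves sideways along the top row, the horizontal motiongram only registers that something happened in that row, whereas the vertical motiongram shows which columns the pixel left and entered.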

Convert between video containers with FFmpeg

In my ever-growing collection of smart FFmpeg tricks, here is a way of converting from one container format to another. Here I will convert from a QuickTime (.mov) file to a standard MPEG-4 (.mp4), but the recipe should work between other formats too.

If you came here to just see the solution, here you go:

ffmpeg -i infile.mov -acodec copy -vcodec copy outfile.mp4

In the following I will explain everything in a little more detail.

Container formats

One of the confusing things about video files is that they have both a container and a compression format. The container usually determines the file suffix. Apple introduced the .mov format for QuickTime files, and Microsoft used to use .avi files.

Nowadays, there seems to be a convergence towards using MPEG containers and .mp4 files. However, both Apple and Microsoft software (and others) still output other formats. This is confusing and can also lead to various playback issues. For example, many web browsers are not able to play these formats natively.

Compression formats

The compression format denotes how the video data is organized on the inside of a container. Here, too, there are many different formats. The most common today is to use the H.264 format for video and AAC for audio. These are both parts of the MPEG-4 standard and can be embedded in .mp4 containers. However, both H.264 and AAC can also be embedded in other containers, such as .mov and .avi files.

The important thing to notice is that both .mov and .avi files may contain H.264 video and AAC audio. In those cases, the inside of such files is identical to the content of a .mp4 file. But since the container is different, it may still be unplayable in certain software. That is why I would like to convert from one container format to another. In practice that means converting from .mov or .avi to .mp4 files.

Lossless conversion

There are many ways of converting video files. In most cases, you would end up with a lossy conversion. That means that the video content will be altered. The file size may be smaller, but the quality may also be worse. The general rule is that you want to compress a file as few times as possible.

For all sorts of video conversion/compression jobs, I have ended up turning to FFmpeg. If you haven’t tried it already, FFmpeg is a collection of tools for doing all sorts of audio/video manipulations in the terminal. Working in the terminal may be intimidating at first, but you will never look back once you get the hang of it.

Converting a file from .mov to .mp4 is as simple as typing this little command in a terminal:

ffmpeg -i infile.mov outfile.mp4

This will change from a .mov container to a .mp4 container, which is what we want. But it will also (probably) re-compress the video. That is why it is always smart to look at the content of your original file before converting it. You can do this by typing:

ffmpeg -i infile.mov

For my example file, this returns the following metadata:

    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    creation_time   : 2016-08-10T10:47:30.000000Z Apple MacBookPro11,1 Mac OS X 10.11.6 (15G31) 2016-08-10T12:45:43+0200
  Duration: 00:00:12.76, start: 0.000000, bitrate: 5780 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1844x1160 [SAR 1:1 DAR 461:290], 5243 kb/s, 58.66 fps, 60 tbr, 6k tbn, 50 tbc (default)
      creation_time   : 2016-08-10T10:47:30.000000Z
      handler_name    : Core Media Video
      encoder         : H.264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 269 kb/s (default)
      creation_time   : 2016-08-10T10:47:30.000000Z
      handler_name    : Core Media Audio

There is quite a lot of information there, so we need to look for the important stuff. The first line we want to look for is the one with information about the video content:

Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1844x1160 [SAR 1:1 DAR 461:290], 5243 kb/s, 58.66 fps, 60 tbr, 6k     

Here we can see that this .mov file contains a video that is already compressed with H.264. Another thing we can see here is that it is using an unusual frame size (1844×1160 pixels). The bit rate of the video stream is 5243 kb/s, which says something about how large the file will be in the end. And it is also interesting to see that it is using a frame rate of 58.66 fps, which is also a bit odd.

Similarly, we can look at the content of the audio stream of the file:

Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 269 kb/s (default)

Here we can see that the audio is already compressed with AAC at a standard sampling rate of 44.1 kHz and at a more nonstandard bit rate of 269 kb/s.

The main point of investigating the file before we do the conversion is to avoid re-compressing the content of the file. After all, the content is already in the right formats (H.264 and AAC) even though it is in an unwanted container (.mov).
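The check described above can also be scripted. The following is a minimal sketch that reads the codec names out of stream lines like the ones shown above and tests whether they are H.264 and AAC. The function names and the regular expression are my own, and the test is deliberately conservative: an .mp4 container can hold other codecs too, but these are the two discussed in this post. If a file has several video or audio streams, the sketch only keeps the last one of each kind.

```python
import re
from typing import Dict

# Matches lines such as "Stream #0:0(und): Video: h264 (Main) ..."
STREAM_RE = re.compile(r"Stream #\d+:\d+(?:\([^)]*\))?: (Video|Audio): (\w+)")


def stream_codecs(ffmpeg_output: str) -> Dict[str, str]:
    """Map 'video' and 'audio' to the codec names found in the FFmpeg output."""
    return {kind.lower(): codec for kind, codec in STREAM_RE.findall(ffmpeg_output)}


def already_h264_and_aac(codecs: Dict[str, str]) -> bool:
    """True if the content is already H.264 video and AAC audio."""
    return codecs.get("video") == "h264" and codecs.get("audio") == "aac"


if __name__ == "__main__":
    sample = """
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709)
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo
    """
    codecs = stream_codecs(sample)
    print(codecs, already_h264_and_aac(codecs))
```

If the function returns True, changing only the container should be enough; otherwise a re-compression may be needed for the file to play everywhere.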

Today’s little trick is how to convert from one format to another without modifying the content of the file, only the container. That can be achieved with the command shown at the top:

ffmpeg -i infile.mov -acodec copy -vcodec copy outfile.mp4

There are several benefits of doing it this way:

  1. Preserve quality. It avoids an unnecessary re-compression, which would only degrade the content.
  2. Preserve the pixel size, sampling rates, etc. of the originals. Most video software will use standard settings for these. I often work with various types of non-standard video files, so it is nice to preserve this information.
  3. Save time. Since no re-compression is needed, we only copy content from one container to another. This is much, much faster than re-compressing the content.

All in all, this long explanation of a short command may help to improve your workflows and save some time.
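If you have a whole folder of .mov files, the copy-only command can be wrapped in a small script. This is only a sketch: the function names and the dry-run switch are my own, and actually running the conversion assumes that FFmpeg is installed and available on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def remux_command(infile: Path, outfile: Path) -> List[str]:
    """Build the copy-only FFmpeg command used in this post."""
    return ["ffmpeg", "-i", str(infile),
            "-acodec", "copy", "-vcodec", "copy", str(outfile)]


def remux_folder(folder: Path, dry_run: bool = True) -> List[List[str]]:
    """Re-wrap every .mov file in `folder` as .mp4 without re-compressing.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = []
    for infile in sorted(folder.glob("*.mov")):
        cmd = remux_command(infile, infile.with_suffix(".mp4"))
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires ffmpeg on the PATH
    return commands


if __name__ == "__main__":
    # Print what would be run for the current folder, without converting anything.
    for command in remux_folder(Path("."), dry_run=True):
        print(" ".join(command))
```

Starting with a dry run makes it easy to inspect the commands before letting the script touch any files.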

How to work with plug-in-power microphones

I have never thought about how so-called plug-in-power microphones actually work. Over the years, I have used several of them for various applications, including small lavalier microphones for cameras and mobile phones. The nice thing about plug-and-play devices is that they are, well, plug and play. The challenge, however, is when they don’t work. Then it is time to figure out what is actually going on. This is the story of how I managed to use a Røde SmartLav+ lavalier microphone with a Zoom Q8 recorder.

Powered microphones

The Shure SM58 is a classic dynamic microphone, which doesn’t require any power to function.

When speaking about large (normal) microphones, we typically differentiate between two types: dynamic and condenser. The dynamic microphones are typically used for singing and talking and don’t require any power. You can plug them into a mixer or sound card, and they will work. Dynamic microphones are very versatile and are great in that they often don’t lead to much feedback. The downside to this is that they don’t pick up soft sounds very well, so you need to speak/sing quite loudly directly into them to get a good signal.

AKG condenser microphone with XLR cable.

Condenser microphones are much more sensitive and allow for picking up more details than dynamic ones. However, to make them work, condenser microphones need to be supplied with 48-volt power, often called phantom power. Most mixers and sound cards have the ability to serve phantom power over an XLR connection, so it is usually no problem to get a good signal from a condenser microphone. Since there is only one connection type (XLR) and one type of power (48 volts), things are fairly straightforward (yes, I know, there are some exceptions here, but it holds for most cases).

Lavalier microphones

As I have been doing more video recording and streaming over the last years, I have also gotten used to working with lavalier microphones. These are the tiny microphones you can place on your shirt to get a good sound quality when speaking on video. Over the years, I have been working with various microphones that come bundled as part of wireless packages. You have a transmitter to which you attach the microphone and a receiver to plug into your video camera. The transmitter and receiver run on batteries, but I have always thought that the power was used for the wireless transmission. Now I have learned that these microphones receive power from the transmitter. That is quite obvious when you think about it. After all, they pick up sound like a large condenser microphone. But I never really thought about them as powered microphones before.

I nowadays often use my mobile phone for quick and dirty video recordings. This works well for many things, and, as they say, the best camera is the one you bring with you. The sound, however, is less than optimal. I, therefore, wanted to use a lavalier microphone with my phone. Then the problems started.

It turns out that the world of lavalier microphones is much more complex than I would have imagined. To start with, there are numerous connectors for such microphones, including minijack plugs of different sizes (2.5 and 3.5 mm) and with different numbers of rings (TS, TRS and TRRS), mini-XLR plugs with different numbers of pins (TA-3, TA-4, TA-5), in addition to Hirose, LEMO, and so on.

The Røde SmartLav+ lavalier microphone.

As I looked around the collection of my own lavalier microphones and also the ones we have in the lab, none of them had a 3.5 mm minijack connector that I could plug straight into my phone (yes, I still have a minijack plug on my phone!). So I quickly gave up and looked around on the web. Many people recommended the Røde SmartLav+, so I decided to get one to try out.

I liked the SmartLav+ so much that I bought another one, some extender cables, and a small adapter to connect two of them to my phone simultaneously. Voilà, I have a nice and small kit for recording two people at a time. I have been using this to record many educational videos this last year, and it has worked very well. So if you want a small, simple, and (comparatively) cheap setup to improve audio on your mobile phone recordings, you should get something like this. I should say that I have no particular reason for recommending the Røde SmartLav+ over other ones. Now I see that many people also recommend Shure MVL, which is probably equally good.

Connecting the SmartLav+ to a GoPro camera

I had been using the SmartLav+ with my phone for a while when I decided to try it with a GoPro 8. With the MediaMod accessory, it is possible to connect a microphone with a minijack plug. But plugging in the SmartLav+ does not work. This was when I started thinking more about the fact that the SmartLav+ has a so-called TRRS plug (as opposed to TRS and TS plugs).

Differences between TS, TRS, and TRRS connectors.

In many consumer products, these three types are used for mono signals, stereo signals, and headsets (mono microphone + stereo output), respectively (although things are not always that easy).

A common way of thinking about how the different plugs are used in consumer devices.
The Røde SC3 TRRS-TRS adaptor is designed with a grey side. Practical!

To work with regular audio input (on the GoPro) the SmartLav+ signal needs to be converted from TRRS to TRS. Fortunately, there are adaptors for this, and it turned out that I had a few lying around in my office. I still decided to buy a Røde SC3 because it has a grey colour on the TRRS side, making it easier to see the connector type.

When I plugged in the microphone (with adaptor) to the GoPro, it worked nicely right out of the box. I, therefore, didn’t think much about the need to power the microphone. I have later learned from DC Rainmaker that the GoPro actually has a setting for choosing between different types of microphone inputs:

The settings available on the GoPro with a MediaMod.

The list above says that the GoPro defaults to non-powered mics, but my camera defaults to plug-in-power. They might have changed things along the way.

Connecting the SmartLav+ to a Zoom Q8 recorder

When I tried to connect the SmartLav+ to a Zoom Q8 recorder, I started having problems. First, I connected with a minijack-to-jack adaptor (with the TRRS-TRS adaptor in between). This resulted in no sound input on the Q8. I then switched to an XLR adaptor, but still no sound. I then took out a dynamic microphone to check that the Q8 input actually worked.

This was when I realized that the SmartLav+ is actually a powered microphone. After reading up more on this and other lavalier microphones, I understand that I have had a big gap in my microphone knowledge. This is slightly embarrassing. After all, a professor of music technology should know about such things. In my defence, perhaps, I would argue that lavalier microphones are not something that music technologists typically deal with. Most of the time, we work with large microphones and XLR cables. Such small microphones are typically used more for video recording and media production.

Embarrassments aside, I am primarily interested in finding a solution to my problem. How do I connect the SmartLav+, or any other powered minijack-microphone, with a sound recorder?

Solution 1

It turned out that Røde actually has a solution to the problem in the form of the minijack-to-XLR adapter VXLR+. This is not just a passive device converting from one to the other (I already had some of those lying around). No, this one actually converts the 48-volt power coming from the XLR cable to the 2.7 volts required by the SmartLav+. To complicate things, though, the adapter takes a TRS minijack as input, so it is also necessary to add the TRRS-TRS adapter in between. So after hooking it all up, and turning on phantom power, I now finally have a loud and clear sound on the Q8. The sound is not as good as with microphones like the DPA 4060, of course, but not bad for voice recordings.
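To keep track of what goes where, the chain can be written down as a toy model. The sketch below is my own simplification: it only checks that neighbouring connectors match and that the supply reaching the microphone has the expected voltage, and it treats power as a single number. Real adapters are more subtle than this. The connector types and voltages are the ones mentioned in this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One adapter between the microphone and the recorder."""
    name: str
    plug_in: str    # connector on the microphone side
    plug_out: str   # connector on the recorder side
    volts_in: Optional[float] = None   # supply expected from the recorder side
    volts_out: Optional[float] = None  # supply delivered to the microphone side


def check_chain(mic_plug: str, mic_volts: float, stages: List[Stage],
                recorder_plug: str, recorder_volts: Optional[float]) -> List[str]:
    """Return a list of problems; an empty list means the chain looks consistent."""
    problems = []
    # The signal travels from the microphone towards the recorder.
    plug = mic_plug
    for stage in stages:
        if stage.plug_in != plug:
            problems.append(f"{stage.name} expects {stage.plug_in}, got {plug}")
        plug = stage.plug_out
    if plug != recorder_plug:
        problems.append(f"recorder expects {recorder_plug}, got {plug}")
    # The power travels the other way, from the recorder towards the microphone.
    volts = recorder_volts
    for stage in reversed(stages):
        if stage.volts_in is not None:  # passive stages pass the supply unchanged
            if volts != stage.volts_in:
                problems.append(f"{stage.name} needs {stage.volts_in} V, gets {volts} V")
            volts = stage.volts_out
    if volts != mic_volts:
        problems.append(f"microphone needs {mic_volts} V, gets {volts} V")
    return problems


sc3 = Stage("TRRS-TRS adapter", "TRRS", "TRS")
vxlr = Stage("VXLR+", "TRS", "XLR", volts_in=48.0, volts_out=2.7)
passive_xlr = Stage("passive minijack-to-XLR adapter", "TRS", "XLR")

if __name__ == "__main__":
    # SmartLav+ (TRRS, 2.7 V) into an XLR input with 48 V phantom power.
    print(check_chain("TRRS", 2.7, [sc3, vxlr], "XLR", 48.0))         # working chain
    print(check_chain("TRRS", 2.7, [sc3, passive_xlr], "XLR", 48.0))  # wrong voltage
```

The first call returns an empty list, while the second one flags the voltage mismatch that a passive adapter cannot fix in this model.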

One of the reasons I wanted to connect the SmartLav+ to the Zoom recorder in the first place, was to have a simple and portable setup for recording conversations with multiple people (4-8). Of course, I could set up an omnidirectional microphone or a stereo pair, but that wouldn’t give the type of intimate sound that I am looking for. In the lab, I could always set up many large microphones on stands, but that is not a very portable solution. So I was thinking about possibly connecting multiple lavalier microphones to a multichannel sound recorder instead. Now, I have found that this could actually work well. For example, a Zoom H8 with many lavalier microphones could be a nice and portable setup. While searching for such a setup, however, a different solution came to my attention.

Solution 2

Given that more and more people are using lavalier microphones these days, I was curious about the market for minijack-based mixers. Strangely enough, there aren’t many around and none from the big manufacturers. But one mixer seemed to pop up in various webshops: Just MIC IV by Maker Hart. It features four minijack inputs, and, most importantly, it can provide power to the microphones. In fact, it can provide both 48 V and 1.5 V.

The Just MIC IV is a small mini mixer for minijack-based microphones.

This mixer looked like the perfect solution for my needs, so I decided to give it a try. After playing with it for a little while, I have found it to be almost exactly what I need. The functionality is great. It supplies power to the microphones. They should ideally get 2.7 V, but the 1.5 V supplied from the mixer seems to work fine. The panning is a rudimentary left-middle-right switch, which is not ideal but can place people in a stereo image. It only has a 2-channel output, so no multi-channel recording here. But it will suffice for quick recordings of four people.

The biggest problem with the Just MIC IV is that it picks up electric disturbances very easily. I often get an annoying buzz when connecting it to a wall socket. So I have ended up running it from a USB battery pack instead. Not ideal, but better than nothing.


After a lot of searching and testing, I now know a lot more about lavalier microphones, different minijack configurations, and interfacing possibilities. I still do not have an optimal solution for my needs, but I am getting closer. Given that so many people are getting into sound recording these days, from podcasts to teaching, I think there is a potential market here for easy-to-use solutions. Products like the SmartLav+ have made it much easier to make good audio recordings on a mobile phone. I wish there was a decent small and simple mixer for such microphones. The Just MIC IV is almost there, but is too noisy. Any company out there that can make a small, solid, high-quality 8-channel mini-mixer?

New paper: Who Moves to Music? Empathic Concern Predicts Spontaneous Movement Responses to Rhythm and Music

A few days after Agata Zelechowska defended her PhD dissertation, we got the news that her last paper was finally published in Music & Science. It is titled Who Moves to Music? Empathic Concern Predicts Spontaneous Movement Responses to Rhythm and Music and was co-authored by Victor Gonzalez Sanchez, Bruno Laeng, Jonna Vuoskoski, and myself.

The paper is based on Agata’s headphones-speakers experiment. We have previously published a paper showing that people move more when listening on headphones. This time, however, the focus was on the data gathered on individual differences. Many variables were tested, but it was only empathic concern that turned out to be a predictor of motion.

Here is a short video teaser about the article:

And here is the abstract:

Moving to music is a universal human phenomenon, and previous studies have shown that people move to music even when they try to stand still. However, are there individual differences when it comes to how much people spontaneously respond to music with body movement? This article reports on a motion capture study in which 34 participants were asked to stand in a neutral position while listening to short excerpts of rhythmic stimuli and electronic dance music. We explore whether personality and empathy measures, as well as different aspects of music-related behaviour and preferences, can predict the amount of spontaneous movement of the participants. Individual differences were measured using a set of questionnaires: Big Five Inventory, Interpersonal Reactivity Index, and Barcelona Music Reward Questionnaire. Liking ratings for the stimuli were also collected. The regression analyses show that Empathic Concern is a significant predictor of the observed spontaneous movement. We also found a relationship between empathy and the participants’ self-reported tendency to move to music.

And the full reference is:

Zelechowska, A., Gonzalez Sanchez, V. E., Laeng, B., Vuoskoski, J. K., & Jensenius, A. R. (2020). Who Moves to Music? Empathic Concern Predicts Spontaneous Movement Responses to Rhythm and Music. Music & Science, 3, 2059204320974216.

The different parts of the experiment (from left to right): preparation, first listening session, first set of questionnaires, second listening session, second set of questionnaires.

We are working on preparing the data of the experiment for sharing. It will be shared as part of the Oslo Standstill Database.

Running a Hybrid Disputation

Yesterday, I wrote about Agata Zelechowska’s disputation. We decided to run it as a hybrid production, even though there was no audience present. It would, of course, have been easier to run it as an online-only event. However, we expect that hybrid is the new “normal” for such events, and therefore thought that it would be good to get started exploring the hybrid format right away. In this blog post, I will write up some of our experiences.

The setup in the hall, with the candidate presenting to the left and the disputation leader to the right.

The disputation was run in Forsamlingssalen, a nice lecture room in Harald Schjelderups hus, where RITMO is located. We had seen the need for recording lectures in the hall and had installed two PTZ cameras and a video mixer there even before corona. This setup was primarily intended for recording lectures and secondarily for streaming on YouTube. We actually never got around to using the new system before corona closed down the university in the spring. So the disputation was a good chance for getting the system up and running.

Forsamlingssalen, seen from the lecture podium. One PTZ camera is placed on the left wall and one in the back next to the projector. LED lights help illuminate the speakers.

What turned out to be the most challenging part of the setup was to figure out how to interface the system with Zoom. We quickly decided to use a Zoom Webinar instead of a Zoom Room. The Webinar solution is better for public events where you want to control the “production”. It is also safer since only invited panellists are allowed to show their camera and to speak.

Zoom (both the Webinar solution and regular Rooms) is in many ways a small video production tool in its own right. However, we realised that it is quite challenging to use in a more traditional multi-camera setup. Its strength lies in allowing multiple people with single-camera setups to interact. We did make it work in the end, but it was quite a puzzle to get right.

People and PCs

There were four people visible in the disputation: the candidate and disputation leader were present in Forsamlingssalen, and the two opponents joined remotely. The two opponents used the normal Zoom client on their PCs, so their part was easy enough (of course, we ensured that they had good audio and video quality).

The second opponent projected on the wall during the disputation.

For the setup in Forsamlingssalen, the candidate was standing at the podium with a desktop PC with two screens and cabled ethernet. Her presentation was shown on the right screen and her notes on the left. We played with the idea of showing her presentation as a video stream through the video mixer but ended up using Zoom’s screen sharing function. That meant that we had to run Zoom on the PC, and then we could also add her image from a web camera sitting on top of the left screen.

The speaker desk, with two screens, a desk microphone, and a web camera above the screen to the left.

The image from that PC is also what goes to the projected screen in the hall. That was not so important now, since there was no audience. But for future hybrid disputations (and other events) we need to also think about people in the hall.

The hall has a nice microphone setup, with a swan neck microphone next to the PC screens and several wireless microphones. We ended up equipping both the candidate and disputation leader with wireless clip-on mics to ensure that they had a good sound coming through. The mixed microphone signal was then fed into the lecture PC to be shared through Zoom.

Two wireless microphones were used, one for the candidate and the other for the disputation leader. These microphones were connected to the PA system in the hall and sent as inputs to the desk PC to be included in the Zoom stream.
The disputation leader only had an empty lecturer desk. His image was captured from one of the PTZ cameras in the back of the hall, and his sound was captured through a clip-on mic that was connected to the sound mixer.


The cabling in the room is set up so that there is a combined HDMI signal being sent from the podium to the video mixer. This signal contains the main image from the PC, which is also projected on the wall. It also contains the combined audio signal coming from the microphones and the PC, played over the hall’s loudspeakers. As such, we can easily tap the same signal that an audience in the hall will experience. Also, the two PTZ cameras’ signals in the hall go into separate channels on the video mixer. Below is a sketch of the cabling in the room.

The AV routing in the hall.

The original plan was to run productions from the video mixer, which can stream directly to various servers and also record on an SD card. Since we used Zoom for the disputation, however, we hooked up a separate PC in the control room, to which we fed the mixed video signal through a video grabber. This PC was then connected to Zoom, and we could switch between the two PTZ cameras in the live stream.
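The routing described above can also be written down as a small graph, which makes it easy to check which sources end up where. This is only my reading of the text turned into a sketch; the node names are labels I have chosen for illustration and do not come from the actual wiring diagram.

```python
from collections import deque
from typing import Dict, List, Set

# Each key sends its signal to the devices listed as its value.
ROUTING: Dict[str, List[str]] = {
    "wireless microphones": ["PA system"],
    "PA system": ["podium PC", "hall loudspeakers"],
    "web camera": ["podium PC"],
    "podium PC": ["projector", "video mixer", "Zoom"],
    "PTZ camera 1": ["video mixer"],
    "PTZ camera 2": ["video mixer"],
    "video mixer": ["video grabber"],
    "video grabber": ["control room PC"],
    "control room PC": ["Zoom"],
}


def reachable(routing: Dict[str, List[str]], source: str) -> Set[str]:
    """All devices that receive a signal originating at `source` (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in routing.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    for source in ("PTZ camera 1", "wireless microphones"):
        print(source, "->", sorted(reachable(ROUTING, source)))
```

In this model, the PTZ cameras only reach Zoom via the control room PC, while the podium PC has its own direct connection. This is part of what made the setup a puzzle.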

Research assistant Aleksander Tidemann controlling the two PTZ cameras, video mixer, and Zoom PC in the control room.

Lessons learned

Many things worked well, but we also learned some lessons.

  1. Running a hybrid event is (much) more difficult than doing a physical or online-only event. It is challenging to create something that works well both in the room and with an online audience.
  2. Having good audio is imperative. This is particularly tricky in a hybrid situation, in which you can easily get into feedback problems. Fortunately, we have a very robust PA system available with high-quality clip-on microphones.
  3. Combining Zoom with a multi-camera production pipeline is challenging. Zoom is good for connecting multiple people with one PC+camera+microphone each. Adding in a multi-camera video in one Zoom channel worked, but it is difficult to mix video for a window size you don’t know. In Zoom, the viewer can choose the size and position of windows. Doing picture-in-picture on the video mixer, for example, may lead to video images that are too small to watch if the viewer is using “gallery” mode.
  4. It is challenging to be the main “producer” of a hybrid Zoom event. I have run many Zoom Room meetings and also several Webinars. But this was the first time I tried a large-scale Webinar event with a hybrid setup. It worked ok in the end, but it was tricky when I did not have access to the necessary “tools” to actually control what was going on. For example, as a host, you are allowed to turn people’s video and audio off. But you cannot turn them on again. I had originally planned to turn the camera and microphone on the lecture podium off during breaks, and then turn them back on again when we started each session. The candidate should not have to think about such things, but this meant that I had to physically go over to the machine to turn things on when we should start. That also meant that I had to sit in the hall because it would be too far to get to the control room one floor up.
A view of the hall from the control room, one floor up from the hall.

I think the final result was fine, as can be seen in a recording of the event:

In the future, however, I would probably not run such an event as a Zoom Webinar. Given that we have a nice multi-camera setup in the hall, it would be better to run it as a regular video stream. Then we would have full control over the production. The candidate could stand at the podium and focus on giving her presentation, and we could mix the audio and video in the back.

However, the challenging part with such a setup would be to figure out how best to add the opponents to the mix. I would probably opt to connect them on a separate PC (through a Zoom Room) that would be shown separately from the presentation. Exactly how to do that will be an experiment for our next disputation!