Kayaking motion analysis

Like many others, I bought a kayak during the pandemic, and I have had many nice trips in the Oslofjord over the last year. Working at RITMO, I think a lot about rhythm these days, and the rhythmic nature of kayaking made me curious to investigate the pattern a little more.

Capturing kayaking motion

My spontaneous investigations into kayak motion began with simply recording a short video of myself kayaking. This was done by placing an action camera (a GoPro Hero 8, to be precise) on my life vest. The result looks like this:

In the future, it would be interesting to also test with a proper motion capture system (see this article for an overview of different approaches). However, as they say, the best motion capture system is the one you have at hand, and cameras are by far the easiest one to bring around.

Analysing kayaking motion

For the analysis, I reached for the Musical Gestures Toolbox for Python. It has matured nicely over the last year and is also where we are putting in most new development efforts these days.

The first step of motion analysis is to generate a motion video:

From the motion video, MGT will also create a motiongram:

Motiongram of a kayaking video.

From the motiongram, it is pretty easy to see the regularity of the kayaking strokes. This may be even easier from the videogram:

Videogram of a kayaking video.

We also get information about the centroid and quantity of motion:

Centroid and quantity of motion of the kayaking video.

The quantity of motion can be used for further statistical analysis. But for now, I am more interested in exploring how it is possible to better visualise the rhythmic properties of the video itself. It was already on the list to implement directograms in MGT, and this is even higher on the list now.

The motion average image (generated from the motion video) does not reveal much about the motion.

Motion average image of the kayaking video.

It is generated by calculating the average of all the frames. What is puzzling is the colour artefacts. I wonder whether that is coming from some compression error in the video or a bug somewhere in MGT for Python. I cannot see the same artefacts in the average image:

Average image of the kayaking video.

Analysing the sound of kayaking

The video recording also has sound, so I was curious to see if this could be used for anything. True, kayaking is a quiet activity, so I didn’t have very high hopes. Also, GoPros don’t have particularly good microphones, and they compress the sound a lot. Still, there could be something in the signal. To begin with, the waveform display of the sound does not tell that much:

A waveform of the sound of kayaking.

The spectrogram does not reveal that much either, although it is interesting to see the effects of the sound compression done by the GoPro (the horizontal lines from 5 kHz and upward).

A spectrogram of the sound of kayaking.
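The figures above were made with MGT. As a rough cross-check outside the toolbox, FFmpeg's showwavespic and showspectrumpic filters can draw similar pictures straight from the video file. This is only a sketch: the helper name, the output file names, and the image sizes are my own assumptions, and the result will not look exactly like the MGT plots.

```shell
#!/bin/bash
# Sketch: draw a waveform and a spectrogram image from the audio track of a video.
# audio_pictures is a hypothetical helper; the image sizes are arbitrary choices.
audio_pictures() {
    local infile="$1"
    local base="${infile%.*}"   # strip only the last extension
    # Waveform of the whole audio track as one PNG
    ffmpeg -i "$infile" -lavfi "showwavespic=s=1024x240" "${base}_waveform.png" &&
    # Spectrogram of the whole audio track as one PNG
    ffmpeg -i "$infile" -lavfi "showspectrumpic=s=1024x512" "${base}_spectrogram.png"
}

# Usage (not run here): audio_pictures kayak.mp4
```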

The tempogram is more interesting.

A tempogram of the sound of kayaking.

It is exciting to see that it estimates the tempo to be 122 BPM, and this resonates with theories about 120 BPM being the average tempo of moderate human activity.

This little investigation into the sound and video of kayaking made me curious about what else can be found from such recordings. In particular, I will continue to explore approaches to analysing the rhythm of audiovisual recordings. It also made me look forward to a new kayaking season!

Releasing the Musical Gestures Toolbox for Python

After several years in the making, we finally “released” the Musical Gestures Toolbox for Python at the NordicSMC Conference this week. The toolbox is a collection of modules targeted at researchers working with video recordings.

Below is a short video in which Bálint Laczkó and I briefly describe the toolbox:

About MGT for Python

The Musical Gestures Toolbox for Python includes video visualization techniques such as creating motion videos, motion history images, and motiongrams. These visualizations allow for studying video recordings from different temporal and spatial perspectives. The toolbox also includes basic computer vision methods, and it is designed to integrate well with audio analysis toolboxes.

It is possible to run the toolbox from the terminal:

Example of running MGT for Python in a terminal.

Many people would probably prefer to run it in a Jupyter notebook:

Screenshots from the example Jupyter Notebook.

The MGT was initially developed to analyze music-related body motion (of musicians, dancers, and perceivers) but is equally helpful for other disciplines working with video recordings of humans, such as linguistics, pedagogy, psychology, and medicine.


This toolbox builds on the Musical Gestures Toolbox for Matlab, which in turn builds on the Musical Gestures Toolbox for Max. The latest version was primarily developed by Bálint Laczkó, Frida Furmyr, and Marcus Widmer.

Read more

To learn more about Musical Gestures Toolbox for Python, take a look at our paper presented at NordicSMC:

Converting a .WAV file to .AVI

Sometimes, there is a need to convert an audio file into a blank video file with an audio track. This can be useful if you are on a system that does not have a dedicated audio player but a video player (yes, rare, but I work with odd technologies…). Here is a quick recipe.

FFmpeg to the rescue

When it comes to converting from one media format to another, I always turn to FFmpeg. It requires “coding” in the terminal, but usually, it is only necessary to write a one-liner. When it comes to converting an audio file (say in .WAV format) to a blank video file (for example, a .AVI file), this is how I would do it:

ffmpeg -i infile.wav -c copy outfile.avi

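The “-c copy” part of this command preserves the original audio content: the new .AVI file holds an unmodified copy of the audio from the .WAV file. Note that FFmpeg does not generate any picture here, so the “blank video” is really a video container with only an audio stream. If you are okay with compressing the audio, you can instead run this command: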

ffmpeg -i infile.wav outfile.avi

Then FFmpeg will (by default) compress the audio using the MP3 codec. This may or may not be what you are after, but it will at least create a substantially smaller output file.
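With only an audio file as input, FFmpeg writes a file that has an audio stream and no picture. If a player insists on an actual video stream, one option is to generate black frames with FFmpeg's lavfi “color” source. The following is a minimal sketch under my own assumptions: the helper name, the 640x360 frame size, and the 25 fps frame rate are arbitrary, and the video codec is left to FFmpeg's default for the output format.

```shell
#!/bin/bash
# Sketch: wrap an audio file in a video file that has a real (black) picture.
# wav_to_black_video is a hypothetical helper; size and frame rate are arbitrary.
wav_to_black_video() {
    local infile="$1" outfile="$2"
    # Input 0: synthetic black frames from the lavfi "color" source.
    # Input 1: the audio file, whose stream is copied without re-encoding.
    # -shortest stops at the end of the audio (the color source never ends by itself).
    ffmpeg -f lavfi -i "color=c=black:s=640x360:r=25" -i "$infile" \
        -map 0:v:0 -map 1:a:0 -c:a copy -shortest "$outfile"
}

# Usage (not run here): wav_to_black_video infile.wav outfile.avi
```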

Of course, you can easily vary the above conversion. For example, if you want to go from .AIFF to .MP4, you would just do:

ffmpeg -i infile.aiff outfile.mp4
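Since FFmpeg picks default codecs based on the output format, it can be useful to check what actually ended up in the file. A small sketch using ffprobe (which ships with FFmpeg); the helper name is my own:

```shell
#!/bin/bash
# Sketch: list the codec name and stream type of every stream in a media file.
# list_streams is a hypothetical helper name.
list_streams() {
    ffprobe -v error -show_entries stream=codec_name,codec_type -of csv=p=0 "$1"
}

# Usage (not run here): list_streams outfile.mp4
```

This should print one line per stream, so an audio-only file gives a single line.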

Happy converting!

Normalize audio in video files

We are organizing the Rhythm Production and Perception Workshop at RITMO next week. As mentioned in another blog post, we have asked presenters to send us pre-recorded videos. They are all available on the workshop page.

During the workshop, we will play sets of videos in sequence. When doing a test run today, we discovered that the sound levels differed wildly between files. There was clearly a need to normalize the sound levels to create a good listener experience.

Batch normalization

How does one normalize around 100 video files without too much pain and effort? As always, I turn to my go-to video companion, FFmpeg. Here is a small script I made to do the job:


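#!/bin/bash
# Let patterns without matching files expand to nothing
shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do
    # Strip only the last extension, so names with extra dots survive
    name="${i%.*}"
    ffmpeg -i "$i" -c:v copy -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"
done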

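This was the result of some searching around for a smart solution (in Qwant, btw, my new preferred search engine). For example, I use the “nullglob” trick so that I can list multiple file types in the for loop: with nullglob set, patterns that match no files expand to nothing instead of being passed on literally to FFmpeg.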
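To see what nullglob changes, here is a small self-contained sketch that counts how many words a list of patterns expands to in a folder with a single .mp4 file. The helper name and the temporary folder are only there for the demonstration.

```shell
#!/bin/bash
# Sketch: count the words a list of glob patterns expands to,
# with and without nullglob. count_matches is a hypothetical helper.
count_matches() {
    local mode="$1" dir="$2"
    (
        # Run in a subshell so the cd and shopt do not leak into the caller
        cd "$dir" || exit 1
        if [ "$mode" = "nullglob" ]; then shopt -s nullglob; else shopt -u nullglob; fi
        set -- *.mp4 *.MP4 *.mov
        echo "$#"
    )
}

demo_dir="$(mktemp -d)"
touch "$demo_dir/talk.mp4"
echo "default:  $(count_matches default "$demo_dir")"    # unmatched patterns stay as literal words
echo "nullglob: $(count_matches nullglob "$demo_dir")"   # unmatched patterns disappear
rm -r "$demo_dir"
```

Without nullglob, the loop would also run once for the literal strings “*.MP4” and “*.mov”, and FFmpeg would complain about files that do not exist.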

The most important part of the script is the normalization, which I found in this blog post. The settings are described as:

  • loudnorm: the name of the normalization filter
  • I: the integrated loudness target in LUFS (from -70 to -5.0)
  • LRA: the loudness range target in LU (from 1.0 to 20.0)
  • TP: the maximum true peak in dBTP (from -9.0 to 0.0)

The settings in the script normalize to a high but not maximum signal, which leaves some headroom.
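To check how loud a file is before or after normalizing, the same filter can be run in a measurement-only pass that writes no output file. A minimal sketch, assuming the same targets as in the script above; the helper name is my own:

```shell
#!/bin/bash
# Sketch: measure loudness without writing an output file.
# measure_loudness is a hypothetical helper; the targets mirror the script above.
measure_loudness() {
    # -vn skips the video, and "-f null -" discards the processed audio.
    # print_format=summary makes loudnorm print its measurements when it is done.
    ffmpeg -hide_banner -nostats -i "$1" \
        -af loudnorm=I=-16:LRA=11:TP=-1.5:print_format=summary \
        -vn -f null -
}

# Usage (not run here): measure_loudness talk_norm.mp4
```

The summary at the end of the FFmpeg log should include the measured integrated loudness, true peak, and loudness range of the input.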

To compress or not

To save processing time and avoid recompressing the video, I have included “-c:v copy” in the script above. Then FFmpeg copies over the video content directly. This is fine for videos with “normal” H.264 compression, which is the case for most .MP4 files. However, when getting 100 files made on all sorts of platforms, there are surely some oddities. There were a couple of cases with unusual compression formats that for some reason failed with the above script. One also had interlacing issues. For those files, I modified the script to recompress the video:


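#!/bin/bash
# Let patterns without matching files expand to nothing
shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do
    # Strip only the last extension, so names with extra dots survive
    name="${i%.*}"
    ffmpeg -i "$i" -vf yadif -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"
done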

In this script, the copy part is removed. I have also added “-vf yadif”, which is a de-interlacing video filter.

Summing up

With the first script, I managed to normalize all 100 files in only a few minutes. Some of the files turned up with 0 bytes due to issues with copying the video data. So I ran through these with the second script. That took longer, of course, due to the need for compressing the video.
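The two passes could also be folded into one, so that the slower re-encode only runs for files where the fast copy fails or leaves an empty file. This is a sketch of that idea, not the script I actually ran; the helper name is hypothetical, and “-y” (overwrite existing output) and “-nostdin” (do not read from the terminal) are my own additions.

```shell
#!/bin/bash
# Sketch: try the fast video copy first and fall back to re-encoding.
# normalize_with_fallback is a hypothetical helper combining the two scripts above.
normalize_with_fallback() {
    local infile="$1"
    local outfile="${infile%.*}_norm.mp4"
    local filter="loudnorm=I=-16:LRA=11:TP=-1.5"
    # First attempt: copy the video stream and re-encode only the audio.
    if ffmpeg -nostdin -y -i "$infile" -c:v copy -af "$filter" "$outfile" \
        && [ -s "$outfile" ]; then
        return 0
    fi
    # Fallback: the copy failed or left a 0-byte file, so re-encode the video too.
    ffmpeg -nostdin -y -i "$infile" -vf yadif -af "$filter" "$outfile"
}

# Usage (not run here):
#   shopt -s nullglob
#   for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do
#       normalize_with_fallback "$i"
#   done
```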

All in all, the processing took around half an hour. I cannot even imagine how long it would have taken to do this manually in a video editor. I haven’t really thought about the need for normalizing the audio in videos like this before. Next time I will do it right away!

Combining audio and video files with FFmpeg

When working with various types of video analysis, I often end up with video files without audio. So I need to add the audio track by copying either from the source video file or from a separate audio file. There are many ways of doing this. Many people would probably reach for a video editor, but the problem is that you would most likely end up recompressing both the audio and the video. A better solution is to use FFmpeg, the Swiss Army knife of video processing.

As long as you know that the audio and video files you want to combine are the same duration, this is an easy task. Say that you have two video files:

  • input1.mp4 = original video with audio
  • input2.avi = analysis video without audio

Then you can use this one-liner to copy the audio from one file to the other:

ffmpeg -i input1.mp4 -i input2.avi -c copy -map 1:v:0 -map 0:a:0 -shortest output.avi

The output.avi file will have the same video content as input2.avi, but with audio from input1.mp4. Note that this is a lossless (and fast) procedure, since it just copies the content from the source files.
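If you are not sure that the two files have the same duration, ffprobe can report it before you combine them. A minimal sketch; the helper names and the 0.1 second tolerance are my own assumptions:

```shell
#!/bin/bash
# Sketch: compare the durations of two media files before combining them.
# media_duration and same_duration are hypothetical helpers; the tolerance is arbitrary.
media_duration() {
    # Prints the container duration in seconds
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

same_duration() {
    local a b
    a="$(media_duration "$1")"
    b="$(media_duration "$2")"
    # Treat a missing duration as a mismatch
    [ -n "$a" ] && [ -n "$b" ] || return 1
    # awk does the floating-point comparison; exit status 0 means "close enough"
    awk -v a="$a" -v b="$b" 'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d <= 0.1) }'
}

# Usage (not run here):
#   same_duration input1.mp4 input2.avi || echo "Durations differ"
```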

If you want to convert (and compress) the file in one operation, you can use this one-liner to export an MP4 file with H.264 video (using the libx264 encoder, which is included in most FFmpeg builds) and AAC audio compression:

ffmpeg -i input1.mp4 -i input2.avi -map 1:v:0 -map 0:a:0 -shortest -c:v libx264 -c:a aac output.mp4

Since this involves compressing the file, it will take (much) longer than the first method.