Creating different types of keyframe displays with FFmpeg

In some recent posts I have explored the creation of motiongrams and average images, multi-exposure displays, and image masks. In this blog post I will explore different ways of generating keyframe displays using the very handy command-line tool FFmpeg.

As in the previous posts, I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first attempt is to create a 3×3 grid image by just sampling frames from the original video. I spent some time exploring different ways of doing this. It is possible to do it with a one-liner:

ffmpeg -ss 00:00:05 -i dance.mp4 -frames 1 -vf "select=not(mod(n\,200)),scale=495:256,tile=3x3" tile.jpg

The problem with this approach, and many similar ones that I found by googling around, is that it samples frames at a fixed interval. In the above code it picks every 200th frame, which gives this image:

The problem is that the image only contains information about the first 1600 frames, or more specifically frames 0, 200, 400, 600, 800, 1000, 1200, 1400, 1600. I want to include frames that represent the whole video.

I see that many people create such displays by sampling based on scene changes in the video. There are two problems with this. First, it requires that there are scene changes in the video. This is usually not the case in the videos that I study, which are primarily recorded with a stationary camera in which only the “foreground” changes. The second problem with sampling only “salient” frames is that we lose information about the temporal unfolding of the video file. From an analysis point of view, it is actually quite useful to know more or less when things happened in the video. That is not so easy if the sampling is uneven.

I was therefore happy to find a nice script made by Martin Sikora, which is based on looking up the duration of the file and using this to calculate the frames to export from the file. Running this script on the original video gives this image:
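The idea of sampling evenly from the duration can be sketched in a few lines of shell. This is not Martin Sikora's script, only a minimal illustration of the principle: the helper name even_timestamps is made up for this example, and picking the midpoint of each segment is my own assumption about how one might spread the frames.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): print N evenly spaced
# timestamps in seconds, one per line, taking the midpoint of each of N
# equal-length segments of the clip.
even_timestamps() {
    duration="$1"   # clip length in seconds, e.g. as reported by ffprobe
    count="$2"      # number of keyframes wanted, e.g. 9 for a 3x3 grid
    awk -v d="$duration" -v n="$count" 'BEGIN {
        for (i = 0; i < n; i++) printf "%.2f\n", d * (i + 0.5) / n
    }'
}

# Possible use (assumes ffmpeg/ffprobe are installed and dance.mp4 exists):
# duration=$(ffprobe -v error -show_entries format=duration \
#     -of default=noprint_wrappers=1:nokey=1 dance.mp4)
# i=0
# for t in $(even_timestamps "$duration" 9); do
#     ffmpeg -y -ss "$t" -i dance.mp4 -frames:v 1 "frame_$i.jpg"
#     i=$((i + 1))
# done
# ffmpeg -y -i frame_%d.jpg -vf "scale=495:256,tile=3x3" -frames:v 1 grid.jpg
```

Seeking with -ss once per timestamp means only nine frames are decoded for the grid, rather than filtering through the whole file.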

The 9 frames in the display above reveal that there is little dance in the first third of the video file (you can see the arm of the dancer enter in the third image). It also shows how the dancer moved around in the space. It is possible to get some idea about her spatial distribution, but there is little information about her actual motion throughout the sequence. I was therefore curious to try out making such a grid-based display from a history video, which actually shows some more of the actual motion.

It is possible to make (motion) history videos in both the Matlab and Python versions of the Musical Gestures Toolbox, but today I was curious as to whether it could be done simply with FFmpeg. And it turns out to be quite simple using a filter called tmix:

ffmpeg -i dance.mp4 -filter:v tmix=frames=30:weights="10 1 1" dance_tmix30.mp4

I played around for a while with the settings before ending up with these. Here I average over 30 frames (which is half a second for this 60 fps video). I also use the weights option to give preference to the current frame. This makes it easier to follow the dancer, as the trajectories of past motion become more blurred.
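Since the number of frames to average depends on the frame rate, it can be handy to derive it from a window length in seconds. The following is a small sketch of that arithmetic; the helper names tmix_frames and tmix_filter are hypothetical, and the frame rate is passed in by hand rather than read from the file.

```shell
#!/bin/sh
# Hypothetical helper: number of frames covering a window of the given length
# in seconds, rounded to the nearest whole frame.
tmix_frames() {
    fps="$1"; seconds="$2"
    awk -v f="$fps" -v s="$seconds" 'BEGIN { printf "%d\n", f * s + 0.5 }'
}

# Hypothetical helper: build the tmix filter string used in the command above
# from a frame rate, a window length in seconds, and a weights list.
tmix_filter() {
    printf 'tmix=frames=%s:weights=%s' "$(tmix_frames "$1" "$2")" "$3"
}

# Possible use (assumes ffmpeg is installed and dance.mp4 is a 60 fps video):
# ffmpeg -i dance.mp4 -filter:v "$(tmix_filter 60 0.5 '10 1 1')" dance_tmix30.mp4
```

For the 60 fps dance video and a half-second window this gives the same filter string as in the command above.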

Running the above grid-script on this video results in a keyframe display that shows more of the motion happening in the frames in question. This is useful to see, for example, when she moved more than in other frames.

I am quite happy with the above result, but it is not particularly fast. Creating the history video is time-consuming, since it has to process all the frames in the entire video. I therefore tested speeding up the video 8 times, using this command (the -an flag is used to remove the audio):

ffmpeg -i dance.mp4 -filter:v "setpts=0.125*PTS" -an output8x.mp4
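The factor 0.125 is simply 1/8: each presentation timestamp is multiplied by the inverse of the desired speed-up. A tiny sketch of that calculation, with a made-up helper name setpts_factor, makes it easy to try other speeds:

```shell
#!/bin/sh
# Hypothetical helper: PTS multiplier for a given speed-up factor.
# A speed-up of 8 means every timestamp is scaled by 1/8.
setpts_factor() {
    awk -v s="$1" 'BEGIN { printf "%g\n", 1 / s }'
}

# Possible use (assumes ffmpeg is installed and dance.mp4 exists):
# speed=8
# ffmpeg -i dance.mp4 -filter:v "setpts=$(setpts_factor "$speed")*PTS" -an "output${speed}x.mp4"
```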

Running the history video command on this sped-up file is quite a bit faster, and results in this high-speed history video:

Running this through the grid-script gives a keyframe display that is both similar and different to the one above:

It is quite a lot quicker to generate, and also gives more information about the motion sequence.

The conclusion is that it is, indeed, possible to make a lot of interesting video visualizations using “only” FFmpeg. Several of these scripts are also much faster than the scripts I have previously used in Matlab and Python. So I will definitely continue to explore FFmpeg, and look at how it can be integrated with the other toolboxes.

NIME publication and performance: Vrengt

My PhD student Cagri Erdem developed a performance together with dancer Katja Henriksen Schia. The piece was first performed together with Qichao Lan and myself during the RITMO opening and also during MusicLab vol. 3. See here for a teaser of the performance:

This week Cagri, Katja and I performed a version of the piece Vrengt at NIME in Porto Alegre.

We also presented a paper describing the development of the instrument/piece:

Erdem, Cagri, Katja Henriksen Schia, and Alexander Refsum Jensenius. “Vrengt: A Shared Body-Machine Instrument for Music-Dance Performance.” In Proceedings of the International Conference on New Interfaces for Musical Expression. Porto Alegre, 2019.


This paper describes the process of developing a shared instrument for music–dance performance, with a particular focus on exploring the boundaries between standstill vs motion, and silence vs sound. The piece Vrengt grew from the idea of enabling a true partnership between a musician and a dancer, developing an instrument that would allow for active co-performance. Using a participatory design approach, we worked with sonification as a tool for systematically exploring the dancer’s bodily expressions. The exploration used a “spatiotemporal matrix,” with a particular focus on sonic microinteraction. In the final performance, two Myo armbands were used for capturing muscle activity of the arm and leg of the dancer, together with a wireless headset microphone capturing the sound of breathing. In the paper we reflect on multi-user instrument paradigms, discuss our approach to creating a shared instrument using sonification as a tool for the sound design, and reflect on the performers’ subjective evaluation of the instrument.

New article: Group behaviour and interpersonal synchronization to electronic dance music

I am happy to announce the publication of a follow-up study to our former paper on group dancing to EDM, and a technical paper on motion capture of groups of people. In this new study we successfully managed to track groups of 9-10 people dancing in a semi-ecological setup in our motion capture lab. We also found a lot of interesting things when it came to how people synchronize to both the music and each other.

Solberg, R. T., & Jensenius, A. R. (2017). Group behaviour and interpersonal synchronization to electronic dance music. Musicae Scientiae.

The present study investigates how people move and relate to each other – and to the dance music – in a club-like setting created within a motion capture laboratory. Three groups of participants (29 in total) each danced to a 10-minute-long DJ mix consisting of four tracks of electronic dance music (EDM). Two of the EDM tracks had little structural development, while the two others included a typical “break routine” in the middle of the track, consisting of three distinct passages: (a) “breakdown”, (b) “build-up” and (c) “drop”. The motion capture data show similar bodily responses for all three groups in the break routines: a sudden decrease and increase in the general quantity of motion. More specifically, the participants demonstrated an improved level of interpersonal synchronization after the drop, particularly in their vertical movements. Furthermore, the participants’ activity increased and became more pronounced after the drop. This may suggest that the temporal removal and reintroduction of a clear rhythmic framework, as well as the use of intensifying sound features, have a profound effect on a group’s beat synchronization. Our results further suggest that the musical passages of EDM efficiently lead to the entrainment of a whole group, and that a break routine effectively “re-energizes” the dancing.


New publication: “How still is still? exploring human standstill for artistic applications”

I am happy to announce a new publication titled How still is still? exploring human standstill for artistic applications (PDF of preprint), published in the International Journal of Arts and Technology. The paper is based on the Sverm project, and was written and accepted two years ago. Sometimes academic publishing takes absurdly long, which this is an example of, but I am happy that the publication is finally out in the wild.


We present the results of a series of observation studies of ourselves standing still on the floor for 10 minutes at a time. The aim has been to understand more about our own standstill, and to develop a heightened sensitivity for micromovements and how they can be used in music and dance performance. The quantity of motion, calculated from motion capture data of a head marker, reveals remarkably similar results for each person, and also between persons. The best results were obtained with the feet at the width of the shoulders, locked knees, and eyes open. No correlation was found between different types of mental strategies employed and the quantity of motion of the head marker, but we still believe that different mental strategies have an important subjective and communicative impact. The findings will be used in the development of a stage performance focused on micromovements.


Jensenius, A. R., Bjerkestrand, K. A. V., and Johnson, V. (2014). How still is still? exploring human standstill for artistic applications. International Journal of Arts and Technology, 7(2/3):207–222.


    Author = {Jensenius, Alexander Refsum and Bjerkestrand, Kari Anne Vadstensvik and Johnson, Victoria},
    Journal = {International Journal of Arts and Technology},
    Number = {2/3},
    Pages = {207--222},
    Title = {How Still is still? Exploring Human Standstill for Artistic Applications},
    Volume = {7},
    Year = {2014}}

Laser dance

Working with choreographer Mia Habib, I created the piece Laser Dance, which was shown on 30 November and 1 December 2001 at the Norwegian Academy of Ballet and Dance in Oslo.

The theme of the piece was “Light”, and the choreographer wanted to use direct light sources as the point of departure for the interaction. Mia had decided to work with laser beams, one along the backside of the stage and one on the diagonal, facing towards the audience. The idea was to get sound when the dancers went through the laser beams. This way the sound would be an aural representation of the “broken” light. My part was to help with the lasers and the interactive sound.

First of all, we needed to get some lasers. We tested with a normal laser pen, but the thin beam was nearly invisible on stage. Luckily, I happened to have access to a professional laser used for physics experiments. With some smoke on stage, this laser was bright and clear. Since we only had one good laser, we used a narrowed-down spotlight as the light beam in the back of the stage.

It soon became clear that the visible beams were not good for motion detection. Henrik Sundt at NoTAM suggested using pairs of IR-senders/receivers instead. When the sender and receiver are in contact with each other nothing happens, but as soon as the signal is broken the receiver sends a pulse. The sensors were quite cheap consumer electronics, and when we tested the equipment at NoTAM we found that the reaction time was somewhat slow. This made us nervous, because the whole project would not be worth much if the sensors were not capable of detecting fast movements from the dancers. When we finally got everything set up on stage, we were happy to find that they worked quite well. A minor problem was the time the receiver needed to regain contact with the sender after a dancer had moved out of the beam. Still, the fact that the sound turned on almost immediately was more important than a second of sound lingering after a dancer had passed the beam.

Øyvind Hammer and Henrik Sundt helped with getting the NoTAM MIDI controller to work with the sensors. The cords from the IR-receivers were connected to the inputs of the controller. When a signal was detected from one of the receivers, the controller sent a MIDI message with the corresponding channel (1 or 2) and a note-off message. This signal went into Max/MSP, and we were finally ready to start experimenting with the sound.


From the beginning, Mia knew that she wanted to have two bright and different sounds. On the one hand, they should be almost as pure as a sine tone, to resemble the bright light beams. On the other hand, they needed to be “living” and interesting for the dancers to improvise with while in the beam. Since a side drum would also be involved, the sounds should blend in with it.

This way of musical thinking was novel to me. I am used to working with sounds as self-contained aesthetic objects, and often from a linear perspective. Here I was presented with the task of making what could be called “hypertext” sound with clear limitations. The sounds themselves would constitute only a part of the whole performance and would be more a tool for the dancers to work with than something aesthetically pleasing in itself. After having made a couple of samples, we sat down and tweaked the parameters in the patch. Finally, we came up with two sounds that we both found to be pleasing, interesting and meeting the initial requirements.


When I started building the patch, it was with the idea that it should be self-contained and easy to use. At the same time, it should be powerful enough to allow rapid changes in the sound. This way Mia could operate the whole system herself for the rehearsals with the dancers. The final patch is seen in the picture above. Two boxes labelled “Laser 1” and “Laser 2” are turned on and off by the IR sensors. They can also be used on-screen for rehearsals and testing. The gate switch on “Laser 2” was used at the end of the show when only this sound should be turned off. Frequencies and random generators can be turned on and off and adjusted easily. As shown in the picture on the next page, the patch reveals a hidden world of patch cords and objects, resulting from constant alterations. In the following I will briefly go through the various parts of the patch:

  1. We recall that the MIDI controller sent note-offs when the IR-signal was broken. As can be seen from the patch, MIDI is being picked up by a “notein”. First, the two different signals are separated by the “select” command. Then an if-test checks whether the beam is on (broken) or off (unbroken). For “on” the volume is set to 50 and else it is set to 0.
  2. Each sound is made from a set of three added sinusoids represented with the “cycle~” object. All of the frequencies can be adjusted directly on-screen.
  3. The “rand~” object is used to make the sound more vibrant, through a random function. To make this even more “lively”, a random generator controlled by a metronome is changing the “rand~” seed at given intervals.
  4. A chaos slider at the left was added to increase the tension in both sounds, controlled by an expression pedal.
  5. A “reset” button is available for quickly regaining the preferred settings.


It was less than a week before the premiere that we finally managed to get everything together. After working with gathering the equipment and making the patch, it was very exciting to test everything in practice and see how it worked with the dancers and the drummer.

Three dancers (two male, one female) were on stage, “trapped” within the light beams. During the 15-minute performance, they worked their way from silence to sound climax to silence. Spotlights were used to create massive, cascading effects in between total darkness. The interactive sounds were used partly in short sequences with the dancers jumping in and out of the beams, partly in some longer sequences where the dancers moved “inside” the beams. The two short videos on the CD give some idea of the show.

Laserdance video excerpt

Luckily, the Mac did not crash and the sound turned on and off as it should, every day. The only problem we encountered was having to reposition the sensors every night, because the stage had to be cleared after each performance.


After working with this project, it is tempting to draw a parallel to Egil Haga’s (1999) Master’s thesis about sounds and actions. He uses the term synchronicity for the situation in which a physical action and a sound gesture are believed to be generated from the same action. He points out that most films contain soundtracks generated in studios, and he is critical of examples where the synchronicity is imprecise and poor. Further, he mentions the fact that cross-modal perception results in greater stimuli. For example, it is easier to understand a person talking if you can see the lip movement.

In his last chapter, Haga analyses a dance-music performance where he points out the excellent synchronicity between the music and the gestures used by the dancers. Somehow, after working with the Laser Dance project I got a much better understanding of his comments. Even though he is talking about classical music and dance movements based on the sound gestures, I see a clear line to the interactive sound I have been working with. The difference is mostly that in the Laser Dance project, the sound is more a tool for the dancers to work with than a musical piece.

Interactive music controlled by sensors really puts the dancers in control of the whole performance. Through body gestures, they control the sound themselves, making the synchronicity excellent (at least with efficient sensors…), and the result is a performance where the dance and sound blend together in harmony. This is not always the case when looking at other dance performances with “linear” music. No doubt, this project has opened my eyes to a totally new way of thinking about music, and I have also got a lot of ideas for new projects involving dance and interactive sound.