Flamenco video analysis

I continue my testing of the new Musical Gestures Toolbox for Python. It is one thing to use the toolbox on controlled recordings with stationary cameras and static backgrounds (see the examples of visualizations of AIST videos). It is quite another to explore “real-world” videos (such as the Bergensbanen train journey).

I came across a great video of flamenco dancer Selene Muñoz, and wondered how I could visualize what is going on there:

Videograms and motiongrams

My first idea is always to create a motiongram to get an overview of what goes on in the video file. Here we can clearly see the structure of the recording:

A motiongram of the flamenco dance recording (6 minutes and 23 seconds).

The motiongram shows what changes between frames. The challenge with analyzing such TV production recordings is that there is a lot of camera movement. This can be seen more clearly in a videogram (the same technique as a motiongram, but calculated from the regular video frames).

A videogram of the same recording. Three segments of camera zoom/pan are highlighted.

Sometimes, the videogram can be useful, but the motiongram more clearly shows the motion happening in the file. Since it is based on frame differencing, it effectively “removes” the background material. So by zooming into a motiongram, it is possible to see more details of the motion.
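Generating both representations only takes a few lines of code. Here is a minimal sketch based on the MgVideo example in the MGT-python README (the file name is a placeholder, and method names may differ between toolbox versions):

```python
import musicalgestures

# Load the video file (name is a placeholder)
mg = musicalgestures.MgVideo('flamenco.mp4')

# Frame-differencing analysis: renders a motion video and motiongrams
mg.motion()

# Videograms: the same reduction, computed from the regular frames
mg.videograms()
```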

A horizontal motiongram of the opening sequence. This shows the combined motion of both performers and is therefore not particularly useful.

The above illustration shows a horizontal motiongram, which reflects the vertical motion (yes, it is a bit confusing with this horizontal/vertical thinking…). When there are two performers, that is not particularly useful. In such cases, I prefer to look at the vertical motiongram instead, which shows the horizontal motion. Then it is much easier to see the motion of each performer separately, not least their turn-taking in the performance.

The vertical motiongram can be used to investigate the motion of each performer.

The motiongram can also be used together with audio representations, such as the tempogram shown below.

A motiongram (top) and tempogram (bottom) can be used to look at structural similarities and differences between the audio and video streams (the tempo estimation here is not particularly relevant, but it is part of the script).
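The audio representations come from the toolbox’s audio module. A minimal sketch, assuming the audio interface described in the MGT-python documentation:

```python
import musicalgestures

mg = musicalgestures.MgVideo('flamenco.mp4')

# Tempogram of the embedded audio track, including a tempo estimate
mg.audio.tempogram()
```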

Grid representations

The above visualizations show information about continuous motion. I often find this useful when studying, well, motion. However, when dealing with multi-camera productions, it is common to look at grid-based image displays instead. Using one of the functions from the MGT for Terminal, I created some versions with an increasing number of extracted frames.

There is a trade-off here between getting a general overview and getting into the details. I think that the 3×3 and 4×4 versions manage to capture the main content of the recording fairly well.
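The grid function essentially samples frames at regular intervals and tiles them into a single image. For those who want to do the same directly in Python, here is a rough, hypothetical sketch with OpenCV and NumPy (this is not the MGT implementation):

```python
import cv2
import numpy as np

def frame_grid(video_path, rows=3, cols=3, out_path='grid.png'):
    """Sample rows*cols evenly spaced frames and tile them into a grid."""
    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, n_frames - 1, rows * cols, dtype=int)
    frames = []
    for i in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    # Tile the frames row by row
    grid = np.vstack([np.hstack(frames[r * cols:(r + 1) * cols])
                      for r in range(rows)])
    cv2.imwrite(out_path, grid)

frame_grid('flamenco.mp4', rows=4, cols=4, out_path='grid_4x4.png')
```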

Visualizations always need to be targeted at what one wants to show. Often, it may be the combination of different plots that is most useful. For example, a grid display may be used together with motiongrams and a waveform of the audio.

A grid display (top), motiongram (middle), and audio waveform (bottom) reveal quite a bit of the content of the video recording.

For now, I have put these displays together manually. The aim is to generate such combined plots directly from MGT for Python.
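As a stopgap, the stacking itself is straightforward with matplotlib. A hypothetical sketch, assuming the individual plots have been exported as image files (the file names are placeholders):

```python
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Stack the exported plots vertically (file names are placeholders)
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
for ax, fname in zip(axes, ['grid_3x3.png', 'motiongram.png', 'waveform.png']):
    ax.imshow(mpimg.imread(fname))
    ax.axis('off')
plt.tight_layout()
plt.savefig('combined.png', dpi=150)
```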

Kayaking motion analysis

Like many others, I bought a kayak during the pandemic, and I have had many nice trips in the Oslofjord over the last year. Working at RITMO, I think a lot about rhythm these days, and the rhythmic nature of kayaking made me curious to investigate these patterns a little more.

Capturing kayaking motion

My spontaneous investigations into kayak motion began with simply recording a short video of myself kayaking. This was done by placing an action camera (a GoPro Hero 8, to be precise) on my life vest. The result looks like this:

In the future, it would be interesting to also test with a proper motion capture system (see this article for an overview of different approaches). However, as they say, the best motion capture system is the one you have at hand, and cameras are by far the easiest to bring around.

Analysing kayaking motion

For the analysis, I reached for the Musical Gestures Toolbox for Python. It has matured nicely over the last year and is also where we are putting most of our new development efforts these days.

The first step of motion analysis is to generate a motion video:
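In code, this is essentially a one-liner after loading the file (a minimal sketch based on the MGT-python README; the file name and threshold value are placeholders):

```python
import musicalgestures

mg = musicalgestures.MgVideo('kayaking.mp4')

# Renders the motion video (frame differencing with a noise threshold),
# along with motiongrams and per-frame motion data
mg.motion(thresh=0.05)
```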

From the motion video, MGT will also create a motiongram:

Motiongram of a kayaking video.

From the motiongram, it is pretty easy to see the regularity of the kayaking strokes. This may be even easier to see in the videogram:

Videogram of a kayaking video.

We also get information about the centroid and quantity of motion:

Centroid and quantity of motion of the kayaking video.
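The motion data can also be exported for further processing, for instance as a CSV file. A sketch of how one might then look at it with pandas (the file name and column names here are assumptions, not the exact MGT output format):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the per-frame motion data exported by MGT
# (file and column names are assumptions)
data = pd.read_csv('kayaking_motion.csv')

# Plot the quantity of motion over time
plt.plot(data['Time'], data['QoM'])
plt.xlabel('Time (s)')
plt.ylabel('Quantity of motion')
plt.show()
```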

The quantity of motion can be used for further statistical analysis. But for now, I am more interested in exploring how to better visualize the rhythmic properties of the video itself. Implementing directograms in MGT was already on the list, and it is even higher on the list now.
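A directogram summarizes, for each frame, a histogram of optical-flow directions weighted by flow magnitude (Davis & Agrawala, 2018). A rough sketch of the idea with OpenCV, to illustrate the principle rather than the eventual MGT implementation:

```python
import cv2
import numpy as np

def directogram(video_path, n_bins=32):
    """Per-frame histograms of optical-flow direction, weighted by magnitude."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    histograms = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between consecutive frames
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Histogram of flow directions, weighted by magnitude
        hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag)
        histograms.append(hist)
        prev_gray = gray
    cap.release()
    return np.array(histograms)  # shape: (n_frames - 1, n_bins)
```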

The motion average image (generated from the motion video) does not reveal much about the motion.

Motion average image of the kayaking video.

It is generated by calculating the average of all the frames of the motion video. What puzzles me are the colour artefacts. I wonder whether they come from a compression error in the video or a bug somewhere in MGT for Python. I cannot see the same artefacts in the average image:

Average image of the kayaking video.
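To check whether the artefacts come from MGT or from the video itself, the average image can be computed independently with OpenCV and NumPy. A minimal sketch:

```python
import cv2
import numpy as np

# Compute the average image of a video independently of MGT
cap = cv2.VideoCapture('kayaking.mp4')
acc, count = None, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Accumulate in float64 to avoid overflow and rounding errors
    acc = frame.astype(np.float64) if acc is None else acc + frame
    count += 1
cap.release()
cv2.imwrite('average.png', (acc / count).astype(np.uint8))
```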

Analysing the sound of kayaking

The video recording also has sound, so I was curious to see whether it could be used for anything. True, kayaking is a quiet activity, so I didn’t have very high hopes. Also, GoPros don’t have particularly good microphones, and they compress the sound a lot. Still, there could be something in the signal. To begin with, the waveform display of the sound does not tell us much:

A waveform of the sound of kayaking.

The spectrogram does not reveal much either, although it is interesting to see the effects of the GoPro’s sound compression (the horizontal lines from 5 kHz upward).

A spectrogram of the sound of kayaking.
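Both audio plots are generated through the toolbox’s audio module; a minimal sketch, assuming the audio methods listed in the MGT-python documentation:

```python
import musicalgestures

mg = musicalgestures.MgVideo('kayaking.mp4')

# Waveform and spectrogram of the embedded audio track
mg.audio.waveform()
mg.audio.spectrogram()
```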

The tempogram, however, is more interesting.

A tempogram of the sound of kayaking.

It is exciting to see that it estimates the tempo to be 122 BPM, which resonates with theories about 120 BPM being the average tempo of moderate human activity.
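For those who want to verify the tempo estimate outside MGT, a similar analysis can be run directly with librosa (a sketch; the audio file name is a placeholder):

```python
import librosa

# Load the audio track extracted from the video (file name is a placeholder)
y, sr = librosa.load('kayaking.wav')

# Onset strength envelope and a global tempo estimate
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
tempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr)
print(f'Estimated tempo: {tempo[0]:.1f} BPM')
```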

This little investigation into the sound and video of kayaking made me curious about what else can be found from such recordings. In particular, I will continue to explore approaches to analysing the rhythm of audiovisual recordings. It also made me look forward to a new kayaking season!

New article: Best versus Good Enough Practices for Open Music Research

After a fairly long publication process, I am happy to finally announce a new paper: Best versus Good Enough Practices for Open Music Research, published in Empirical Musicology Review.

Summary

The abstract reads:

Music researchers work with increasingly large and complex data sets. There are few established data handling practices in the field and several conceptual, technological, and practical challenges. Furthermore, many music researchers are not equipped for (or interested in) the craft of data storage, curation, and archiving. This paper discusses some of the particular challenges that empirical music researchers face when working towards Open Research practices: handling (1) (multi)media files, (2) privacy, and (3) copyright issues. These are exemplified through MusicLab, an event series focused on fostering openness in music research. It is argued that the “best practice” suggested by the FAIR principles is too demanding in many cases, but “good enough practice” may be within reach for many. A four-layer data handling “recipe” is suggested as concrete advice for achieving “good enough practice” in empirical music research.

The article is based on the challenges we have faced in adhering to Open Research principles within music research. I mention our experiences with MusicLab in particular.

Perhaps the most important take-home message from the article is the set of recommendations at the end.

1. DATA COLLECTION (“RAW”)

1a. Create analysis-friendly data. Planning what to record will save time afterward, and will probably lead to better results in the long run. Write a data management plan (DMP).

1b. Plan for mistakes. Things will always happen. Ensure redundancy in critical parts of the data collection chain.

1c. Save the raw data. In most cases, the raw data will be processed in different ways, and it may be necessary to go back to the start.

1d. Agree on a naming convention before recording. Cleaning up the names of files and folders after recording can be tedious. Get it right from the start instead. Use unique identifiers for all equipment (camera1, etc.), procedures (pre-questionnaire1, etc.), and participants (a001, etc.).

1e. Make backups of everything as quickly as possible. Losing data is never fun, and particularly not the raw data.

2. DATA PRE-PROCESSING (“PROCESSED”)

2a. Separate raw from processed data. Nothing is as problematic as overwriting the original data in the pre-processing phase. Make the raw data folder read-only once it is organized.

2b. Use open and interoperable file formats. Often the raw data will be based on closed or proprietary formats. The data should be converted to interoperable file formats as early as possible.

2c. Give everything meaningful names. Nothing is as cryptic as 8-character abbreviations that nobody will understand. Document your naming convention.

3. DATA STORAGE (“COOKED”)

3a. Organize files into folders. Creating a nested and hierarchical folder structure with meaningful names is a basic, but system-independent and future-proof solution. Even though search engines and machine learning improve, it helps to have a structured organizational approach in the first place.

3b. Make incremental changes. It may be tempting to save the last processed version of your data, but it may be impossible to go back to make corrections or verify the process.

3c. Record all the steps used to process data. This can be in a text file describing the steps taken. If working with GUI-based software, be careful to note down details about the software version, and possibly include screenshots of settings. If working with scripts, document the scripts carefully, so that others can understand them several years from now. If using a code repository (recommended), store current snapshots of the scripts with the data. This makes it possible to validate the analysis.

4. DATA ARCHIVE (“PRESERVED”)

4a. Always submit data with manuscripts. Publications based on data should be considered incomplete if the data is not accessible in such a way that it is possible to evaluate the analysis and claims in the paper.

4b. Submit the data to a repository. To ensure the long-term preservation of your data, also independently of publications, it should be uploaded to a reputable DOI-issuing repository so that others can access and cite it.

4c. Let people know about the data. Data collection is time-consuming, and in general, most data is under-analyzed. More data should be analyzed more than once.

4d. Put a license on the data. This should ideally be an open and permissive license (such as those suggested by Creative Commons). However, even when using a closed license, it is important to clearly label the data so that others can understand how to use it.

New Book Chapter: Gestures in ensemble performance

I am happy to announce that Cagri Erdem and I have written a chapter titled “Gestures in ensemble performance” in the new book Together in Music: Coordination, Expression, Participation, edited by Renee Timmers, Freya Bailes, and Helena Daffern.

Video Teaser

For the book launch, Cagri and I recorded a short video teaser:

Abstract

The more formal abstract is:

The topic of gesture has received growing attention among music researchers over recent decades. Some of this research has been summarized in anthologies on “musical gestures”, such as those by Gritten and King (2006), Godøy and Leman (2010), and Gritten and King (2011). There have also been a couple of articles reviewing how the term gesture has been used in various music-related disciplines (and beyond), including those by Cadoz and Wanderley (2000) and Jensenius et al. (2010). Much empirical work has been performed since these reviews were written, aided by better motion capture technologies, new machine learning techniques, and a heightened awareness of the topic. Still there are a number of open questions as to the role of gestures in music performance in general, and in ensemble performance in particular. This chapter aims to clarify some of the basic terminology of music-related body motion, and draw up some perspectives of how one can think about gestures in ensemble performance. This is, obviously, only one way of looking at the very multifaceted concept of gesture, but it may lead to further interest in this exciting and complex research domain.

Ten years after Musical Gestures

We began writing this ensemble gesture chapter in 2020, about ten years after the publication of the chapter Musical gestures: Concepts and methods in research. That chapter has, to my surprise, become my most-cited publication to date. When I began working on the topic of musical gestures with Rolf Inge Godøy back in the early 2000s, it was still a relatively new topic. Most music researchers I spoke to didn’t understand why we were interested in the body.

Fast forward to today, and it is hard to find music researchers that are not interested in the body in one way or another. So I am thrilled about the possibility of expanding some of the “old” thoughts about musical gestures into the ensemble context in the newly published book chapter.

Rigorous Empirical Evaluation of Sound and Music Computing Research

At the NordicSMC conference last week, I was part of a panel discussing the topic Rigorous Empirical Evaluation of SMC Research. This was the original description of the session:

The goal of this session is to share, discuss, and appraise the topic of evaluation in the context of SMC research and development. Evaluation is a cornerstone of every scientific research domain, but is a complex subject in our context due to the interdisciplinary nature of SMC coupled with the subjectivity involved in assessing creative endeavours. As SMC research proliferates across the world, the relevance of robust, rigorous empirical evaluation is ever-increasing in the academic and industrial realms. The session will begin with presentations from representatives of NordicSMC member universities, followed by a more free-flowing discussion among these panel members, followed by audience involvement.

The discussion was moderated by Sofia Dahl (Aalborg University), and the panel consisted of Nashmin Yeganeh (University of Iceland), Razvan Paisa (Aalborg University), and Roberto Bresin (KTH).

The challenge of interdisciplinarity

Everyone in the panel agreed that rigorous evaluation is important. The challenge is to figure out what type(s) of evaluation are useful and feasible within sound and music computing research. This was effectively illustrated by a list of the different methods employed by the researchers at KTH.

A list of methods in use by the sound and music computing researchers at KTH.

Roberto Bresin had divided the KTH list into methods that they have worked with for decades (in red) and newer methods that they are currently exploring. The challenge is that each of these methods requires different knowledge and skills, and each calls for a different type of evaluation.

Although we have a slightly different research profile at UiO than at KTH, we also have a breadth of methodological approaches in SMC-related research. I pointed to a model I often use to explain what we are doing:

A simplified model explaining my research approach.

The model has two axes. One shows a continuum between artistic and scientific research methods and outputs. The other shows a continuum between research on natural and cultural phenomena. In addition, we develop and use various types of technologies across all of these.

The reason I like to bring up this model is to explain that things are connected. I often hear that artistic and scientific research are completely different things. Sure, they are different, but there are also commonalities. Similarly, there is an often unnecessary divide between the humanities and the social and natural sciences. True, they have different foci, but when studying music we need to take all of them into account. Music involves everything from “low-level” sensory phenomena to “high-level” emotional responses. One can focus on one or the other, but if we really want to understand musical experiences – or make new ones, for that matter – we need to see the whole picture. Thus, evaluations of whatever we do also need to take a holistic approach.

Open Research as a Tool for Rigorous Evaluation

My entry into the panel discussion was that we should use the ongoing transition to Open Research practices as an opportunity to also perform more rigorous evaluations. I have previously argued why I believe open research is better research. The main argument is that sharing things (methods, code, data, publications, etc.) openly forces researchers to document everything better. Nobody wants to make sloppy things publicly available. So the process of making all the different parts of the open research puzzle openly available is a critical component of a rigorous evaluation.

In the process of making everything open, we realize, for example, that we need better tools and systems. We also experience that we need to think more carefully about privacy and copyright. That is also part of the evaluation process and lays the ground for other researchers to scrutinize what we are doing.

Summing up

One of the challenges of discussing rigorous evaluation in the “field” of sound and music computing is that we are not talking about one discipline with one method. Instead, we are talking about a set of approaches to developing and using computational methods for sound and music signals and experiences. If you need to read that sentence a couple of times, it is understandable. Yes, we are combining a lot of different things. And, yes, we are coming from different backgrounds: the arts and humanities, the social and natural sciences, and engineering. That is exactly what is cool about this community. But it is also why it is challenging to agree on what a rigorous evaluation should be!