Music and AI

Last week I was interviewed about music and artificial intelligence (AI). This led to several stories on radio, on TV, and in print. The reason for the sudden media interest in this topic was a story by The Guardian on the use of deep learning for creating music. They featured an example of Sinatra-inspired music created with a deep learning algorithm:

After these stories were published, I was asked to participate in a talk-show on Friday evening. I said yes, of course. After all, music technology research rarely hits the news here in Norway. Unfortunately, my participation in the talk-show was cancelled on short notice after one of the other participants fell ill. I had already written up some notes on what I would have liked to convey to the general public about music and AI, so I thought I could at least publish them here on the blog.

The use of AI in music is not new

While deep learning is the current state-of-the-art, AI has been used in music-making for decades. Previously this was mainly done on symbolic music representations. That is, a computer algorithm is fed with some musical scores, and it tries to create, for example, new melodies. There are lots of examples of this in the computer music literature, such as in the proceedings of the International Computer Music Conference.

The Sinatra-example shows that it is now possible to create convincing music also at the sample level. This is neatly done, but timbre-based AI has also been around for a while. It was actually something I was very excited about during my master’s thesis around 20 years ago. Inspired by David Wessel at UC Berkeley, I trained a set of artificial neural networks with saxophone sounds. Much has happened since then, but the basic principles are the same: you feed a computer with real sound and ask it to come up with something new that is somewhat similar. We have several projects that explore this in the Interaction and Robotics cluster at RITMO.

If one also considers algorithmic music a type of AI, this has been around for centuries. Here the idea is to create music by formulating an algorithm that can make musical choices. Mozart is rumoured to have made dice-based procedural music in the 18th century, and I am sure that others also thought about this (does anyone know of a good overview of pre-20th-century algorithmic music?).

There will be changes in the music industry

As the news stories this week showed, many people in the music industry are scared about what is happening with the introduction of more AI in music. And, yes, things are surely going to change. But this is, again, nothing new. The music industry has always been changing. The development of professional concert halls in the 19th century was a major change, and the recording industry changed music forever in the 20th century. That doesn’t necessarily mean that everyone else will lose their jobs. Even though everyone can listen to music everywhere these days, many people still enjoy going to concerts to experience live music (and even more so now that corona has deprived us of the possibility).

Sound engineers, music producers, and other people involved in the production of recorded music are examples of new professions that emerged with the new music industry in the 20th century. AI will lead to new music jobs being created in the 21st century. We have already seen that streaming has changed the music industry radically. Although most people don’t think much about it in daily life, there are lots of algorithms and AI involved under the hood of streaming services.

There will surely be some people in the industry who will lose their jobs, but many new jobs will also be created. Music technologists with AI competency will be sought after. This is something we have known for a long time, and that is why we are teaching courses on music and machine learning and interactive music systems in the Music, Communication and Technology master’s programme at UiO.

We need to rethink copyright and licensing

Another topic that has emerged from the discussions this week is how to handle copyright and licensing. And, yes, AI challenges the current systems. I would argue, however, that AI is not the problem as such, but it exposes some basic flaws in the current system. Today’s copyright system is based around the song as the primary copyright “unit”. This probably made sense in a pre-technological age where composers wrote a song that musicians performed. Things are quite different in a technology-driven world, where music comes in many different forms and formats.

The Sinatra-example can be seen as a type of sampling, just at a microscopic scale. Composers have “borrowed” from others throughout music history. Usually, this was in the form of melody, harmony, and rhythm. Producers have for decades used sound fragments (“samples”) of others, which has led to lots of interesting music and several lawsuits. There are numerous challenges here, and we actually have two ongoing research projects at UiO that explore the usage of sampling and copyright in different ways (MASHED and MUSEC).

What is new now is that AI makes it possible to also “sample” at the sample level, that is, the smallest unit of a digital sound wave. If you don’t know what that means, take a look at a zoomed-in version of a complex sound wave like this:

Zoom in and out of a complex waveform (coded by Jan Van Balen).
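If you want to render a static waveform image of a sound file yourself, FFmpeg’s showwavespic filter is one option. Here is a minimal sketch; “sound.wav” is a placeholder file name, and the command is printed as a dry run rather than executed:

```shell
# Sketch: render a waveform image from an audio file with FFmpeg's
# showwavespic filter. "sound.wav" is a placeholder file name, and the
# command is printed rather than executed (drop the echo to run it).
cmd='ffmpeg -i sound.wav -filter_complex "showwavespic=s=1920x480" -frames:v 1 waveform.png'
echo "$cmd"
```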

Splicing together sound samples with AI opens up some interesting technological, philosophical, and legal questions. For example, how short can a sample be and still be covered by copyright? What types of representations of such samples should be considered? Waveforms? Equations? Software with an API? Clearly, there is a need to think more carefully about how this should be implemented.

Possibilities with music and AI

The above-mentioned challenges (a changing music industry and copyright discussions) are not trivial, and I understand that many people are scared by the current changes. However, this fear of new technology may get in the way of many of the positive aspects of using AI in music.

It should be mentioned that many of the new methods have been developed and explored by composers, producers, and other types of music technologists. The intention has been to use machine learning, evolutionary algorithms, and other types of AI to generate new sound and music that would not otherwise be possible. There are some extreme cases of completely computer-generated music. Check out, for example, the autonomous instruments by my former colleague Risto Holopainen. In most cases, however, AI has been (and is still) used as part of a composition/production process together with humans.

Personally, I am particularly interested in how AI can help to create better interactive music systems. Over the last century, we have seen a shift towards music becoming a “hermetic” product. Up until the 20th century, music was never fixed; it changed according to who played it. To experience music, you either had to play yourself or be in the vicinity of someone else who played. Nowadays, we listen to recorded music that never changes. This has led to an increased perfection of the final product. At the same time, it has removed many people from the experience of participating in musicking themselves.

New technologies, such as AI, allow for creating more personalized music. The procedural audio found in computer games is one example of music that can be “stretched” in, for example, its duration. New music apps allow users to modify parts of a song, such as adding or removing instruments in a mix. There are also examples of interactive music systems made for work, relaxation, or training (check, for example, the MCT student blog). They all have in common that they respond to some input from the user and modify the music accordingly. This allows for a more engaging musical experience than fixed recordings. I am sure we will see many more such examples in the future, and they will undoubtedly benefit from better AI models.

Music and AI is the future

AI is coming to music as it is coming to everything else these days. This may seem disruptive to parts of the music industry, but it could also be seen as an opportunity. New technologies will lead to new music and new forms of musicking.

Some believe that computers will take over everything. I am sure that is not the case. What has become very clear from our corona home office lives is that humans are made of flesh and blood, we like to move around, and we are social. The music of the future will continue to be based on our needs to move, to run, to dance, and to engage with others musically. New technologies may even help us to do that better. I am also quite sure that we will continue to enjoy playing music ourselves on acoustic instruments.

Visual effect of the different tblend functions in FFmpeg

FFmpeg is a fantastic resource for doing all sorts of video manipulations from the terminal. However, it has a lot of features, and it is not always easy to understand what they all mean.

I was interested in understanding more about how the tblend function works. This is a function that blends successive frames in 30 different ways. To get a visual understanding of how the different operations work, I decided to try them all out on the same video file. I started from this dance video:

Then I ran this script:
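A minimal sketch of such a batch loop could look like the following. The input file name dance.mp4 is a placeholder, only a subset of the available blend modes is listed, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Sketch: apply a set of tblend modes to the same input file.
# "dance.mp4" is a placeholder name, and only a subset of the available
# modes is listed here. The commands are printed rather than executed;
# drop the echo to actually run ffmpeg.
modes="addition average darken difference divide exclusion glow
hardlight lighten multiply negation normal overlay screen softlight subtract"

for mode in $modes; do
  echo ffmpeg -i dance.mp4 -vf "tblend=all_mode=$mode" "tblend_$mode.mp4"
done
```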

This created 30 video files, each showing the effect of the tblend operator in question. Here is a playlist with all the different resultant videos:

Instead of watching each of them independently, I also wanted to make a grid of all 30 videos. This can be done manually in a video editor, but I wanted to check how it can be done with FFmpeg. I came across this nice blog post with an example that almost matched my needs. With a little bit of tweaking, I came up with this script:
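As an illustration of the idea, here is a sketch that builds an ffmpeg xstack command for a 6×5 grid programmatically. The input names input_0.mp4 … input_29.mp4 are placeholders, all inputs are assumed to be pre-scaled to 320×180, and the final command is printed rather than executed:

```shell
#!/bin/sh
# Sketch: compose an ffmpeg xstack command that tiles 30 videos in a
# 6x5 grid. Input file names are placeholders, and every input is
# assumed to already be 320x180. The command is printed, not executed.
cols=6
rows=5
inputs=""
layout=""
n=0
r=0
while [ "$r" -lt "$rows" ]; do
  c=0
  while [ "$c" -lt "$cols" ]; do
    inputs="$inputs -i input_$n.mp4"
    # pixel position of this tile in the grid
    x=$((c * 320))
    y=$((r * 180))
    layout="$layout${layout:+|}${x}_${y}"
    n=$((n + 1))
    c=$((c + 1))
  done
  r=$((r + 1))
done
echo ffmpeg $inputs -filter_complex "xstack=inputs=$n:layout=$layout" grid.mp4
```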

The final result is a 30-video grid with all the different tblend operators placed next to each other (in alphabetical order from top left, I think). Consult the playlist to see the individual videos.

MusicTestLab as a Testbed of Open Research

Many people talk about “opening” the research process these days. Due to initiatives like Plan S, much has happened when it comes to Open Access to research publications. There are also things happening when it comes to sharing data openly (or at least FAIR). Unfortunately, there is currently more talking about Open Research than doing. At RITMO, we are actively exploring different strategies for opening our research. The most extreme case is that of MusicLab. In this blog post, I will reflect on yesterday’s MusicTestLab – Slow TV.

About MusicLab

MusicLab is an innovation project by RITMO and the University Library. The aim is to explore new methods for conducting research, research communication, and education. The project is organized around events: concerts in public venues that are also the objects of study. The events also contain an edutainment element through panel discussions with world-leading researchers and artists, as well as “data jockeying” in the form of live analysis of the recorded data.

We have carried out five full MusicLab events so far, plus a couple of in-between cases. Now we are preparing for a huge event in Copenhagen with the Danish String Quartet. The concert has already been postponed once due to corona, but we hope to make it happen in May next year.

The wildest data collection ever

As part of the preparation for MusicLab Copenhagen, we decided to run a MusicTestLab to see if it is at all possible to carry out the type of data collection that we would like to do. Usually, we work in the fourMs Lab, a custom-built facility with state-of-the-art equipment. This is great for many things, but the goal of MusicLab is to do data collection in the “wild”, which would typically mean a concert venue.

For MusicTestLab, we decided to run the event on the stage in the foyer of the Science Library at UiO, which is a real-world venue that gives us plenty of challenges to work with. We decided to bring a full “package” of equipment, including:

  • infrared motion capture (Optitrack)
  • eye trackers (Pupil Labs)
  • physiological sensors (EMG from Delsys)
  • audio (binaural and ambisonics)
  • video (180° GoPros and 360° Garmin)

We are used to working with all of these systems separately in the lab, but it is more challenging to combine them in an out-of-lab setting, especially under the pressure of having to set everything up in a fairly short amount of time.

Musicians on stage with many different types of sensors on, with RITMO researchers running the data collection and a team from LINK filming.

Streaming live – Slow TV

In addition to actually doing the data collection in a public venue, where people passing by can see what is going on, we decided to also stream the entire setup online. This may seem strange, but we have found that many people are actually interested in what we are doing. Many people also ask about how we do things, and this was a good opportunity to show people the behind-the-scenes of a very complex data collection process. The recording of the stream is available online:

To make it a little more viewer-friendly, the stream features live commentary by myself and Solveig Sørbø from the library. We talk about what is going on and interview the researchers and musicians. As can be seen from the stream, it was quite a hectic event, further complicated by corona restrictions. We were about an hour late for the first performance, but we managed to complete the whole recording session within the allocated time frame.

The performances

The point of the MusicLab events is to study live music, and this was also the focal point of the MusicTestLab, featuring the very nice, young student-led Borealis String Quartet. They performed two movements of Haydn’s Op. 76, no. 4 «Sunrise» quartet. The first performance can be seen here (with a close-up of the motion capture markers):

The first performance of Haydn’s string quartet Op. 76, no. 4 (movements I and II) by the Borealis String Quartet.

After the first performance, the musicians took off the sensors and glasses, had a short break, and then put everything back on again. The point of this was for the researchers to get more experience with putting everything on properly. From a data collection point of view, it is also interesting to see how reliable the data are across recordings. The second performance can be seen here, now with a projection of the gaze from the violist’s eye-tracking glasses:

The second performance of Haydn’s string quartet Op. 76, no. 4 (movements I and II) by the Borealis String Quartet.

A successful learning experience

The most important conclusion of the day was that it is, indeed, possible to carry out such a large and complex data collection in an out-of-lab setting. It took an hour longer than expected to set everything up, but it also took an hour less to take everything down. This is valuable information for later. We also learned a lot about what types of clamps, brackets, cables, etc. are needed for such events. Also useful was the experience of calibrating all the equipment in a new and uncontrolled environment. All in all, the experience will help us make better data collections in the future.

Sharing with the world

Why is it interesting to share all of this with the world? RITMO is a Norwegian Centre of Excellence, which means that we get a substantial amount of funding for doing cutting-edge research. We are also in a unique position in having a very interdisciplinary team of researchers with broad methodological expertise. With the trust we have received from UiO and our many funding agencies, we therefore feel an obligation to share as much as possible of our knowledge and expertise with the world. Of course, we present our findings at the major conferences and publish our final results in leading journals. But we also believe that sharing the way we work can help others.

Sharing our internal research process with the world is also a way of improving our own way of working. Having to explain what you do to others helps to sharpen your own thinking. I believe that this will, in turn, lead to better research. We cannot run MusicTestLabs every day. Today all the researchers will copy the files that we recorded yesterday and start on the laborious post-processing of the material. Then we can start on the analysis, which may eventually lead to a publication in a year (or two or three) from now. If we do end up with a publication (or more) based on this material, everyone will be able to see how it was collected and follow the data processing through all its steps. That is our approach to doing research that is verifiable by our peers. And if it turns out that we messed something up, and the data cannot be used for anything, we have still learned a lot through the process. In fact, we even have a recording of the whole data collection process, so we can go back and see what happened.

Other researchers need to come up with their own approaches to opening their research. MusicLab is our testbed. As can be seen from the video, it is hectic. Most importantly, though, it is fun!

RITMO researchers transporting equipment to MusicTestLab in the beautiful October weather.

Motiongrams of rhythmic chimpanzee swaying

I came across a very interesting study on the Rhythmic swaying induced by sound in chimpanzees. The authors have shared the videos recorded in the study (Open Research is great!), so I was eager to try out some analyses with the Musical Gestures Toolbox for Matlab.

Here is an example of one of the videos from the collection:

The video quality is not very good, so I had my doubts about what I could find. It is particularly challenging that the camera is moving slightly over time. There is also a part where the camera zooms in towards the end. A good rule of thumb is to always use a tripod and no zoom/pan/tilt when recording video for analysis.

Still, I managed to create a couple of interesting visualizations. Here I include two motiongrams, one horizontal and one vertical:

This horizontal motiongram shows the up-and-down motion of the chimpanzee. Time runs from left to right.
This vertical motiongram reveals the sideways motion of the chimpanzee. Time runs from top to bottom.

Despite the poor input quality, I was happy to see that the motiongrams are quite illustrative of what we see in the video. They clearly reveal the rhythmic pattern of the chimpanzee’s motion. It would have been interesting to have some longer recordings to do a more detailed analysis of correspondences between sound and motion!

If you are interested in making such visualisations yourself, have a look at our collection of tools in the Musical Gestures Toolbox.

Embed YouTube video with subtitles in different languages

This is primarily a note-to-self post, but it could hopefully also be useful for others. At least, I spent a little too long figuring out how to embed a YouTube video with subtitles in a specific language.

The starting point is that I had this project video that I wanted to embed on a project website:

However, I then found that you can specify the language you want to use by adding this snippet after the URL:


Here, ?hl=en sets the language of the player controls, &cc_lang_pref=en sets the language of the subtitles, and &cc=1 turns the subtitles on. The complete block is:

<iframe allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="" frameborder="0" height="315" src="" width="560"></iframe>

And the embedded video looks like this:

To play the same video with Norwegian subtitles on the Norwegian web page, I use this block:

<iframe allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="" frameborder="0" height="315" src="" width="560"></iframe>

And this looks like:

Simple when you have found the solution!
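As a recap, composing such an embed URL can be sketched as follows (VIDEO_ID is a placeholder for the actual video id):

```shell
# Sketch: compose a YouTube embed URL with subtitle parameters.
# VIDEO_ID is a placeholder; substitute the id of the real video.
video_id="VIDEO_ID"
lang="en"
url="https://www.youtube.com/embed/${video_id}?hl=${lang}&cc_lang_pref=${lang}&cc=1"
echo "$url"
```

Swapping lang for "no" gives the Norwegian-subtitle variant used on the Norwegian page.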