For the book launch, Cagri and I recorded a short video teaser:
The more formal abstract is:
The topic of gesture has received growing attention among music researchers over recent decades. Some of this research has been summarized in anthologies on “musical gestures”, such as those by Gritten and King (2006), Godøy and Leman (2010), and Gritten and King (2011). There have also been a couple of articles reviewing how the term gesture has been used in various music-related disciplines (and beyond), including those by Cadoz and Wanderley (2000) and Jensenius et al. (2010). Much empirical work has been performed since these reviews were written, aided by better motion capture technologies, new machine learning techniques, and a heightened awareness of the topic. Still, there are a number of open questions as to the role of gestures in music performance in general, and in ensemble performance in particular. This chapter aims to clarify some of the basic terminology of music-related body motion and to draw up some perspectives on how one can think about gestures in ensemble performance. This is, obviously, only one way of looking at the very multifaceted concept of gesture, but it may lead to further interest in this exciting and complex research domain.
Ten years after Musical Gestures
We began writing this ensemble gesture chapter in 2020, about ten years after the publication of the chapter Musical gestures: Concepts and methods in research. That musical gesture chapter has, to my surprise, become my most-cited publication to date. When I began working on the topic of musical gestures with Rolf Inge Godøy back in the early 2000s, it was still a relatively new topic. Most music researchers I spoke to didn’t understand why we were interested in the body.
Fast forward to today, and it is hard to find music researchers that are not interested in the body in one way or another. So I am thrilled about the possibility of expanding some of the “old” thoughts about musical gestures into the ensemble context in the newly published book chapter.
The goal of this session is to share, discuss, and appraise the topic of evaluation in the context of SMC research and development. Evaluation is a cornerstone of every scientific research domain, but is a complex subject in our context due to the interdisciplinary nature of SMC coupled with the subjectivity involved in assessing creative endeavours. As SMC research proliferates across the world, the relevance of robust, rigorous empirical evaluation is ever-increasing in the academic and industrial realms. The session will begin with presentations from representatives of NordicSMC member universities, followed by a more free-flowing discussion among the panel members and, finally, audience involvement.
The discussion was moderated by Sofia Dahl (Aalborg University), and the panel consisted of Nashmin Yeganeh (University of Iceland), Razvan Paisa (Aalborg University), and Roberto Bresin (KTH).
The challenge of interdisciplinarity
Everyone in the panel agreed that rigorous evaluation is important. The challenge is to figure out what types of evaluation are useful and feasible within sound and music computing research. This was efficiently illustrated in a list of the different methods employed by the researchers at KTH.
Roberto Bresin had divided the KTH list into methods they have been working with for decades (in red) and newer methods they are currently exploring. The challenge is that each of these methods requires different knowledge and skills, and each calls for a different type of evaluation.
Although we have a slightly different research profile at UiO than at KTH, we also have a breadth of methodological approaches in SMC-related research. I pointed to a model I often use to explain what we are doing:
The model has two axes. One shows a continuum between artistic and scientific research methods and outputs; the other shows a continuum between research on natural and cultural phenomena. In addition, we develop and use various types of technologies across all of these.
The reason I like to bring up this model is to explain that things are connected. I often hear that artistic and scientific research are completely different things. Sure, they are different, but there are also commonalities. Similarly, there is an often unnecessary divide between the humanities and the social and natural sciences. True, they have different foci, but when studying music we need to take all of these into account. Music involves everything from “low-level” sensory phenomena to “high-level” emotional responses. One can focus on one or the other, but if we really want to understand musical experiences – or make new ones for that matter – we need to see the whole picture. Thus, evaluations of whatever we do also need to have a holistic approach.
Open Research as a Tool for Rigorous Evaluation
My entry into the panel discussion was that we should use the ongoing transition to Open Research practices as an opportunity to also perform more rigorous evaluations. I have previously argued why I believe open research is better research. The main argument is that sharing things (methods, code, data, publications, etc.) openly forces researchers to document everything better. Nobody wants to make sloppy things publicly available. So the process of making all the different parts of the open research puzzle openly available is a critical component of a rigorous evaluation.
In the process of making everything open, we realize, for example, that we need better tools and systems. We also experience that we need to think more carefully about privacy and copyright. That is also part of the evaluation process and lays the ground for other researchers to scrutinize what we are doing.
One of the challenges of discussing rigorous evaluation in the “field” of sound and music computing is that we are not talking about one discipline with one method. Instead, we are talking about a set of approaches to developing and using computational methods for sound and music signals and experiences. If you need to read that sentence a couple of times, it is understandable. Yes, we are combining a lot of different things. And, yes, we are coming from different backgrounds: the arts and humanities, the social and natural sciences, and engineering. That is exactly what is cool about this community. But it is also why it is challenging to agree on what a rigorous evaluation should be!
For the last couple of days, I have participated in the NordicSMC conference. It was organized by a team of Ph.D. fellows from Aalborg University Copenhagen, supported by the Nordic Sound and Music Computing network. UiO is happy to be a partner in this network, together with colleagues in Copenhagen (AAU), Stockholm (KTH), Helsinki (Aalto), and Reykjavik (UoI).
Choosing a conference format
When we began discussing the conference earlier this year, it quickly became apparent that it was unrealistic to meet in person. Restrictions have been lifted in most Nordic countries, but the pandemic is still ongoing. One option could have been to run it as an online-only conference. But since we are an adventurous group of researchers, we wanted to explore running it as a hub-based hybrid conference. Hybrid means that it was run both in-person and online. Hub-based means that there were multiple physical locations. This makes sense, given that there are five partners in the network.
Over the last few years, we have gained much experience running hybrid events, such as a hybrid disputation and a hybrid conference. Most RITMO events have also been hybrid recently. We also have much experience with hub-based teaching in the MCT master’s programme, with daily teaching between Oslo and Trondheim for three years. All of these experiences have made us aware of the technical details that need to be in place to master the different “formats”, but also of the social aspects of handling various constellations of people in local and remote locations.
NordicSMC2021 was our first attempt at running a multi-hub conference. The idea is not new, and there have been several attempts at running hub-based conferences over the last few years. For example, Richard Parncutt has made interesting reflections on running ESCOM/ICMPC conferences in hubs on different continents. At that scale, timezone issues may be the biggest challenge for a successful event.
In comparison, NordicSMC is a small conference without timezone problems. That also makes it easier to experiment with the setup. Here are some thoughts on what we did and how it worked.
Benefits of hub-based conferences
The easiest option would, of course, have been to run an online-only conference using the Zoom Webinar functionality. Most people know this format well, and it is technically easy to set up and run smoothly. I say easy, but you still need to pay attention to many details to run such events well.
However, many of us are tired of sitting alone in our offices, particularly when we can meet on campus. The nice thing about a hub-based conference is that people can meet in various interconnected hubs. This means that you get some of the social aspects of being together in your local hub while at the same time taking part in something larger.
Setting up for a hub-based conference is not particularly difficult. Most places have decent audiovisual equipment these days. Still, there are many ways of making things work well.
Camera choice and placement are essential elements of such a setup. We used a seminar room with a Crestron UC-SB1-CAM system containing a Huddly camera placed below the TV. This is a very wide-angle camera, which has the benefit of capturing the whole space. The downside is that people appear tiny in the image. I tested setting up a Logitech web camera on top of the screen instead. The image quality was better there, but the bird’s-eye view didn’t work too well. So we decided to use the Huddly camera.
One nice feature of the Huddly camera is that it can automatically “zoom” and “pan” to follow people in the room. This is a software feature based on a computer vision algorithm running in the background. It only partly worked, though, so we ended up moving the image around manually. Not ideal; we should try to figure out why the auto-tracking didn’t work correctly.
Another option could have been to use one wide-angle camera (like the Huddly) to capture the whole room and a second camera for close-ups of people talking. We have taken this approach for large-scale hybrid events, but it requires more equipment and a small production team. We wanted to explore what can quickly be done in a typical seminar room with video conferencing equipment.
The video setup also requires attention to the physical organization of the room. From the MCT programme, we have experienced that sitting in a V-shape works well from a communicative point of view. Such a setup allows the local participants to see each other while also seeing the screen. The aim is to create a sense of being together both locally and remotely.
Since this is a conference on sound and music, we care about sound quality. We know from other activities that the Crestron sound system below the TV is OK for short meetings. However, that system sounds like, eh, a video conferencing sound system. Given the size of the speaker elements, it sounds “thin” and does not project music very well. I struggle to sit through hours of meetings with the semi-poor audio quality of regular video conferencing systems. Therefore, we decided to use the B&W speaker setup in the room instead. Of course, a good sound system doesn’t compensate for poor microphone quality on the other end. But it makes it a joy to listen to those who care about their sound, such as the excellent keynote lecture by Ludvig Elblaus. He also used the stereo sound functionality in Zoom (usually the sound is mono only) for his sound examples, which came through beautifully.
One of the benefits of using a combined video conferencing solution is that it typically has sound feedback cancellation built-in. This saves a lot of trouble when it comes to handling unwanted feedback issues. So when we decided to move to the Hi-Fi sound system in the room, we had to figure out another microphone solution.
We have for some time used a Catchbox microphone in meetings at RITMO. It is a wireless microphone embedded in a soft box that can be thrown around. Best of all, it has some very smart anti-feedback and anti-noise circuitry. While testing the setup, we found that the microphone array in the Crestron panel could pick up talking in the room. However, when you sit a few meters from a microphone, the sound is relatively muffled. Having the Catchbox close by results in much better sound quality.
Another nice thing about the Catchbox is that it helps clarify who is talking. Everyone could see the box, so remote participants could tell who had the microphone even when we didn’t zoom in on people. Having to wait for the microphone also imposes some discipline. The downside of using such a semi-directional microphone is that it does not capture the sonic ambiance of the room. That is something to explore more later.
The need for two screens
We started with only one connected TV screen in the room. That worked well for presentations, but we quickly realized it was challenging to keep track of the chat and Q&A windows simultaneously. This is also problematic when sitting at your own computer, and I don’t understand why Zoom hasn’t solved these multi-sub-window problems long ago.
Since we had a second TV on another wall in the room, we connected it and moved the chat and Q&A windows there. It was not ideal to have them on separate walls, though. This made me realize that “old-style” video conferencing setups with two TVs may be a better solution for such events.
As for running presentations, our default option nowadays is always to connect and run these from a separate laptop. It is much easier to have a dedicated Zoom machine to handle the communication part. It is less risky from a technical point of view and makes it easier to arrange images in the way you want.
In-person or online chairing
During the conference, we explored different solutions for chairing the sessions. Most chairs worked online, which made it easier for them to use their local multi-view setups. However, my colleague Stefano Fasciani decided to chair his session from our hub.
This worked well, I think. The Catchbox provided good sound, and the zooming from the Huddly camera made it possible to see him in the image when speaking. This was before we connected the second TV, so he monitored the chat and Q&A through a Zoom session running on his own laptop.
Also, from a conceptual point of view, I think it was nice to have the session chair in the room. It felt more like a “normal” conference. For that reason, I also decided to be present in the hub for the final panel discussion.
As always, juggling multiple platforms is a challenge for such events. Audiovisual communication happened in Zoom, together with the chat and Q&A windows, and we used Discord as a social channel in between. I didn’t want to keep a separate Zoom window running on my laptop for the entire two days, which meant that I couldn’t easily follow or contribute to the chat and Q&A windows. For such events, I think there should be an option to turn off incoming video in Zoom. It feels like a waste of bandwidth for everyone in a room to run separate Zoom instances just to communicate in text.
Many conferences use Slack for in-between discussions. This time we used Discord, which had the same functionality. Still, it is a challenge to figure out where and how to interact. In online-only events, written communication in various channels has worked well. But when we were trying to create a hub-based setup, it was more challenging. When you sit next to people, you naturally talk to them instead of sending a message.
When it came to the Q&A sessions, I realized that I ended up asking questions orally rather than in writing. This was because I didn’t have Zoom running on my laptop, and I had access to a microphone. Of course, the ability to ask questions live was limited to those of us in the hubs. The online participants had to interact through the written channels.
For such a small conference, we could have used a regular Zoom meeting instead of a Webinar. That would have allowed everyone to show their face and talk. However, there are risks involved in having relatively large Zoom meetings, so it is much safer on the organizing side to run a Webinar. I generally think that running a Webinar with always-on hub rooms worked well. That gave the presenters the feel of someone being present. And the local hub hosts were in control of the technical and social communication.
It would be interesting to hear how such an asymmetric communication form was experienced by those attending remotely. They, obviously, got a very different experience than those of us present in the hubs. I would imagine that they didn’t feel as connected as those in the hubs, but this may also be what they preferred. One could say that the ability to interact more directly could be a motivating factor to join a hub (for those who can and want to).
Hub-based conferences are the future
All in all, I think this year’s NordicSMC conference was a great success. Many engaging presentations showed the breadth of activities in sound and music computing in our region.
Technically speaking, I also think things went well. The AAU hosts ran things smoothly. The conference was built on what has become the “normal” setup: a Zoom webinar with pre-recorded presentations, panel-based Q&A, and written communication through both Zoom and Discord.
What was new was that we tried out a hub-based approach. We ended up only having hubs in Oslo, Stockholm, and Copenhagen, and we all set them up slightly differently. Still, we managed to create a sense of being “together”. We weren’t that many people in the Oslo hub, but people came in and out during the conference. As such, it felt like being at a regular conference.
The experience was different than if I had been sitting alone in my office. I noticed that I followed the presentations more carefully than I would have otherwise done. And it was great chatting with colleagues and students during the breaks.
There are lots of small details that can be improved, both technically and conceptually. On the technical side, I think paying attention to camera and microphone placement is a critical factor. Often the equipment is OK, but it is set up and used sub-optimally. One challenge is that you are often not in complete control of the systems in university seminar rooms. For example, we did not have admin privileges on the Zoom computer, making it difficult to reach some settings.
I also think it is essential to pay attention to social interaction. It is often said that running a hybrid event is more complex than running an in-person or online-only one, because you need to think about two different social groups at once. In that respect, running a hub-based and hybrid conference is triply difficult: you need to cater to the well-being of the people in all the hubs and online at the same time. The best solution is to assign “hub hosts” responsible for the social interaction in their own hubs. It is also vital that these hub hosts interact with each other. If done well, I think this can make such hub-based events successful. They will not be the same as in-person events, but they can capture the feeling of being together.
After nearly three years of planning, we can finally welcome people to MusicLab Copenhagen. This is a unique “science concert” involving the Danish String Quartet, one of the world’s leading classical ensembles. Tonight, they will perform pieces by Bach, Beethoven, Schnittke and folk music in a normal concert setting at Musikhuset in Copenhagen. However, the concert is anything but normal.
Live music research
During the concert, about twenty researchers from RITMO and partner institutions will conduct investigations and experiments informed by phenomenology, music psychology, complex systems analysis, and music technology. The aim is to answer some big research questions, like:
What is musical complexity?
What is the relation between musical absorption and empathy?
Is there such a thing as a shared zone of absorption, and is it measurable?
How can musical texture be rendered visually?
The concert will be live-streamed (on YouTube and Facebook) and it will also be aired on Danish radio. There will also be a short film documenting the whole process.
Real-world Open Research
This concert will be the biggest and most complex MusicLab event to date. Still, all the normal “ingredients” of a MusicLab will be in place. The core is a spectacular performance. We will capture a lot of data using state-of-the-art technologies, but in a way that is as unobtrusive as possible for performers and the audience. After the concert, both performers and researchers will talk about the experience.
Of course, being a flagship Open Research project, all the collected data will be shared openly. The researchers will show glimpses of data processing procedures as part of the “data jockeying” at the end of the event. However, the real analysis can only begin once all the data has been properly uploaded and pre-processed. All the involved researchers will dig into their respective data. But since everything is openly available, anyone can work on the data as they wish.
Due to the corona situation, the event has been postponed several times. That has been unfortunate and stressful for everyone involved. On the positive side, it has also meant that we have been able to rehearse and prepare very well. Already a year ago we ran a full rehearsal of the technical setup of the concert. We even live-streamed the whole preparation event, in the spirit of “slow TV”:
I am quite confident that things will run smoothly during the concert. Of course, there are always obstacles. For example, one of our eye-trackers broke in one of the last tests. And it is always exciting to wait for Apple and Google to approve updates of our MusicLab app in their respective app stores.
Or, more specifically: can AI replace an artist? That is the question posed in a short documentary that I have contributed to for this year’s Research Days.
We were contacted before summer about trying to create a new song based on the catalogue of the Norwegian artist Ary. The idea was to use machine learning to generate the song. This has turned out to be an exciting project.
The project started with Ary sending us a bunch of her lyrics in text format and dumps from a digital audio workstation. This material was not really machine-readable/listenable, so Lars had to spend a great deal of time manually structuring and annotating it into a symbolic data set that could be used for training. The machine-learning part of the project involved generating lyrics following this approach, using an AI model that can already speak English. The melody was generated using Bachprop, based on a deep recurrent neural network. Then Lars put it all together into a final soundtrack that we played for Ary.
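The actual melody generation relied on Bachprop, a deep recurrent neural network, and the real training data is of course Ary’s material, which is not reproduced here. Still, the core idea of the symbolic pipeline – learn note-to-note patterns from a corpus of melodies and then sample new sequences – can be illustrated with a deliberately simplified first-order Markov chain. Everything in this sketch (the toy corpus, function names, pitches) is invented for illustration only:

```python
import random

# Hypothetical toy corpus of melodies as MIDI note numbers.
# A stand-in for the symbolic data set Lars prepared; not the real material.
CORPUS = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 64, 67, 72, 67, 64, 60],
    [62, 64, 66, 67, 69, 67, 66, 64],
]

def train_markov(melodies):
    """Count which pitch follows which in the corpus."""
    table = {}
    for melody in melodies:
        for a, b in zip(melody, melody[1:]):
            table.setdefault(a, []).append(b)
    return table

def generate(table, start, length, seed=0):
    """Sample a new melody by walking the transition table."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        options = table.get(melody[-1])
        if not options:  # dead end: fall back to the start pitch
            options = [start]
        melody.append(rng.choice(options))
    return melody

table = train_markov(CORPUS)
print(generate(table, start=60, length=8))
```

A first-order chain only captures pairwise transitions, which is why models like Bachprop use recurrent networks instead: they can condition each note on a much longer context. The human decisions mentioned above (what goes into the corpus, which samples to keep) matter just as much in this toy version.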
When talking about AI, I always find it important to highlight that humans are important for the final result. Yes, the machine makes something, but not without a lot of human guidance. The song played in the video (and the many other songs we generated) was “composed” by a computer. However, Lars made many important decisions throughout the project: the initial preparation of the training material, the choice of models, methods, and settings, the selection of which lyrics and melodies to use, and the final assembly of everything.
Last year, I was involved in several discussions about the potential challenges of using AI in music-making. I also wrote a blog post about some of the possibilities. While AI can, in theory, be used autonomously, I believe the most interesting results emerge at the meeting point between humans and machines. Providing artists like Ary with AI-based technologies can lead to exciting new music!