Sound and Music Computing at the University of Oslo

This year’s Sound and Music Computing (SMC) Conference has opened for virtual lab tours. When we cannot travel to visit each other, this is a great way to showcase how things look and what we are working on.

Stefano Fasciani and I teamed up a couple of weeks ago to walk around some of the labs and studios at the Department of Musicology and RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion. We started in the Portal used for the Music, Communication & Technology master’s programme and ended up in the fourMs Lab.

Needless to say, we only scratched the surface of everything going on in the field of sound and music computing at the University of Oslo in this video. The video focused primarily on our infrastructure. We have several ongoing projects that use these studios and labs and also some non-lab-based projects. These include:

And I should not forget to mention our exciting collaboration with partners in Copenhagen, Stockholm, Helsinki, and Reykjavik in the Nordic Sound and Music Computing network.

And, as we end the video, please don’t hesitate to get in touch if you want to visit us or collaborate on projects.



Running a successful Zoom Webinar

I have been involved in running some Zoom Webinars over the last year, culminating with the Rhythm Production and Perception Workshop 2021 this week. I have written a general blog post about the production. Here I will write a little more about some lessons learned on running large Zoom Webinars.

In previous Webinars, such as the RITMO Seminars by Rebecca Fiebrink and Sean Gallagher, I ran everything from my office. These were completely online events, based on each person sitting with their own laptop. This is where the Zoom Webinar solution shines.

Things become more complex once you try to run a hybrid event, where some people are online and others are on-site. Then you need to combine methods for video production and streaming with those of a streamlined video conferencing solution. It is doable but quite tricky.

RPPW 2021 chair Anne Danielsen introduces one of the panels from RITMO.

The production of the RPPW Webinar involved four people: myself as “director”, RITMO admin Marit Furunes as “Zoom captain”, and two MCT students, Thomas Anda and Wenbo Yi as “Zoom station managers”. We could probably have done it with three people, but given that other things were going on simultaneously, I am happy we decided to have four people involved. In the following, I will describe each person’s role.

Zoom station 1

The first Zoom station, RITMO 1, was operated by Wenbo Yi. He was sitting behind the lecture desk in the hall, equipped with one desktop PC with two screens. The right screen was split to be shown with a projector on the wall. This was the screen that showed displays when there were breaks, and so on.

MCT student Wenbo Yi in control of RITMO 1 during RPPW.

There are three cameras connected to the PC: one front-facing that we used for the moderator of the keynote presentations, one next to the screen on the wall that showed a close-up of conference chair Anne Danielsen during the introductions, and one in the back of the hall that showed the whole space.

There are four microphones connected to the PC and the PA system in the hall. One sat on the desk and was used for the keynote moderation. Of the wireless microphones, we only used one, a handheld that Anne used during her introductions.

The nice thing about Zoom is that it is easy for a person to turn on and off cameras and microphones. However, this is designed around the concept that you are sitting in front of your computer. When you are standing in the middle of the room, someone else will need to click. That was the job of Wenbo. He switched between cameras and microphones and turned on and off slides.

Zoom station 2

The second Zoom station, RITMO 2, was operated by Thomas Anda, sitting in the control room behind the hall. He was controlling a station that was originally designed as a regular video streaming setup. This includes two remote-controlled PTZ cameras connected to a video mixer. For regular streams, we would have tapped the audio and video from the auditorium and made a mix to be streamed to, for example, YouTube. Now, we mainly used one of the PTZ cameras to display the general picture from the hall.

MCT student Thomas Anda sitting in the control room with the RITMO 2 station during RPPW.

The main job of Thomas was to play back all the 100 pre-recorded videos. We had a separate, ethernet-cabled PC for this task, which was connected to the Zoom Webinar and shared its screen. The videos were cued in VLC, with poster images inserted between the videos. During our testing of this setup, we discovered that the sound level varied between the video files, which led to a normalization procedure for all of them.

In theory, we could have played the video files from RITMO station 1. However, both Wenbo and Thomas had plenty of things to think about, so it would have been hard to do it all alone. Also, having two stations allowed for having two camera views and also added redundancy for the stream.

Zoom captain station

The third station was controlled by our “Zoom captain”, Marit Furunes. She served as the main host of the Webinar most of the time and was responsible for promoting and de-promoting people to panellists during the conference.

Marit Furunes was the “Zoom captain”. Here, with conference chair Anne Danielsen in the background.

It is possible to set up panels in advance, but that requires separate Zoom Webinars and individualized invitation e-mails. We have experienced in the past that people often forget about these e-mails, so we decided to just have one Zoom Webinar for the entire conference and rather move people in and out of the panels manually. That required some manual work by Marit, but it also meant that she was in full control of who could talk in each session.

She was also in charge of turning on and off people’s video and sound, and ensuring that the final stream looked fine.

Director station

I was sitting next to Marit, controlling the “director station”. I was mainly checking that things were running as they should, but I also served as a backup for Marit when she took breaks. In between, I also tweeted some highlights, replied to e-mails that came in, and commented on things in Slack.

From the “director station” I controlled one laptop as host and had another laptop for watching the output stream.

Monitoring and control

Together, the four of us involved in the production managed to create a nice-looking result. There were some glitches, but in general, things went as planned. The most challenging part of working with a Webinar-based setup is the lack of control and monitoring. What we have learned is that “what you see is not what you get”. We never really understood what to click on to get the final result we wanted. For example, we often had to click back and forth between the “gallery” and “speaker” view to get the desired result.

Also, as a host, you can turn off other cameras, but you cannot turn them on, only ask the person to turn them on. That makes sense in many ways. After all, you should not be allowed to turn on the camera of another person remotely. However, as a production tool in an auditorium, this was cumbersome. It happened that Marit and I wanted to turn on the main video camera in the hall (from RITMO 2) or the front-facing camera (from RITMO 1). But we were not allowed to do this. Instead, we had to ask Thomas or Wenbo to turn on the cameras.

Summing up

The Zoom Webinar function was clearly made for a traditional video-conferencing-like setup. For that it also works very well. As described above, we managed to make it work quite well also in a hybrid setup. However, this required a 4-person strong team and 5 computers. The challenge was that we never really felt that we were completely in control of things. Also, we could not properly monitor the different signals.

The alternative would be a regular video streaming solution, based on video and audio mixers. That would have given us much more control of the final stream, including good monitoring. It would have required more equipment (which we have) but not necessarily more people. We would have lost out on some of the Zoom functionality, though, like the Q&A functionality that works very well.

Next time I am doing something like this, I would try to run a stream-based setup instead. Panellists could then come in through a Zoom Room, which could be mixed into a stream using either our hardware video mixer or a software mixer like OBS. Time will tell if that ends up being better or worse.

Running a hybrid conference

There are many ways to run conferences. Here is a summary of how we ran the Rhythm Production and Perception Workshop 2021 at RITMO this week. RPPW is called a workshop, but it is really a full-blown conference. Almost 200 participants enjoy 100 talks and posters, 2 keynote speeches, and 3 music performances spread across 4 days.

A group photo of RPPW2021 participants, taken in a Zoom Room before the last poster session.

A hybrid format

We started planning RPPW as an on-site event back in 2019. Then, when the pandemic hit, we quickly turned around and decided to make it into an online-only conference. But as the covid restrictions have been lifted in Norway recently, we decided to run it as a “hybrid” event. That is, everything was run both on-site at RITMO and online. Only RITMO people were physically present, though, so most people experienced it as an online-only event. Still, given that future events will probably be hybrid, we found it to be a good way of experimenting with this new conference format.

In my experience, running hybrid events is more challenging than running online-only or on-site-only events. The two formats are radically different. If this is not acknowledged and planned for, one risks having inferior experiences for everyone. However, if done right, hybrid may actually work quite well. Looking at feedback in the participant survey, I am happy to see that most people found it to work well. In fact, the majority actually favours a hybrid conference in the future. This would require some attention to details, several of which I will discuss below.

Time zone challenges

There are many technological challenges with running international conferences, but the challenge of different time zones is probably the biggest hurdle. I participated in the NIME 2021 conference last week, which was run from Shanghai. NIME is a truly international conference, with participants spread around the globe. To cater for global participation, the program was split into program blocks that were repeated each day. Therefore most paper/poster sessions ran twice a day. Keynotes happened live and were repeated as a “live replay” later in the day.

I think NIME’s block-based schedule worked quite well. The most important was that it allowed for combinations of participants from Asia+Europe and Europe/America throughout the days. The downside to this approach is that you don’t get the sense of “being together” at any point in time. For such global conferences, this is an unsolvable problem, I think. There will always be someone for whom the scheduling will be in the middle of the night.

For RPPW2021 we didn’t have many submissions from Asia/Australasia, unfortunately. That is a pity from a global perspective. However, on the positive side, this meant that the large majority of participants were based in Europe and the Americas. We, therefore, decided to run the conference in the evenings Oslo time, from 16:00-22:00. This is a little late for Europeans, and it is particularly tricky for those with children and other family obligations (myself included). But it allows for having a live, single-track conference format that creates the sense of “togetherness”.

Talking about time zone challenges, communicating when things happen can be tricky. This is particularly the case now that we have daylight savings here in Norway (CEST, UTC+2). One way to help people get the time right was to implement localized time on the conference page. That way people could easily figure out when sessions started in their local time zone. They had the same thing at NIME, but there you had to choose your time zone manually in the schedule. We implemented automatic schedule adjustments based on the computer clock. This worked for most people, except for those that (for some reason) had another time on their computer.
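To illustrate what the schedule looks like from elsewhere, here is a minimal shell sketch that shifts a wall-clock time between two UTC offsets. It is not the code running on the conference page. The function name shift_time and the two example offsets are made up for illustration, and it only handles whole-hour offsets.

```shell
#!/bin/sh
# Shift a wall-clock time (HH:MM) from one UTC offset to another.
# Usage: shift_time HH:MM from_offset to_offset   (offsets in whole hours)
# Simplification: ignores half-hour zones and does not report date changes.
shift_time() {
  h=${1%%:*}; m=${1##*:}
  h=${h#0}; h=${h:-0}          # strip a leading zero so 08 is not read as octal
  m=${m#0}; m=${m:-0}
  total=$(( ($h * 60 + $m + ($3 - $2) * 60 + 2880) % 1440 ))
  printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# The 16:00-22:00 Oslo programme (CEST, UTC+2) seen from two example offsets:
# UTC-4 (for example, the US East Coast in summer) and UTC+9 (for example, Japan).
for offset in -4 9; do
  echo "UTC offset $offset: $(shift_time 16:00 2 "$offset") to $(shift_time 22:00 2 "$offset")"
done
```

The second line of output shows why the evening schedule was hard on participants in Asia: the programme there starts late in the evening and ends in the early morning.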

The participant web page contained all the relevant information in one location and with automatic localized time information.

Pre-recorded videos

Many conferences have landed on pre-recorded videos and live Q&A sessions as a default presentation format. We also decided to use this approach for RPPW. We asked people to prepare 10-minute long (short) videos. This is shorter than typical conference presentations, which often last 20-30 minutes. In my experience, pre-recorded videos are generally shorter, more precise, and easier to understand than live talks. So you actually save time this way, which can be used for other things. You also avoid all sorts of technical challenges with screen-sharing and so on.

Almost everyone managed to keep their pre-recorded videos within 10 minutes and I don’t think that we lost out on any information that would have been presented in a typical 20-minute lecture. We received some videos that were slightly longer, but these were effectively trimmed and/or sped up a little to be around 10 minutes long.
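As a rough illustration of the "sped up a little" part, the sketch below computes the speed factor needed to fit a video into a target length and prints a matching FFmpeg command. It is an assumption about how this could be done, not a record of what we ran. The function names, the file name talk.mp4, and the 630-second duration are hypothetical.

```shell
#!/bin/sh
# Speed factor needed to fit a video of a given length (seconds) into a target length.
speed_factor() {
  awk -v dur="$1" -v target="$2" 'BEGIN { printf "%.3f\n", dur / target }'
}

# Print (not run) an FFmpeg command that speeds up video and audio by the same factor.
# setpts rescales the video timestamps; atempo changes the audio tempo without changing pitch.
# Usage: speedup_command file duration_seconds target_seconds
speedup_command() {
  f=$(speed_factor "$2" "$3")
  printf 'ffmpeg -i "%s" -filter_complex "[0:v]setpts=PTS/%s[v];[0:a]atempo=%s[a]" -map "[v]" -map "[a]" "%s"\n' \
    "$1" "$f" "$f" "${1%.*}_short.${1##*.}"
}

# Example: a 10.5-minute video that should fit in 10 minutes.
speedup_command talk.mp4 630 600
```

Printing the command first makes it easy to check the factor before re-encoding anything. A 5% speed-up like this is usually hard to notice in speech.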

We had originally planned to use YouTube as our central video archive. However, when the student assistants started uploading videos to our YouTube account, we realized that there is an upload limit of 10 videos per day. We had around 100 videos, and with only a few days left to the conference, we had to find another solution.

Fortunately, UiO has a decent web player as part of the standard content management system. So we decided to rely on UiO’s video server and uploaded all the videos to a folder on the conference web page. We continued to upload 10 videos per day to YouTube, but we never really announced that playlist. Still, it is useful as a backup solution. In hindsight, I would have planned to use the UiO solution from the start.

RPPW videos on the UiO web page.

Storing the videos online is only part of the job, though. Navigating and searching through them is critical for their usefulness. Therefore, we decided to set up a separate web page for participants on rppw2021.org. This page was designed by a group of students in our Music, Communication and Technology (MCT) master’s programme who had RPPW as their applied project this spring semester. They made a very nice navigation system from which we linked up all the videos hosted on the UiO server.

Embedding a UiO-stored video on the program page.

In the end, all the videos are navigable and searchable on the UiO web page, and they are also on YouTube. People are different, so it is good to have multiple paths to the same material.

Captions

I didn’t think much about captions before NIME 2020, in which accessibility was the main topic. However, over the last year, I have realized that captions are important not only for people with hearing impairments. Some struggle with understanding the language; others may want to keep the sound low or off for social or practical reasons. In all these cases, captions may help.

One reason we wanted to upload videos to YouTube was to leverage their great auto-caption feature. Unfortunately, when we switched to the UiO-based video solution, captioning had to be done differently. To the rescue came a new auto-text service that is currently in beta-testing at UiO. The service is actually based on a Google-driven caption engine, probably quite similar to the one used on YouTube. Unfortunately, we had to run this separately for each video, a time-consuming task for one of our student assistants. On the positive side, the auto-texting worked quite well. Much has happened since I tested a similar service a year ago.

The result of the auto-captioning is a .vtt file for each video file. These were uploaded next to the video files and made available to be turned on and off in the video player. An example can be seen here:

As can be seen, auto-texting is not perfect, but it goes a long way. I see that auto-texting is becoming more popular these days. There seems to be an increased awareness of the importance of captions. Hopefully, more and better tools will be made available on all platforms soon. In fact, Zoom has actually opened for live captioning in both Zoom Rooms and Webinars. However, for GDPR reasons, this is not yet available at UiO.
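Since the captioning had to be run separately for each video, it is easy to miss one. A small check like the following could list the videos that still lack a caption file. This is a sketch that assumes the videos are .mp4 files and that each .vtt file shares the base name of its video; the function name is made up.

```shell
#!/bin/sh
# List video files in a directory that have no .vtt caption file next to them.
# Assumption: captions share the video's base name (talk01.mp4 -> talk01.vtt).
missing_captions() {
  for video in "$1"/*.mp4; do
    [ -e "$video" ] || continue                     # the directory has no .mp4 files
    [ -e "${video%.mp4}.vtt" ] || echo "${video##*/}"
  done
}

# Check the directory given as the first argument, or the current one.
missing_captions "${1:-.}"
```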

Normalization

As written about in another blog post, we found it necessary to normalize all the pre-recorded videos. That is, adjust the sound to a similar level. The videos were recorded with all sorts of equipment, so the sound quality and levels varied widely. We thought about this a little too late, unfortunately. By that time, we had already uploaded all the videos to the program page and YouTube. Still, we decided to normalize all the videos to ensure an equal sound level when multiple videos were played back consecutively during the conference. Next time I will think about normalizing all video files from the start.
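For readers who want to try something similar, here is one possible way to batch-normalize a folder of videos with FFmpeg's loudnorm filter. It is a sketch and not necessarily the procedure we used. The loudness targets (I=-16, TP=-1.5, LRA=11) are common example values, and the "normalized" output folder and the DRY_RUN switch are my own additions.

```shell
#!/bin/sh
# Loudness-normalize the audio of every .mp4 in a directory with FFmpeg's loudnorm filter.
# The video stream is copied untouched; only the audio is re-encoded.
# Set DRY_RUN=1 to print the commands instead of running them.
normalize_all() {
  run=${DRY_RUN:+echo}            # "echo" when DRY_RUN is set, empty otherwise
  $run mkdir -p "$1/normalized"
  for video in "$1"/*.mp4; do
    [ -e "$video" ] || continue   # the directory has no .mp4 files
    # -ar 48000 keeps a normal sample rate (loudnorm upsamples internally).
    $run ffmpeg -nostdin -i "$video" \
      -af loudnorm=I=-16:TP=-1.5:LRA=11 -ar 48000 -c:v copy \
      "$1/normalized/${video##*/}"
  done
}

# Show what would be done for the current directory without touching any files.
DRY_RUN=1 normalize_all .
```

Writing to a separate folder keeps the original files intact, so the levels can be compared before anything is replaced.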

Zoom Webinar

The streaming of RPPW2021 happened from Forsamlingssalen, our seminar room at RITMO. This hall can usually hold 70-80 people, but now we were around 10, including those of us involved in the presentation. I have written a separate blog post about how the Webinar was produced.

Just finished with the dress rehearsal and getting ready for the start of RPPW2021. From left: Wenbo Yi, Marit Furunes, Anne Danielsen, Thomas Anda.

For me, one of the important things when it comes to physical-virtual communication is to get a sense of where people are located. Therefore, I argued strongly for using a camera that showed the hall and the people inside. We also decided that Anne Danielsen, who served as conference chair, should have short “intros” and “outros” for each session from the hall. This could be seen as unnecessary from a program perspective, and it definitely complicated the streaming. However, I think this trick was one of the reasons why people reported that they felt they were “at” RITMO.

RPPW2021 chair Anne Danielsen introduces a session from Forsamlingssalen.

We used the same Zoom Webinar for the whole conference, and it was running continuously each day. When there were other activities (breaks, posters, and performances), we put up a poster in the Webinar with information about where to go. We also showed the overview image from the hall the whole time. Several people commented that this was a nice gesture.

We displayed a “waiting screen” in the hall when there were activities elsewhere. This was visible to those that entered the Zoom Webinar by accident, guiding them to the right location.

Poster rooms

There are many ways of doing poster sessions. Some conferences have been adventurous and used virtual reality platforms like GatherTown and Mozilla Hubs. We went more “safe” and decided to use a Zoom Room for the poster sessions. At first, we had planned to have a separate Zoom Room per poster. Fortunately, during NIME 2021, I discovered that it was possible to pre-assign breakout rooms that people could move in and out of. So we decided to use the same solution for RPPW.

This worked well, I think, and people could freely move between rooms. We did three things that further improved the poster room experience:

  1. Each poster session started with a “poster blitz” showing 1-minute pre-recorded videos of all (~15) posters per session. This was a quick way for everyone to get an overview of all the content, and (hopefully) made it easier to choose where to go. Then the breakout rooms were opened for people to choose from.
  2. We had a poster session host sitting in the main Zoom Room all the time, helping people to navigate between rooms. Most people figured out how to switch themselves, but she could also manually move people around based on their requests. She also showed a slide with names and titles, which further simplified the selection of poster rooms.
  3. We had one RITMO student/staff assigned to each poster breakout room. This person entered the breakout room together with the presenter at the start of the poster session. This resource person could help with technical problems, but, more importantly, they would also help start the discussion. We had assigned people based on their interests, so all poster presenters had a knowledgeable person in the room from the start. If nobody comes to your poster at a physical conference, you can at least talk to the people standing next to you. In Zoom, you are very alone if nobody shows up. We did not want that to happen. In the end, the poster sessions turned into very lively events.

Our “Zoom captain”, Marit Furunes, gently guided people into the right breakout rooms from the main Poster Zoom Room.

Performances

We had scheduled three performances during the workshop, all of which were challenging in their own ways. To keep things manageable from a production perspective, these were run by separate teams in different venues and on separate online channels. Just to explain some of the complexity, here is a flowchart of the Fibres Out of Line performance, which featured a dancer improvising with 10 autonomous agents.

A flowchart of the streaming setup for the Fibres Out of Line performance/installation.

Equally challenging was the N-place performance, featuring three musicians in three cities (Oslo, Stockholm, Berlin). This performance relied on a low-latency audio connection and was, in addition, streaming the coordinates of the musicians based on motion tracking.

Testing the network setup between Oslo, Stockholm, and Berlin for the N-place performance.

The last performance with Hardanger fiddler Anne Hytta was the easiest from a technical point of view. However, streaming high-quality audio and video always requires a larger setup than you may originally envisage.

All the performances worked well. I think this was in large part thanks to the separate production teams. Having separate spaces to set things up and test properly also helped a lot.

Slack

Like many other conferences, we also used Slack as a text-based communication channel during the conference. We explicitly asked people to use the Q&A function of the Zoom Webinar during the live sessions and use Slack for all other communication. This generally worked well, I think, although Slack is not to everyone’s taste.

We spent some time discussing how to organize Slack. In the end, we had these channels:

  • Thematic channels: #entrainment, #medical, #music, #sms, #speech, #keynote
  • General channels: #general, #social-introduce-yourself, #tech-support, #feedback
  • Extra channels: #music-performance, #job-openings, #random

Of these, the #social-introduce-yourself channel was by far the most popular. Lots of people presented themselves there, which created a very nice and warm vibe.

There were also some discussions going on in the thematic channels, but perhaps less than what I have seen at other conferences. One reason for this may be that most people followed the content live, and there was, therefore, less need for Slack discussions. It will be interesting to see whether there will be any activity also after the conference.

Coffee room

Finally, to continue with creating the sense of “being there”, I was eager to set up a physical-virtual coffee room. This was physically set up in the RITMO kitchen and was on one hour before the program started and until one hour after the end of the program. Since only panellists were allowed to show their face and talk in the Zoom Webinar, this coffee room was a way for people to interact freely.

RITMO PhD fellow Dana Swarbrick interacts with some of the online RPPW participants in the physical-virtual Coffee Room. Croissants were only available for on-site participants…

The Coffee Room was not used a lot. Almost no one came in before and during the program, but a couple of the days there were some lively discussions going on after the program had ended. Then we also ended up creating some separate breakout rooms that people could move into for talking in smaller groups.

Even though the Coffee Room was not used a lot, I think it served its purpose. I also feel that it was conceptually very important both for online and on-site participants. The online participants got a chance to look into the “heart” of RITMO’s premises. And RITMO people passing by the screen got reminded that a conference with online participants was going on.

Summing up

All in all, I think RPPW2021 worked very well. The feedback we have received so far has also been overwhelmingly positive. It is interesting to see that what we ended up doing was not too far from what the MCT students suggested in their applied project.

The conceptual model design from the MCT applied project.

However, there are always things that could have been improved:

  • We knew that the scheduling would be hard for some. It didn’t feel good to leave out Asian/Australasian participants. Many Europeans also struggled with the evening schedule. Still, I think many people valued the possibility of “being together”. In retrospect, we should probably have included a 1-hour break in the program for eating lunch/dinner.
  • Zoom Webinar and Zoom Rooms are robust video conferencing solutions. However, they are tricky from a production perspective. Next time we are doing something like this, I wonder if we should replace the Webinar part with a regular streaming service. We streamed two of the concerts on YouTube and that allows for a very different, and more controlled, video production.
  • People have generally been good at muting themselves when not talking, so there were no feedback problems. However, the sound quality varied a lot. This was apparent when listening to the PA system in the hall. People using headsets with a microphone closer to the mouth were much easier to listen to than those using laptop microphones in reverberant rooms. So we should have focused more on asking people to use a headset and generally improve their setup.
  • We included auto-generated captions for the online videos. However, since we had no time to check the captions, and also didn’t want to clutter the screen too much, we decided to play the videos without captions during the live playback. It would have been nice to include professional, selectable captioning. However, we did not have the resources for that this time around. In the future, I hope that it would be possible to include auto-generated live captioning for all sessions.

Postdoc Mari Romarheim Haugen decided to move outside to participate in one of the RPPW sessions from RITMO’s terrace.

New publication: NIME and the Environment

This week I presented the paper NIME and the Environment: Toward a More Sustainable NIME Practice at the International Conference on New Interfaces for Musical Expression (NIME) in Shanghai/online with Raul Masu, Adam Pultz Melbye, and John Sullivan. Below is our 3-minute video summary of the paper.

And here is the abstract:

This paper addresses environmental issues around NIME research and practice. We discuss the formulation of an environmental statement for the conference as well as the initiation of a NIME Eco Wiki containing information on environmental concerns related to the creation of new musical instruments. We outline a number of these concerns and, by systematically reviewing the proceedings of all previous NIME conferences, identify a general lack of reflection on the environmental impact of the research undertaken. Finally, we propose a framework for addressing the making, testing, using, and disposal of NIMEs in the hope that sustainability may become a central concern to researchers.

Paper highlights

Our review of the NIME archive showed that only 12 out of 1867 NIME papers have explicitly mentioned environmental topics. This is remarkably low and calls for action.

My co-authors have launched the NIME eco wiki as a source of knowledge for the community. It is still quite empty, so we call for the community to help develop it further.

In our paper, we also present an environmental cost framework. The idea is that this matrix can be used as a tool to reflect on the resources used at various stages in the research process.

Our proposed NIME environmental cost framework.

The framework was first put into use during the workshop NIME Eco Wiki – a crash course on Monday. In the workshop, participants filled out a matrix each for one of their NIMEs. Even though the framework is a crude representation of a complex reality, many people commented that it was a useful starting point for reflection.

Hopefully, our paper can raise awareness about environmental topics and lead to a lasting change in the NIME community.

Making 100 video poster images programmatically

We are organizing the Rhythm Production and Perception Workshop 2021 at RITMO a week from now. Like many other conferences these days, this one will also be run online. Presentations have been pre-recorded (10 minutes each) and we also have short poster blitz videos (1 minute each).

Pre-recorded videos

People have sent us their videos in advance, but they all have different first “slides”. So, to create some consistency among the videos, we decided to make an introduction slide for each of them. This would then also serve as the “thumbnail” of the video when presented in a grid format.

One solution could be to add some frames at the beginning of each video file. This could probably be done with FFmpeg without recompressing the files. However, given that we are talking about approximately 100 video files, I am sure there would have been some hiccups.

A quicker and better option is to add “poster images” when uploading the files to YouTube. We also support this on UiO’s web pages, which serve as the long-term archive of the material. The question, then, is how to create these 100 poster images without too much work. Here is how I did it on my Ubuntu machine.

Mail Merge in LibreOffice Writer

My initial thought was to start with Impress, the free presentation software in LibreOffice. I quickly searched to see if there is any tool to create slides programmatically but didn’t find anything that seemed to be straightforward.

Instead, I remembered the good old “mail merge” functionality of Writer. This was made for creating envelope labels back in the days when people still sent physical mail. However, it can be tweaked for other things. After all, I have the material I wanted to include in the poster image in a simple spreadsheet, so it was quick and easy to import the spreadsheet in Writer and select the two columns I wanted to include (“author name” and “title”).

A spreadsheet with the source information about authors and paper titles.

I wanted the final image to be in Full-HD format (1920 x 1080 pixels), which is not a standard format in Writer. However, there is the option of choosing a custom page size, so I set up a page size of 192 x 108 mm in Writer. Then I added some fixed elements on the page, including a RITMO emblem and the conference title.

Setting up the template in LibreOffice Writer.

Finally, I saved a file with the merged content and exported as a PDF.

From PDF to PNG

The output of Writer was a multi-page PDF. However, what we need is a single image file per video. So I turned to the terminal and used this one-liner based on pdfseparate to split up the PDF into multiple one-page PDF files:

pdfseparate rppw2021-papers-merged.pdf posters%d.pdf

The trick here is to use the %d placeholder in the output name to get a sequential number for each PDF.

Next, I wanted to convert these individual PDF files to PNG files. Here I turned to the convert function of ImageMagick, and wrote a short one-liner that does the trick:

for i in *.pdf; do name="${i%.pdf}"; convert -density 300 -resize 1920x1080 -background white -flatten "$i" "$name.png"; done

It loops over all the PDFs in a directory and converts each of them to a PNG file with a Full-HD resolution. I found that it was necessary to include the “-density 300” to get a nice-looking image. For some reason, the default seems to be a fairly low resolution. To avoid any transparency issues in later stages, I also included the “-background white” and “-flatten” options.
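The effect of the density setting can be checked with a little arithmetic. The sketch below converts the 192 x 108 mm page into pixels at a given density (1 inch = 25.4 mm). It assumes ImageMagick's default of 72 dpi for vector input; the function names are made up for illustration.

```shell
#!/bin/sh
# Pixels produced when a length in millimetres is rasterized at a given density (dpi).
mm_to_px() {
  awk -v mm="$1" -v dpi="$2" 'BEGIN { printf "%.0f\n", mm / 25.4 * dpi }'
}

# Density (rounded) at which a page of the given width in mm reaches a target pixel width.
min_density() {
  awk -v mm="$1" -v px="$2" 'BEGIN { printf "%.0f\n", px * 25.4 / mm }'
}

echo "72 dpi:  $(mm_to_px 192 72) x $(mm_to_px 108 72) px"
echo "300 dpi: $(mm_to_px 192 300) x $(mm_to_px 108 300) px"
echo "Density needed for 1920 px: $(min_density 192 1920) dpi"
```

At the default density, the page is rasterized far below Full-HD and then scaled up, which explains the poor quality. At 300 dpi, it is rasterized slightly above Full-HD and scaled down, which looks much better.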

The end result was a folder of PNG files.
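The PNG files are numbered in the same order as the rows in the spreadsheet, so they can be renamed to match the video files before uploading. Here is a hedged sketch of how that could be done. It assumes a hypothetical text file, names.txt, with one video base name per line in spreadsheet order; the function name is also made up.

```shell
#!/bin/sh
# Rename posters1.png, posters2.png, ... so that each matches its video file.
# Assumption: the names file lists one video base name per line,
# in the same order as the rows of the spreadsheet used for the mail merge.
# Usage: rename_posters <directory-with-pngs> <names-file>
rename_posters() {
  n=1
  while IFS= read -r name; do
    [ -n "$name" ] || continue                 # skip empty lines
    if [ -e "$1/posters$n.png" ]; then
      mv "$1/posters$n.png" "$1/$name.png"
    else
      echo "missing poster for $name (posters$n.png)" >&2
    fi
    n=$((n + 1))
  done < "$2"
}

# Only run when called with both arguments, for example: ./rename.sh . names.txt
if [ "$#" -eq 2 ]; then
  rename_posters "$1" "$2"
fi
```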

Putting it all together

The last step is to match the video files with the right PNG image in the video playback solution. Here it is shown using the video player we have at UiO:

Once I figured out the workflow, the whole process was very rapid. Hopefully, this post can save someone many hours of manual work!