Or, more specifically: can AI replace an artist? That is the question posed in a short documentary that I have contributed to for this year’s Research Days.
We were contacted before summer about trying to create a new song based on the catalogue of the Norwegian artist Ary. The idea was to use machine learning to generate the song. This has turned out to be an exciting project.
The project started with Ary sending us a bunch of her lyrics in text format and dumps from a digital audio workstation. This material was not really machine-readable/listenable, so Lars had to spend a great deal of time manually structuring and annotating it into a symbolic data set that could be used for training. The machine-learning part of the project involved generating lyrics following this approach, using an AI model that can already speak English. The melody was generated using Bachprop, based on a deep recurrent neural network. Then Lars put it all together into a final soundtrack that we played for Ary.
When talking about AI, I always find it important to highlight that humans are important for the final result. Yes, the machine makes something, but not without a lot of human guidance. The song played in the video (and the many other songs we also generated) was “composed” by a computer. However, Lars made many important decisions throughout the project: the initial preparation of the training material, the models used, the methods used, all the settings used, the selection of which lyrics and melodies to choose, and the final putting together of everything.
Last year, I was involved in several discussions about the potential challenges of using AI in music-making. I also wrote a blog post about some of the possibilities. While AI can, in theory, be used autonomously, I believe that the most interesting part is the meeting point between humans and machines. Providing artists like Ary with AI-based technologies can lead to exciting new music!
This year’s Sound and Music Computing (SMC) Conference has opened for virtual lab tours. When we cannot travel to visit each other, this is a great way to showcase how things look and what we are working on.
Needless to say, we only scratched the surface of everything going on in the field of sound and music computing at the University of Oslo in this video. The video focused primarily on our infrastructures. We have several ongoing projects that use these studios and labs and also some non-lab-based projects. These include:
There are many ways to run conferences. Here is a summary of how we ran the Rhythm Production and Perception Workshop 2021 at RITMO this week. RPPW is called a workshop, but it is really a full-blown conference. Almost 200 participants enjoyed 100 talks and posters, 2 keynote speeches, and 3 music performances spread across 4 days.
A hybrid format
We started planning RPPW as an on-site event back in 2019. Then, when the pandemic hit, we quickly turned around and decided to make it into an online-only conference. But as the covid restrictions have been lifted in Norway recently, we decided to run it as a “hybrid” event. That is, everything was run both on-site at RITMO and online. Only RITMO people were physically present, though, so most people experienced it as an online-only event. Still, given that future events will probably be hybrid, we found it to be a good way of experimenting with this new conference format.
In my experience, running hybrid events is more challenging than running online-only or on-site-only ones. The two formats are radically different. If this is not acknowledged and planned for, one risks having inferior experiences for everyone. However, if done right, hybrid may actually work quite well. Looking at feedback in the participant survey, I am happy to see that most people found it to work well. In fact, the majority favours a hybrid conference in the future. This would require some attention to detail, and I will discuss several such details below.
Time zone challenges
There are many technological challenges with running international conferences, but the challenge of different time zones is probably the biggest hurdle. I participated in the NIME 2021 conference last week, which was run from Shanghai. NIME is a truly international conference, with participants spread around the globe. To cater for global participation, the program was split into program blocks that were repeated each day. Therefore most paper/poster sessions ran twice a day. Keynotes happened live and were repeated as a “live replay” later in the day.
I think NIME’s block-based schedule worked quite well. The most important thing was that it allowed for combinations of participants from Asia and Europe, and from Europe and the Americas, throughout the days. The downside to this approach is that you don’t get the sense of “being together” at any point in time. For such global conferences, this is an unsolvable problem, I think. There will always be someone for whom the scheduling will be in the middle of the night.
For RPPW2021 we didn’t have many submissions from Asia/Australasia, unfortunately. That is a pity from a global perspective. However, on the positive side, this meant that the large majority of participants were based in Europe and the Americas. We, therefore, decided to run the conference in the evenings Oslo time, from 16:00-22:00. This is a little late for Europeans, and it is particularly tricky for those with children and other family obligations (myself included). But it allows for having a live, single-track conference format that creates the sense of “togetherness”.
Talking about time zone challenges, communicating when things happen can be tricky. This is particularly the case now that we have daylight saving time here in Norway (CEST, UTC+2). One way to help people get the time right was to implement localized time on the conference page. That way, people could easily figure out when sessions started in their local time zone. They had the same thing at NIME, but there you had to choose your time zone manually in the schedule. We implemented automatic schedule adjustments based on the computer clock. This worked for most people, except for those who (for some reason) had a different time set on their computer.
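To illustrate the idea of localized session times, here is a minimal Python sketch. It is not the code that ran on the conference page; the function name, the example date, and the use of the Europe/Oslo zone identifier are assumptions made for illustration. It interprets a session start as Oslo time and converts it either to a named time zone or, when none is given, to whatever time zone the computer is set to.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library from Python 3.9

# Assumption: the schedule is authored in Oslo time.
OSLO = ZoneInfo("Europe/Oslo")


def localize_session(date_str, time_str, viewer_tz=None):
    """Convert a session start given in Oslo time to the viewer's time.

    date_str is 'YYYY-MM-DD' and time_str is 'HH:MM' (hypothetical formats).
    viewer_tz is an IANA zone name such as 'America/New_York'. If it is
    None, the system's own time zone setting is used, which mirrors the
    'based on the computer clock' behaviour described above.
    """
    naive = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    start_oslo = naive.replace(tzinfo=OSLO)
    if viewer_tz is None:
        return start_oslo.astimezone()  # system local time zone
    return start_oslo.astimezone(ZoneInfo(viewer_tz))


if __name__ == "__main__":
    # Illustrative date only; 16:00 is the daily start time mentioned above.
    for zone in ("America/New_York", "Asia/Tokyo", None):
        local = localize_session("2021-06-22", "16:00", zone)
        print(zone or "system default", local.strftime("%Y-%m-%d %H:%M %Z"))
```

The fallback to the system setting also shows why this approach fails for people whose computer is set to a different time zone than the one they are actually in: the conversion is only as good as that setting.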
Many conferences have landed on pre-recorded videos and live Q&A sessions as a default presentation format. We also decided to use this approach for RPPW. We asked people to prepare short, 10-minute videos. This is shorter than typical conference presentations, which often last 20-30 minutes. In my experience, pre-recorded videos are generally shorter, more precise, and easier to understand than live talks. So you actually save time this way, which can be used for other things. You also avoid all sorts of technical challenges with screen-sharing and so on.
Almost everyone managed to keep their pre-recorded videos within 10 minutes and I don’t think that we lost out on any information that would have been presented in a typical 20-minute lecture. We received some videos that were slightly longer, but these were effectively trimmed and/or sped up a little to be around 10 minutes long.
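As a rough sketch of the arithmetic behind trimming and speeding up, the following Python function works out how much a video would need to be sped up to fit a target length, and how much would have to be trimmed if the speed-up is capped. The function name and the cap of 1.25x are hypothetical choices for illustration, not the settings we used.

```python
def fit_to_target(duration_s, target_s=600.0, max_speed=1.25):
    """Return (speed_factor, trim_seconds) needed to fit target_s.

    duration_s: original video length in seconds.
    target_s: desired length in seconds (600 s = 10 minutes).
    max_speed: assumed upper limit for a speed-up that still sounds natural.
    """
    if duration_s <= target_s:
        return 1.0, 0.0  # already short enough
    speed = duration_s / target_s
    if speed <= max_speed:
        return speed, 0.0  # speeding up alone is enough
    # Cap the speed and trim whatever still does not fit.
    trim = duration_s - target_s * max_speed
    return max_speed, trim


if __name__ == "__main__":
    for seconds in (540, 660, 900):
        speed, trim = fit_to_target(seconds)
        print(f"{seconds} s -> speed x{speed:.2f}, trim {trim:.0f} s")
```

An 11-minute video would, under these assumptions, only need a 10% speed-up, which is barely noticeable for speech.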
We had originally planned to use YouTube as our central video archive. However, when the student assistants started uploading videos to our YouTube account, we realized that there is an upload limit of 10 videos per day. We had around 100 videos, and with only a few days left to the conference, we had to find another solution.
Fortunately, UiO has a decent web player as part of the standard content management system. So we decided to rely on UiO’s video server and uploaded all the videos to a folder on the conference web page. We continued to upload 10 videos per day to YouTube, but we never really announced that playlist. Still, it is useful as a backup solution. In hindsight, I would have planned to use the UiO solution from the start.
Storing the videos online is only part of the job, though. Navigating and searching through them is critical for their usefulness. Therefore, we decided to set up a separate web page for participants on rppw2021.org. This page was designed by a group of students in our Music, Communication and Technology (MCT) master’s programme who had RPPW as their applied project this spring semester. They made a very nice navigation system from which we linked up all the videos hosted on the UiO server.
In the end, all the videos are navigable and searchable on the UiO web page, and they are also on YouTube. People are different, so it is good to have multiple paths to the same material.
I didn’t think much about captions before NIME 2020, in which accessibility was the main topic. However, over the last year, I have realized that captions are important not only for people with hearing impairments. Some struggle with understanding the language; others may want to keep the sound low or off for social or practical reasons. In all these cases, captions may help.
One reason we wanted to upload videos to YouTube was to leverage their great auto-caption feature. Unfortunately, when we switched to the UiO-based video solution, captioning had to be done differently. To the rescue came a new auto-text service that is currently in beta-testing at UiO. The service is actually based on a Google-driven caption engine, probably quite similar to the one used on YouTube. Unfortunately, we had to run this separately for each video, a time-consuming task for one of our student assistants. On the positive side, the auto-texting worked quite well. Much has happened since I tested a similar service a year ago.
The result of the auto-captioning is a .vtt file for each video file. These were uploaded next to the video files and made available to be turned on and off in the video player. An example can be seen here:
As can be seen, auto-texting is not perfect, but it goes a long way. I see that auto-texting is becoming more popular these days. There seems to be an increased awareness of the importance of captions. Hopefully, more and better tools will be made available on all platforms soon. In fact, Zoom has opened for live captioning in both Zoom Rooms and Webinars. However, for GDPR reasons, this is not yet available at UiO.
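For those who have not looked inside a .vtt (WebVTT) file: it is a plain-text file that starts with the line WEBVTT, followed by cues separated by blank lines, each with a timing line and one or more lines of caption text. The Python sketch below reads such a file into a list of cues, which could be a starting point for checking or correcting auto-generated captions. The sample captions are made up for illustration, and the parser is deliberately simplified (it ignores styling and cue settings).

```python
import re

# A cue timing line looks like "00:00:01.000 --> 00:00:04.000";
# the hours part is optional in WebVTT.
CUE_TIMING = re.compile(
    r"(?P<start>(?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+"
    r"(?P<end>(?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def parse_vtt(text):
    """Return a list of (start, end, caption) tuples from WebVTT text."""
    cues = []
    text = text.replace("\r\n", "\n").strip()
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                # Everything after the timing line is caption text.
                caption = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((match["start"], match["end"], caption))
                break  # blocks without a timing line are skipped
    return cues


# Made-up sample content, only to show the file layout.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
Hello and welcome to this talk.

00:00:04.500 --> 00:00:08.000
Today I will talk about rhythm
and timing.
"""

if __name__ == "__main__":
    for start, end, caption in parse_vtt(SAMPLE):
        print(start, end, caption)
```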
As written about in another blog post, we found it necessary to normalize all the pre-recorded videos. That is, adjust the sound to a similar level. The videos were recorded with all sorts of equipment, so the sound quality and levels varied widely. We thought about this a little too late, unfortunately. By that time, we had already uploaded all the videos to the program page and YouTube. Still, we decided to normalize all the videos to ensure an equal sound level when multiple videos were played back consecutively during the conference. Next time I will think about normalizing all video files from the start.
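To make the idea of "a similar level" concrete, here is a minimal Python sketch of level normalization on raw audio samples. It is an illustration of the principle only, not the tool we used: it measures the RMS level of a signal and scales it to a target level, and both the function names and the target of -20 dBFS are assumptions. Proper loudness normalization of video files would normally be done with a dedicated audio or video tool and a perceptual loudness measure.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level in dBFS of float samples in the range -1.0..1.0."""
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def normalize_rms(samples, target_dbfs=-20.0):
    """Scale samples so that their RMS level matches target_dbfs.

    The gain is limited so that no sample exceeds full scale (1.0),
    which means very 'peaky' material may end up below the target.
    """
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # nothing to scale in a silent file
    gain = 10.0 ** ((target_dbfs - current) / 20.0)
    peak = max(abs(s) for s in samples)
    gain = min(gain, 1.0 / peak)  # avoid clipping
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet = [0.01, -0.01] * 100  # a very soft recording
    loud = [0.5, -0.5] * 100     # a much hotter recording
    for name, signal in (("quiet", quiet), ("loud", loud)):
        before = rms_dbfs(signal)
        after = rms_dbfs(normalize_rms(signal))
        print(f"{name}: {before:.1f} dBFS -> {after:.1f} dBFS")
```

After normalization, the two test signals end up at the same level, which is the effect we were after when playing videos back to back.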
The streaming of RPPW2021 happened from Forsamlingssalen, our seminar room at RITMO. This hall can usually hold 70-80 people, but now we were around 10, including those of us involved in the presentation. I have written a separate blog post about how the Webinar was produced.
For me, one of the important things when it comes to physical-virtual communication is to get a sense of where people are located. Therefore, I argued strongly for using a camera that showed the hall and the people inside. We also decided that Anne Danielsen, who served as conference chair, should have short “intros” and “outros” for each session from the hall. This could be seen as unnecessary from a program perspective, and it definitely complicated the streaming. However, I think this trick was one of the reasons why people reported that they felt they were “at” RITMO.
We used the same Zoom Webinar for the whole conference, and it was running continuously each day. When there were other activities, such as breaks, posters, and performances, we put up a poster in the Webinar with information about where to go. We also showed the overview image from the hall the whole time. Several people commented that this was a nice gesture.
There are many ways of doing poster sessions. Some conferences have been adventurous and used virtual reality platforms like GatherTown and Mozilla Hubs. We went more “safe” and decided to use a Zoom Room for the poster sessions. At first, we had planned to have a separate Zoom Room per poster. Fortunately, during NIME 2021, I discovered that it was possible to pre-assign breakout rooms that people could move in and out of. So we decided to use the same solution for RPPW.
This worked well, I think, and people could freely move between rooms. We did three things that further improved the poster room experience:
Each poster session started with a “poster blitz” showing 1-minute pre-recorded videos of all (~15) posters per session. This was a quick way for everyone to get an overview of all the content, and (hopefully) made it easier to choose where to go. Then the breakout rooms were opened for people to choose from.
We had a poster session host sitting in the main Zoom Room all the time, helping people to navigate between rooms. Most people figured out how to switch themselves, but she could also manually move people around based on their requests. She also showed a slide with names and titles, which further simplified the selection of poster rooms.
We had one RITMO student/staff assigned to each poster breakout room. This person entered the breakout room together with the presenter at the start of the poster session. This resource person could help with technical problems, but, more importantly, they would also help start the discussion. We had assigned people based on their interests, so all poster presenters had a knowledgeable person in the room from the start. If nobody comes to your poster at a physical conference, you can at least talk to the people standing next to you. In Zoom, you are very alone if nobody shows up. We did not want that to happen. In the end, the poster sessions turned into very lively events.
We had scheduled three performances during the workshop, all of which were challenging in their own ways. To keep things manageable from a production perspective, these were run by separate teams in different venues and on separate online channels. Just to explain some of the complexity, here is a flowchart of the Fibres Out of Line performance, which featured a dancer improvising with 10 autonomous agents.
Equally challenging was the N-place performance, featuring three musicians in three cities (Oslo, Stockholm, Berlin). This performance relied on a low-latency audio connection and was, in addition, streaming the coordinates of the musicians based on motion tracking.
All the performances worked well. I think this was in large part thanks to the separate production teams. Having separate spaces to set things up and test properly also helped a lot.
Like many other conferences, we used Slack as a text-based communication channel during the conference. We explicitly asked people to use the Q&A function of the Zoom Webinar during the live sessions and to use Slack for all other communication. This generally worked well, I think, although Slack is not to everyone’s taste.
We spent some time discussing how to organize Slack. In the end, we had these channels:
General channels: #general, #social-introduce-yourself, #tech-support, #feedback
Extra channels: #music-performance, #job-openings, #random
Of these, the #social-introduce-yourself channel was by far the most popular. Lots of people presented themselves there, which created a very nice and warm vibe.
There were also some discussions going on in the thematic channels, but perhaps less than what I have seen at other conferences. One reason for this may be that most people followed the content live, and there was, therefore, less need for Slack discussions. It will be interesting to see whether there will be any activity also after the conference.
Finally, to continue with creating the sense of “being there”, I was eager to set up a physical-virtual coffee room. This was physically set up in the RITMO kitchen and was on one hour before the program started and until one hour after the end of the program. Since only panellists were allowed to show their face and talk in the Zoom Webinar, this coffee room was a way for people to interact freely.
The Coffee Room was not used a lot. Almost no one came in before and during the program, but on a couple of the days, there were some lively discussions going on after the program had ended. Then we also ended up creating some separate breakout rooms that people could move into for talking in smaller groups.
Even though the Coffee Room was not used a lot, I think it served its purpose. I also feel that it was conceptually very important both for online and on-site participants. The online participants got a chance to look into the “heart” of RITMO’s premises. And RITMO people passing by the screen got reminded that a conference with online participants was going on.
All in all, I think RPPW2021 worked very well. The feedback we have received so far has also been overwhelmingly positive. It is interesting to see that what we ended up doing was not too far from what the MCT students suggested in their applied project.
However, there are always things that could have been improved:
We knew that the scheduling would be hard for some. It didn’t feel good to leave out Asian/Australasian participants. Many Europeans also struggled with the evening schedule. Still, I think many people valued the possibility of “being together”. In retrospect, we should probably have included a 1-hour break in the program for eating lunch/dinner.
Zoom Webinar and Zoom Rooms are robust video conferencing solutions. However, they are tricky from a production perspective. Next time we are doing something like this, I wonder if we should replace the Webinar part with a regular streaming service. We streamed two of the concerts on YouTube and that allows for a very different, and more controlled, video production.
People were generally good at muting themselves when not talking, so there were no feedback problems. However, the sound quality varied a lot. This was apparent when listening to the PA system in the hall. People using headsets with a microphone closer to the mouth were much easier to listen to than those using laptop microphones in reverberant rooms. So we should have focused more on asking people to use a headset and generally improve their setup.
We included auto-generated captions for the online videos. However, since we had no time to check the captions, and also didn’t want to clutter the screen too much, we decided to play the videos without captions during the live playback. It would have been nice to include professional, selectable captioning. However, we did not have the resources for that this time around. In the future, I hope that it will be possible to include auto-generated live captioning for all sessions.
This paper addresses environmental issues around NIME research and practice. We discuss the formulation of an environmental statement for the conference as well as the initiation of a NIME Eco Wiki containing information on environmental concerns related to the creation of new musical instruments. We outline a number of these concerns and, by systematically reviewing the proceedings of all previous NIME conferences, identify a general lack of reflection on the environmental impact of the research undertaken. Finally, we propose a framework for addressing the making, testing, using, and disposal of NIMEs in the hope that sustainability may become a central concern to researchers.
Our review of the NIME archive showed that only 12 out of 1867 NIME papers (about 0.6%) have explicitly mentioned environmental topics. This is remarkably low and calls for action.
My co-authors have launched the NIME eco wiki as a source of knowledge for the community. It is still quite empty, so we call for the community to help develop it further.
In our paper, we also present an environmental cost framework. The idea is that this matrix can be used as a tool to reflect on the resources used at various stages in the research process.
The framework was first put into use during the workshop NIME Eco Wiki – a crash course on Monday. In the workshop, participants filled out a matrix each for one of their NIMEs. Even though the framework is a crude representation of a complex reality, many people commented that it was a useful starting point for reflection.
Hopefully, our paper can raise awareness about environmental topics and lead to a lasting change in the NIME community.
We presented the installation Strings On-Line at NIME 2020. It was supposed to be a physical installation at the conference to be held in Birmingham, UK.
Due to the corona crisis, the conference went online, and we decided to redesign the proposed physical installation into an online installation instead. The installation ran continuously from 21-25 July last year, and hundreds of people “came by” to interact with it.
I finally got around to editing a short (1-minute) video promo of the installation:
I have also made a short (10-minute) “behind the scenes” mini-documentary about the installation. Here researchers from RITMO, University of Oslo, talk about the setup featuring 6 self-playing guitars, 3 remote-controlled robots, and a 24/7 high-quality, low-latency, audiovisual stream.
We are planning a new installation for the RPPW conference this year. So if you are interested in exploring such an online installation live, please stay tuned.