Or, more specifically: can AI replace an artist? That is the question posed in a short documentary that I have contributed to for this year’s Research Days.
We were contacted before summer about trying to create a new song based on the catalogue of the Norwegian artist Ary. The idea was to use machine learning to generate the song. This has turned out to be an exciting project.
The project started with Ary sending us a bunch of her lyrics in text format and dumps from a digital audio workstation. This material was not really machine-readable/listenable, so Lars had to spend a great deal of time manually structuring and annotating it into a symbolic data set that could be used for training. The machine-learning part of the project involved generating lyrics following this approach, using an AI model that can already speak English. The melody was generated using Bachprop, based on a deep recurrent neural network. Then Lars put it all together into a final soundtrack that we played for Ary.
When talking about AI, I always find it important to highlight that humans are essential to the final result. Yes, the machine makes something, but not without a lot of human guidance. The song played in the video (and the many other songs we also generated) was “composed” by a computer. However, Lars made many important decisions throughout the project: the initial preparation of the training material, the choice of models and methods, all the settings, the selection of lyrics and melodies, and the final assembly of everything.
Last year, I was involved in several discussions about the potential challenges of using AI in music-making. I also wrote a blog post about some of the possibilities. While AI can, in theory, be used autonomously, I believe that the most interesting part is the meeting point between humans and machines. Providing artists like Ary with AI-based technologies can lead to exciting new music!
I am happy to announce that fourMs researcher Kristian Nymoen has successfully defended his PhD dissertation, and that the dissertation is now available in the DUO archive. I have had the pleasure of co-supervising Kristian’s project, and of working closely with him on several of the papers included in the dissertation (and a few others).
There are strong indications that musical sound and body motion are related. For instance, musical sound is often the result of body motion in the form of sound-producing actions, and musical sound may lead to body motion such as dance. The research presented in this dissertation is focused on technologies and methods of studying lower-level features of motion, and how people relate motion to sound. Two experiments on so-called sound-tracing, meaning representation of perceptual sound features through body motion, have been carried out and analysed quantitatively. The motion of a number of participants has been recorded using state-of-the-art motion capture technologies. In order to determine the quality of the data that has been recorded, these technologies themselves are also a subject of research in this thesis. A toolbox for storing and streaming music-related data is presented. This toolbox allows synchronised recording of motion capture data from several systems, independently of system-specific characteristics like data types or sampling rates. The thesis presents evaluations of four motion tracking systems used in research on music-related body motion. They include the Xsens motion capture suit, optical infrared marker-based systems from NaturalPoint and Qualisys, as well as the inertial sensors of an iPod Touch. These systems cover a range of motion tracking technologies, from state-of-the-art to low-cost and ubiquitous mobile devices. Weaknesses and strengths of the various systems are pointed out, with a focus on applications for music performance and analysis of music-related motion. The process of extracting features from motion data is discussed in the thesis, along with motion features used in analysis of sound-tracing experiments, including time-varying features and global features. Features for realtime use are also discussed related to the development of a new motion-based musical instrument: The SoundSaber.
Finally, four papers on sound-tracing experiments present results and methods of analysing people’s bodily responses to short sound objects. These papers cover two experiments, presenting various analytical approaches. In the first experiment, participants moved a rod in the air to mimic the sound qualities in the motion of the rod. In the second experiment, the participants held two handles and a different selection of sound stimuli was used. In both experiments, optical infrared marker-based motion capture technology was used to record the motion. The links between sound and motion were analysed using four approaches. (1) A pattern recognition classifier was trained to classify sound-tracings, and the performance of the classifier was analysed to search for similarity in motion patterns exhibited by participants. (2) Spearman’s ρ correlation was applied to analyse the correlation between individual sound and motion features. (3) Canonical correlation analysis was applied in order to analyse correlations between combinations of sound features and motion features in the sound-tracing experiments. (4) Traditional statistical tests were applied to compare sound-tracing strategies between a variety of sounds and participants differing in levels of musical training. Since the individual analysis methods provide different perspectives on the links between sound and motion, the use of several methods of analysis is recommended to obtain a broad understanding of how sound may evoke bodily responses.
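To give a feeling for approach (2), here is a minimal sketch of how Spearman’s ρ can be computed between one sound feature and one motion feature. This is not the code used in the dissertation; the function names and the example values (a loudness envelope and a hand speed curve) are hypothetical and only meant as an illustration. The idea is to replace each value by its rank and then compute an ordinary Pearson correlation on the ranks, so that any monotonic relationship, linear or not, gives a high score.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, made up for illustration only.
loudness = [0.1, 0.4, 0.5, 0.9, 0.7]    # sound feature, one value per frame
hand_speed = [0.2, 0.3, 0.8, 1.5, 1.1]  # motion feature, same frames
print(spearman_rho(loudness, hand_speed))
```

In practice one would probably use an existing statistics library instead, but the sketch shows why the method suits sound-tracing data: it only asks whether the motion feature rises and falls together with the sound feature, not whether the two are linearly related.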
Paper IV SoundSaber — A Motion Capture Instrument. K. Nymoen, S.A. Skogstad and A.R. Jensenius. In Proceedings of the International Conference on New Interfaces for Musical Expression, pages 312–315, University of Oslo 2011.
Paper VII A Statistical Approach to Analyzing Sound Tracings. K. Nymoen, J. Torresen, R.I. Godøy, and A.R. Jensenius. In S. Ystad, M. Aramaki, R. Kronland-Martinet, K. Jensen, and S. Mohanty (eds.) Speech, Sound and Music Processing: Embracing Research in India, volume 7172 of Lecture Notes in Computer Science, pages 120–145. Springer, Berlin Heidelberg 2012. The original publication is available at www.springerlink.com.
Paper VIII Analysing Correspondence Between Sound Objects and Body Motion. K. Nymoen, R.I. Godøy, A.R. Jensenius, and J. Torresen. To appear in ACM Transactions on Applied Perception.