Computer Music Modeling and Retrieval

CMMR, Pisa, Italy 26-28 September 2005

This was a rather small conference, with only about 40 participants, organised at the CNR in lovely Pisa. The topics presented were varied, but here, as in most other computer music conferences these days, there was a high percentage of music information retrieval presentations. I was there to present a short paper on building low-cost music controllers from hacked gamepads and homemade sensors, something I worked on while at McGill in the spring. A summary of things I found interesting:

  • Mark Havryliv and Terumi Narushima, Wollongong University, Australia, presented Metris, a Tetris-like game for music. Unfortunately, I arrived right after their presentation, but the concept seems very interesting.
  • Laurent Pottier, GMEM, presented a microsound system implemented in Max/MSP, and different ways of controlling it. I am looking forward to the release of the objects.
  • Leonello Tarabella, Pisa, played with his "air piano" system using video analysis. It worked very well considering the obvious problems with resolution and speed of video cameras.
  • Carlos Guedes, NYU and Porto, presented a dance piece using the m-objects that he presented at ICMC a couple of weeks ago. He has been focusing on rhythmic aspects of dance movements and their implementation in music. Very nice!
  • Philippe Guillemain, CNRS Marseille, presented work on transitions in reed instruments. This is perceptually very relevant, and it is strange that it has not received greater attention earlier.
  • Giordano Cabral, Paris 6, presented something Francois Pachet covered very quickly at the S2S^2 summer school, and this time I actually understood a bit more. It is about using the Extractor Discovery System (EDS) for recognition. The system uses a genetic algorithm to automatically build extraction algorithms from a set of basic mathematical and signal-processing operators. For the user, this makes it possible to ask the system to develop different types of equations and have it find the best ones (a rough sketch of the general idea follows after this list). Seems very interesting, and apparently it works.
  • Markus Schedl, Johannes Kepler University, Linz, presented a web-mining paper, where they had been building artist ranking tables within various musical styles based on querying for artist pairs. The novel thing was their penalising scheme, which avoids over-ranking artists whose names are also common words (Kiss, Prince, Madonna). I find it fascinating that such systems, which are completely ignorant of any music theory, manage to come up with results that seem very "correct" in terms of human classification.
  • Rodrigo Segnini and Craig Sapp, CCRMA and CCARH, Stanford, presented the idea of making Scoregrams from notation. This is basically a way of generating "spectrograms" of a symbolic signal, the point being to quickly visualise what is going on at different levels in the music. The analysis window size decreases from bottom to top, giving a very detailed image at the bottom and only a single value at the top.
  • Snorre Farner, NTNU, presented work on "naturalness" in clarinet playing, and jump-started a discussion on the concepts of naturalness, expressiveness etc. Definitely a burning topic these days!
  • Christophe Rhodes, Goldsmiths, London, had made a system for writing lute tablature. Kind of a niche thing, but it looked very neat!
  • Mark Marshall, McGill, presented results from some preliminary tests on the usability of various sensors for musical applications. I think this is a very important topic, and I hope he continues to look into this. At the moment he has been focusing on pitch/melody related issues, but this should be extended to also cover rhythmical and timbral elements.
  • Kristoffer Jensen, Aalborg, showed some very interesting examples of the boundaries between noise and tonal sound.
  • Cynthia Grund, Odense, called for a panel on interdisciplinarity issues. Coming from music philosophy, she argued for more cooperation between the technologies and the "traditional" humanities. Many coming from the technical side would benefit from looking at recent issues in the humanities, and vice versa. What is quite clear is that most people working in the humanities have not realized the exponential growth of Music Information Retrieval in recent years, driven by strong commercial and application-based (internet queries) interests.
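
To make the EDS idea mentioned above a bit more concrete, here is a minimal sketch of how a genetic search over chains of basic signal-processing operators could look. Everything in it (the operator set, the fitness measure, all function names) is my own illustrative assumption written in Python/NumPy, not the actual EDS implementation.

```python
# Hypothetical sketch of EDS-style feature discovery: a genetic search over
# chains of basic signal-processing operators. The operator set, fitness
# measure and all names are illustrative assumptions, not the real EDS.
import random
import numpy as np

SIGNAL_OPS = {                      # signal -> signal
    "abs":    np.abs,
    "square": np.square,
    "diff":   lambda x: np.diff(x) if x.size > 1 else x,
    "fftmag": lambda x: np.abs(np.fft.rfft(x)),
}
SCALAR_OPS = {                      # signal -> single value
    "mean": np.mean,
    "std":  np.std,
    "max":  np.max,
    "rms":  lambda x: float(np.sqrt(np.mean(x ** 2))),
}

def random_extractor(max_len=3):
    """An extractor is a chain of signal ops ending in one scalar op."""
    chain = [random.choice(list(SIGNAL_OPS)) for _ in range(random.randint(1, max_len))]
    return chain, random.choice(list(SCALAR_OPS))

def apply_extractor(extractor, signal):
    chain, scalar = extractor
    x = np.asarray(signal, dtype=float)
    for name in chain:
        x = SIGNAL_OPS[name](x)
    return float(SCALAR_OPS[scalar](x))

def fitness(extractor, signals, labels):
    """Correlation between the extracted feature and the target labels."""
    values = np.array([apply_extractor(extractor, s) for s in signals])
    if np.std(values) == 0:
        return 0.0
    c = np.corrcoef(values, np.asarray(labels, dtype=float))[0, 1]
    return abs(c) if np.isfinite(c) else 0.0

def mutate(extractor):
    """Swap one signal op and re-draw the scalar op."""
    chain, scalar = extractor
    chain = list(chain)
    chain[random.randrange(len(chain))] = random.choice(list(SIGNAL_OPS))
    return chain, random.choice(list(SCALAR_OPS))

def evolve(signals, labels, pop_size=30, generations=20):
    """Keep the best half of the population and refill it with mutants."""
    pop = [random_extractor() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, signals, labels), reverse=True)
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(pop, key=lambda e: fitness(e, signals, labels))
```

Given a toy set of signals and numeric labels, evolve() returns the operator chain whose single output value correlates best with the labels. The real system presumably has a much richer, typed operator set and more elaborate search heuristics, but the basic "let evolution compose the feature extractor" idea is the same.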

International Computer Music Conference

ICMC, Barcelona, Spain 4-10 September 2005

Sunday 4 September

  • I attended a workshop on audio mosaicing, which was more like a set of presentations by different people, but still interesting.
  • Jason Freeman, PhD from Columbia, now at Georgia Tech, talked about a Java applet that creates a 5-second "thumbnail song" of your iTunes collection.
  • Opening concert
    • Chris Brown had composed a piece for the Reactable interactive table made at UPF. The table is very nice and responds quickly, but I felt there was a missing link in the relationships between the gestures made, the objects presented and the sonic output.
    • Jose Manuel Berenguer played sounds and visuals. I liked the beginning a lot, with a nice combination of granulated sounds and visual particle swarms.
  • Ali Momeni’s installation "un titled" uses the new moother object, which makes it possible to access the Freesound database from within Max/MSP and PD. Ali used it to query for similar files and organise them in two-dimensional "sound spaces". A large mechanical construction controls the parameters via Wacom tablets. Nice concept, and I like the idea of making things bigger and heavier to use, but I had some problems with the mappings and with the concept of having to press the large sticks down into the ground to get new sounds.

Monday 5 September

  • Fernando Lopez-Lezcano, CCRMA, Stanford, talked about Planet CCRMA and future issues. On a question on free software, he said something like "to me, free software is definitely not free".
  • Norbert Schnell from IRCAM presented FTM, a nice collection of Max-objects for more advanced data handling in Max/MSP.
  • Rosemary Mountain, Concordia / Hexagram, showed a setup for testing how people can organize visual and auditory stimuli. She used a wireless barcode reader.
  • Ge Wang, Soundlab, Princeton, showed his ChucK programming language, a text-based music language, with some nice graphical add-ons. I’m very sorry I missed his "text-battle" with Nick Collins at the Off-ICMC.

Tuesday 6 September

  • Vegard Sandvold, NOTAM, presented some promising results on the use of semantic descriptors of musical intensity. I tried the experiment when it was up and running, and I have some problems with the concept of forcing stimuli into predetermined categories; it would be interesting to do a set of similar experiments using a continuous scale instead. His system is currently used by NRK in the radio-intentiometer.
  • Douglas Geers, Minnesota, presented a nice piece in the evening concert, with a violinist wearing glowing thread which he processed with Jitter.

Wednesday 7 September

Thursday 8 September

  • Rui Pedro Paiva, University of Coimbra, Portugal, presented a method for melody extraction from polyphonic signals. Based on auditory filtering, and with no attempt to make it fast, they obtained an average performance of about 82% on a varied set of music.
  • Geoffroy Peeters, IRCAM, presented a method for rhythm detection which seems very promising.
  • Nick Collins, Cambridge, presented an overview of different segmentation algorithms.
  • Xavier Serra, UPF, presented a nice overview of current music technology research, and called for a roadmap for future research.

Friday 9 September

  • Eduardo Reck Miranda, Future Music Lab, Plymouth, showed some of his work using EEG to control music. They still have a long way to go, since the signals are weak and noisy, but they had managed to get people to control simple playback of sequences.
  • Carlos Guedes, NYU / Porto, presented his m-tools, a small package of Max-objects developed for controlling musical rhythm from dance movements.
  • and Perry Cook, Princeton, showed tools
  • Jasch and played a nice set at the Off-ICMC.

S2S2 Summer School

S2S^2 Summer School, Genova, 25-29 July 2005

This PhD summer school, part of the S2S^2 initiative (Sound to Sense, Sense to Sound), was not much of a school, but rather a normal conference with lots of presentations of current research in the related fields. Although I knew quite a lot of the work beforehand, it was still a good brush-up on a number of different subjects.

Things I found interesting

  • Pedro Rebelo from SARC, Belfast, talked about prosthetic instruments, and how instruments should be specific and non-generic to work well.
  • Roberto Bresin and Anders Friberg from KTH showed their conductor system using emotional descriptors such as sad-tender and angry-happy (from Gabrielsson and Juslin’s studies on valence and energy in music).
  • Marc Leman from IPEM, Ghent, gave a nice historical overview of systematic musicology.
    • He also explained how he has shifted his attention from purely cognitive studies to embodied cognition.
    • Concerning subjectivity versus objectivity in the study of music, he explained how we do tend to agree on a lot of things, for example “subjective” things like similarity and musical genre, so it should be possible to study them in an objective way.
    • A new audio library for EyesWeb is coming soon
  • Gerhard Widmer from OFAI, Vienna commented that a problem in AI and music is that too many have been working on making computers do tasks that are neither particularly useful nor interesting, such as Bach harmonization and Schenker analysis, all based on symbolic notation.
  • I always like to hear what Francois Pachet from the Sony lab in Paris is up to. The lecture was, unfortunately, too general, leaving little time to go into details:
    • He gave an overview of his previous work with the improvisation tool Continuator
    • In his more recent work in music information retrieval, he suggested using Google to find “sociological” measures of artist similarity. This is done by querying Google for combinations of artists; for example “Mozart + Beatles” gives some meta-measure of the similarity between the two. Doing this for a number of artists and building a confusion matrix can be used to tell similarity or difference (a small sketch of this kind of co-occurrence measure follows after this list). A rather crude and non-musical thing to do, but it seemed to work well in the examples he showed. Having some sort of non-musical approach to the topic can, indeed, reveal interesting things that working with only the audio streams might not. Such a “sociological” approach reminded me of an MIT paper from ICMC 2002 where they looked at Napster traffic to find what people were querying for and downloading.
    • He also presented musaicing, or how to create mosaics of music, and played some interesting examples.
    • Another interesting approach was to use evolutionary algorithms to generate new timbre-analysis functions in a Lisp-like language, and then evaluate the output to see which worked best.
    • When someone commented that similarity is an ill-defined word, he replied: “well-defined problems are for engineers”.
  • Tony Myatt from MRC at York focused on what he considered problems in the music technology field:
    • He said that a problem in the field is that a lot of the technologies developed never mature, since they are only used once in a concert and then forgotten.
    • He also commented on the post-digital composers (referring to Kim Cascone’s CMJ 2000 paper on the “Aesthetics of failure”) working outside of the academic institutions.
    • A problem is that the community has not been good enough at communicating the values of music technology. We lack a common music-tech vision. He showed an example from an astronomy journal where ESA clearly defines key research areas for the coming years.
    • His visions for the future include:
      • Better integration of musicians/composers in the field
      • Coordination of other disciplines: brain science, psychology, engineering
      • New aesthetic approaches
      • Better communication
  • The Pompeu Fabra group gave a general overview of content-based audio processing, focusing on the four axes of timbre, melody/harmony, rhythm and structure.
    • They showed a number of impressive demos of real-time sound processing:
      • Time scaling
      • Rhythm transformation, swing to non-swing etc.
      • Changing a voice from male to female
      • Harmonizer
      • Turning a single voice into a full choir (with time deviations!)
      • Modifying a single object (for example a voice) in a polyphonic stream
  • Erkut from Helsinki University of Technology outlined the different processes that might fall under the heading “physics-based sound synthesis”.
    • The essential part of physical modelling is the bi-directionality of the blocks
  • Alain de Cheveigné from IRCAM warned against forgetting everything that happens between the cochlea and the cortex
  • Stephen McAdams from McGill gave a very brief overview of his research:
    • Timbre can be described by attack time, spectral centroid and spectral deviation
    • He proposed psychomechanics as a term to describe quantitative links between the mechanical properties of objects and their auditory results: mechanics, perception, acoustics
    • An important task for the future is to figure out the minimum set of parameters necessary for studying sound
  • Mark Sandler from Queen Mary, London, was one of the few speakers with a somewhat more critical presentation, attacking some general trends in the MIR community:
    • Difference between “everyday” and “musical” listening
    • The community has been focusing on computer listening. There should be more subjective testing on representative populations
    • We should not use only compressed sound (MP3) as raw material for analysis:
    • Removal of sound in the higher and lower parts of the spectrum
    • Reduction of the stereo field by combining L&R channels
    • This gives polluted source files -> inconclusive science
    • MP3 encoders work differently, so if you don’t know which encoder was used, you don’t know what you have
    • Similarity algorithms work better on uncompressed sound
  • Francois Pachet commented that we should continue to study compressed sound, since that is what people actually listen to. If the algorithms don’t work well on compressed sound, it is a problem with the algorithms, not with people.
  • Someone (I forgot to write down who…) presented a new Audition library toolbox for PD which seemed interesting
  • Maria Chait from Maryland, US and ENS, Paris said it is important to remember that sounds that are physically similar might be perceptually different, and the other way around.
  • Penttinen from Helsinki University of Technology showed physical models of how the plucking point on a guitar affects the sound. He also had some PD patches (PULMU) that could find the plucking point from audio.
  • Gallese from Parma presented his work on mirror neurons: action = representation of the action. This was shown in his now famous experiments with a monkey that has similar neural activity when grasping for a nut and when watching someone else grasp for the nut. He said he thinks cognitive neuroscientists should read more philosophy -> Husserl: we perceive the world also through our memory.
  • In the TAI-CHI project they are trying to make “tangible acoustic interfaces” for human-computer interaction by attaching contact microphones to a surface and calculating the position of touch from the time differences between them. Different setups were displayed, and they all worked more or less OK, although they are currently limited to tapping only, not continuous movement or multi-finger interfaces.
  • Trucco from Heriot-Watt University gave a very compact introduction to current trends in videoconferencing, and how they create a “natural” meeting of three people using a multicamera setup. This involves morphing several live video streams to give the impression that a person is looking at you, rather than into a camera. This seems to be a large field of research! From a sound perspective I find it strange that more effort is not put into sound localisation, for example giving the impression that the sound is coming from the mouth of the speaker rather than directly from a loudspeaker.
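
The “sociological” co-occurrence idea is simple enough to sketch. Below is a minimal, hypothetical illustration in Python: page_count() stands in for whatever search-engine API one would actually call, and the min-based normalisation is just one plausible way of damping artists whose names are also common words (the problem Schedl addressed at CMMR, see above); it is not necessarily the scheme either group used.

```python
# Hypothetical sketch of artist similarity from web co-occurrence counts.
# page_count(query) is a stand-in for a real search-engine API call; the
# normalisation is an illustrative choice, not the published method.
from itertools import combinations

def cooccurrence_matrix(artists, page_count):
    """Return a symmetric dict-of-dicts with pairwise similarity scores."""
    solo = {a: max(page_count(f'"{a}"'), 1) for a in artists}
    sim = {a: {} for a in artists}
    for a, b in combinations(artists, 2):
        both = page_count(f'"{a}" "{b}"')
        # Dividing by the smaller solo count keeps artists with very common
        # names (Kiss, Prince, Madonna) from dominating every pairing.
        score = both / min(solo[a], solo[b])
        sim[a][b] = sim[b][a] = score
    return sim

def most_similar(artist, sim):
    """Rank the other artists by their co-occurrence score with `artist`."""
    return sorted(sim[artist].items(), key=lambda kv: kv[1], reverse=True)
```

Fed with a list of artist names and a real hit-count function, this gives, for each artist, a ranked list of “similar” artists, loosely in the spirit of both Pachet’s confusion matrices and the penalised rankings Schedl presented at CMMR.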

Gesture Workshop

Gesture Workshop, Vannes, France, 18-20 May 2005

This interdisciplinary conference gathered a number of different people working with gestures, covering subjects such as sign language, linguistics, behavioural psychology, sports movement, human-computer interaction and musical interaction. We presented a paper on "air playing", where Rolf Inge Godoy gave a theoretical introduction and described our observational studies, and I presented the Musical Gestures Toolbox.

It was interesting to get an overview of what is going on in the field of sign language and gesture linguistics. Many groups are working on different ways of creating sign language directly from speech, while others try to analyse sign language and generate text or speech. The latter is more interesting for my project, I think, since it involves computer vision and recognition.

Things I found interesting

  • Martin Kaltenbrunner talked about his music table project, which I hope to see in Barcelona this fall. He also had ported PDa to a 2G iPod!
  • Frédéric Bevilacqua showed IRCAM’s new wireless EtherSense, a very nice device communicating over regular wifi or USB. Too bad it costs around €800 + tax. He also showed a beta version of MNM (Music is not mapping), which uses HMMs for gesture recognition and FTM for matrix operations in Max. Seems very promising!
  • The University of Aachen was represented by three PhD students showing different types of video analysis. They have made available databases of sign-language videos called BOSTON 50 and BOSTON 201. Morteza Zahedi talked about density thresholding and tangent distances for computer vision.
  • José Miguel Salles Dias showed a nice system for displaying hand trajectories using video analysis.
  • Anne Marie Burns presented her finger tracking system developed with EyesWeb.
  • Xavier Rodet presented the IRCAM Phase project, which is controlled by a haptic arm. It has been used for installations, but it would be very interesting to explore this as a musical instrument in performance.
  • Thomas Moeslund showed some computer vision work, and a trick of turning motion vectors into 4D Euler space. I didn’t really understand how this works, but it seems smart.
  • Nicolas Rasamimanana presented work on the IRCAM augmented violin, something similar to the hyperinstruments by Joe Paradiso and Diana Young, and showed graphs of clustering of different types of bowing strokes.
  • Ginevra Castellano presented her Master’s project on studying emotional response to music by analysing people’s movement of a laser pointer in 2D. She presented some results based on static analysis of the material, and I am looking forward to seeing the results from the functional analysis that she is currently working on.
  • They are doing a lot of interesting things at Helsinki University of Technology: virtual snow fighting, swimming, etc.
  • Irene Kambara from the McNeill lab showed some interesting conversation studies.
  • Kristoffer Jensen showed a system for controlling additive synthesis with a laser pointer.