S2S² Summer School, Genova, 25-29 July 2005

This PhD summer school, organised by the S2S² project (Sound to Sense, Sense to Sound), was not much of a school, but rather a normal conference with lots of presentations of current research in the related fields. Although I knew quite a lot of the work beforehand, it was still a good brush-up on a number of different subjects.

Things I found interesting

  • Pedro Rebelo from SARC, Belfast, talked about prosthetic instruments, and how instruments should be specific and non-generic to work well.

  • Roberto Bresin and Anders Friberg from KTH showed their conductor system using emotional descriptors such as sad-tender and angry-happy (from Gabrielsson and Juslin's studies on valence and energy in music).

  • Marc Leman from IPEM, Ghent, gave a nice historical overview of systematic musicology.

    • He also explained how he has shifted his attention from purely cognitive studies to embodied cognition.
    • Concerning subjectivity versus objectivity in the study of music, he explained how we tend to agree on a lot of things, even “subjective” things like similarity and musical genre, so it should be possible to study them in an objective way.
    • A new audio library for EyesWeb is coming soon
  • Gerhard Widmer from OFAI, Vienna, commented that a problem in AI and music is that too many people have been working on making computers do tasks that are neither particularly useful nor interesting, such as Bach harmonization and Schenker analysis, all based on symbolic notation.

  • I always like to hear what Francois Pachet from the Sony lab in Paris is up to. Unfortunately, the lecture was quite general and did not leave much time to go into details:

    • He gave an overview of his previous work with the improvisation tool Continuator
    • In his more recent work in music information retrieval, he suggested using Google to find “sociological” measures of artist similarity. This is done by querying Google for combinations of artists; for example, “Mozart + Beatles” will give some meta-measure of the similarity between the two. Doing this for a number of artists and collecting the counts in a matrix can then be used to tell how similar or different they are (a rough sketch of the idea follows after this list). A rather crude and non-musical thing to do, but it seemed to work well in the examples he showed. Having some sort of non-musical approach to the topic can, indeed, reveal interesting things that working with only the audio streams might not. Such a “sociological” approach reminded me of an MIT paper from ICMC 2002 where they looked at Napster traffic to find what people were querying for and downloading.
    • He also presented musaicing, or how to create mosaics of music, and played some interesting examples.
    • Another interesting approach was to use evolutionary algorithms to generate new timbre analysis functions in a lisp-like language, and then evaluate the output to see which worked best (also sketched below).
    • When someone remarked that similarity is an ill-defined word, he replied that “well-defined problems are for engineers”.
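
To make the co-occurrence idea concrete, here is a minimal sketch in Python. It is not Pachet's actual method, just the general recipe: count how often two artists appear together in search results and normalise by how often each appears alone. The hit_count helper is hypothetical; it would have to wrap whatever search API is available.

```python
from itertools import combinations

def hit_count(query: str) -> int:
    """Hypothetical helper: number of web search hits for `query`.
    Would wrap whatever search API is available."""
    raise NotImplementedError

def cooccurrence_similarity(artists):
    """Symmetric artist-similarity matrix from search-engine co-occurrence.

    sim(a, b) = hits("a" "b") / min(hits("a"), hits("b")),
    one plausible normalisation (not necessarily the one Pachet used),
    so two artists that are almost always mentioned together score near 1.
    """
    solo = {a: hit_count(f'"{a}"') for a in artists}
    sim = {a: {b: 1.0 if a == b else 0.0 for b in artists} for a in artists}
    for a, b in combinations(artists, 2):
        joint = hit_count(f'"{a}" "{b}"')
        sim[a][b] = sim[b][a] = joint / max(1, min(solo[a], solo[b]))
    return sim

# e.g. cooccurrence_similarity(["Mozart", "Beatles", "Beethoven"])
```
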
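And a similarly rough sketch of the evolutionary idea: candidate analysis functions are small chains of signal operations, and the ones whose output correlates best with some target descriptor survive into the next generation. The primitive set, the mutation scheme and the fitness measure here are my own stand-ins, not the lisp-like system Pachet described.

```python
import random
import numpy as np

# Stand-in primitives: vector->vector transforms plus vector->scalar reducers.
TRANSFORMS = {
    "log1p":  np.log1p,
    "sqrt":   np.sqrt,
    "square": np.square,
    "diff":   lambda v: np.abs(np.diff(v)),
}
REDUCERS = {
    "mean": np.mean,
    "std":  np.std,
    "max":  np.max,
    "centroid": lambda v: float(np.sum(np.arange(len(v)) * v) / (np.sum(v) + 1e-9)),
}

def random_program(max_len=3):
    """A candidate analysis function: a chain of transforms plus one reducer."""
    chain = [random.choice(list(TRANSFORMS)) for _ in range(random.randint(0, max_len))]
    return chain, random.choice(list(REDUCERS))

def run(program, spectrum):
    chain, reducer = program
    v = np.asarray(spectrum, dtype=float)
    for name in chain:
        v = TRANSFORMS[name](v)
    return float(REDUCERS[reducer](v))

def evolve(spectra, targets, pop_size=30, generations=20):
    """Keep the programs whose output correlates best with the target values
    (e.g. listener ratings of some timbre attribute), mutate them, repeat."""
    def fitness(p):
        out = np.array([run(p, s) for s in spectra])
        return abs(np.corrcoef(out, targets)[0, 1]) if np.std(out) > 0 else -1.0
    population = [random_program() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        mutants = [((chain + [random.choice(list(TRANSFORMS))])[-4:],
                    random.choice(list(REDUCERS))) for chain, _ in survivors]
        population = survivors + mutants
    return max(population, key=fitness)
```
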
  • Tony Myatt from the MRC at York focused on what he considered problems in the music technology field:

    • He said that a problem in the field is that a lot of the technologies developed never get to mature since they are only used once in a concert and then forgotten.
    • He also commented on the post-digital composers (referring to Kim Cascone’s CMJ 2000 paper on the “Aesthetics of failure”) working outside of the academic institutions.
    • A problem is that the community has not been good enough at communicating the values of music technology. We lack a common music-tech vision. He showed an example from an astronomy journal where ESA clearly defines key research areas for the coming years.
    • His visions for the future include:
      • Better integration of musicians/composers in the field
      • Coordination with other disciplines: brain science, psychology, engineering
      • New aesthetic approaches
      • Better communication
  • The Pompeu Fabra group gave a general overview of content-based audio processing, focusing on the four axes of timbre, melody/harmony, rhythm and structure.

    • They showed a number of impressive demos of realtime processing of sound:
      • Time scaling
      • Rhythm transformation, swing to non-swing etc
      • Change voice from male to female
      • Harmonizer
      • A single voice turned into a full choir (with time deviations!); see the sketch after this list
      • Modifying a single object (for example a voice) in a polyphonic stream
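
The choir demo stuck with me, so here is a very crude sketch of the basic trick: overlay slightly delayed and slightly detuned copies of the same voice. The Pompeu Fabra system obviously does something far more sophisticated (spectral-model processing rather than naive resampling), so treat the numbers and the method here as placeholders.

```python
import numpy as np

def simple_choir(voice, sr, n_voices=8, max_delay_ms=40.0, max_detune_cents=25.0, seed=0):
    """Overlay n_voices detuned, randomly delayed copies of a mono voice signal.

    Detuning is done by naive resampling (which also changes duration slightly);
    a real system would use a phase-vocoder or spectral-model pitch shift.
    """
    rng = np.random.default_rng(seed)
    out = np.zeros(len(voice) + int(sr * max_delay_ms / 1000.0) + 1)
    for _ in range(n_voices):
        cents = rng.uniform(-max_detune_cents, max_detune_cents)
        ratio = 2.0 ** (cents / 1200.0)
        # Resample the copy to detune it a little.
        copy = np.interp(np.arange(len(voice)) * ratio,
                         np.arange(len(voice)), voice)
        delay = int(rng.uniform(0.0, max_delay_ms) * sr / 1000.0)
        out[delay:delay + len(copy)] += copy
    return out / n_voices
```
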
  • Erkut from Helsinki University of Technology outlined the different processes that might fall under the heading “physics-based sound synthesis”.

    • The essential part of physical modelling is the bi-directionality of the blocks: each element both drives and is driven by its neighbours (a toy illustration follows below)
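
A toy example of what that bi-directionality means in practice, using the simplest case I know, a digital waveguide string: two delay lines carry waves in opposite directions, and each termination feeds energy back into the other line. This is just a textbook illustration, not anything Erkut presented.

```python
import numpy as np

def waveguide_string(length=100, pluck_pos=0.3, n_samples=2000, loss=0.995):
    """Toy digital waveguide: two delay lines carry left- and right-going waves,
    and the reflections at the ends couple them together. Each block both feeds
    and is fed by its neighbours, which is the bi-directionality in question."""
    pluck = int(length * pluck_pos)
    shape = np.concatenate([np.linspace(0, 1, pluck, endpoint=False),
                            np.linspace(1, 0, length - pluck)])
    right = shape / 2.0   # wave travelling towards the bridge
    left = shape / 2.0    # wave travelling towards the nut
    pickup = length // 2
    out = np.empty(n_samples)
    for n in range(n_samples):
        out[n] = right[pickup] + left[pickup]   # displacement at the pickup point
        bridge = -loss * right[-1]              # invert and damp at the bridge
        nut = -loss * left[0]                   # invert and damp at the nut
        right = np.concatenate(([nut], right[:-1]))
        left = np.concatenate((left[1:], [bridge]))
    return out
```
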
  • Alain de Cheveigné from IRCAM warned against forgetting everything that happens between the cochlea and the cortex

  • Stephen McAdams from McGill gave a very brief overview of his research:

    • Timbre can be described by attack time, spectral centroid and spectral deviation (two of these are sketched below)
    • He proposed psychomechanics as a term for the quantitative links between the mechanical properties of objects and their auditory results - mechanics, perception, acoustics
    • An important task for the future is to figure out a minimal set of parameters necessary for studying sound
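
For my own reference, rough versions of two of those descriptors. These are common operational definitions, not necessarily the exact ones McAdams uses.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Amplitude-weighted mean frequency of one analysis frame (in Hz)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

def log_attack_time(signal, sr, low=0.1, high=0.9):
    """Log10 of the time the amplitude envelope needs to rise from 10% to 90%
    of its maximum, one common way of operationalising 'attack time'."""
    env = np.abs(signal)
    peak = env.max()
    t_low = int(np.argmax(env >= low * peak))
    t_high = int(np.argmax(env >= high * peak))
    return float(np.log10(max(t_high - t_low, 1) / sr))
```
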
  • Mark Sandler from Queen Mary, London, was one of the few speakers with a somewhat more critical presentation, attacking some general trends in the MIR community:

    • Difference between “everyday” and “musical” listening
    • The community has been focusing on computer listening. There should be more subjective testing on representative populations
    • We should not use only compressed sound (MP3) as raw material for analysis:
      • Compression removes sound in the higher and lower parts of the spectrum
      • It reduces the stereo field by combining the L and R channels
      • This gives polluted source files -> inconclusive science
      • MP3 encoders work differently, so if you don’t know which encoder was used, you don’t know what you have
    • Similarity algorithms work better on uncompressed sound
  • Francois Pachet commented that we should continue to study compressed sound, since that is what people actually listen to. If the algorithms don’t work well on compressed sound, it is a problem with the algorithms, not with people.

  • Someone (forgot to write down who…) presented a new Audition library toolbox for PD which seemed interesting

  • Maria Chait from Maryland, US, and ENS, Paris, said it is important to remember that sounds that are physically similar might be perceptually different, and vice versa.

  • Penttinen from Helsinki University of Technology showed physical models of how the plucking point on a guitar affects the sound. He also had some PD patches (PULMU) that could find the plucking point from audio (the basic relation is sketched below).
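
The textbook relation behind plucking-point estimation (not the PULMU implementation itself): plucking an ideal string at a relative position p weights harmonic k by |sin(pi·k·p)|/k², so harmonics with a node at the plucking point disappear, and that comb pattern can be read back out of a recording.

```python
import numpy as np

def plucked_string_harmonics(rel_pluck_pos, n_harmonics=20):
    """Ideal-string harmonic amplitudes for a pluck at rel_pluck_pos (0..1,
    fraction of the string length from one end). Harmonics with a node at the
    plucking point (k * p integer) vanish - the comb pattern that
    plucking-point estimators look for in the recorded spectrum."""
    k = np.arange(1, n_harmonics + 1)
    return np.abs(np.sin(np.pi * k * rel_pluck_pos)) / k**2

# Plucking in the middle (p = 0.5) removes every even harmonic:
print(np.round(plucked_string_harmonics(0.5, 8), 3))
```
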

  • Gallese from Parma presented his work on mirror neurons: action = representation of the action. This was shown in his now famous experiments with a monkey that shows similar neural activity when grasping a nut and when watching someone else grasp it. He said he thinks cognitive neuroscientists should read more philosophy -> Husserl: we perceive the world also through our memory.

  • In the TAI-CHI project they are trying to make “tangible acoustic interfaces” for human-computer interaction by attaching contact microphones to a surface and calculating the position of a touch from the time differences between the sensors (a rough sketch of the idea follows below). Different setups were displayed, and they all worked more or less OK, although they are currently limited to tapping only, not continuous movement or multi-finger interfaces.
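
A rough sketch of the time-difference idea, assuming a known, constant propagation speed in the surface (a real surface is dispersive, which is exactly what makes these systems hard). The arrival-time differences would in practice come from cross-correlating the contact-microphone signals.

```python
import numpy as np

def locate_tap(sensors, tdoas, speed, size=1.0, grid=200):
    """Brute-force tap localisation on a size x size surface.

    sensors: (N, 2) sensor coordinates in metres
    tdoas:   measured arrival-time differences relative to sensor 0 (seconds)
    speed:   assumed propagation speed of the tap wave in the material (m/s)
    Returns the grid point whose predicted time differences fit best.
    """
    best, best_err = None, np.inf
    for x in np.linspace(0.0, size, grid):
        for y in np.linspace(0.0, size, grid):
            dist = np.hypot(sensors[:, 0] - x, sensors[:, 1] - y)
            predicted = (dist - dist[0]) / speed
            err = np.sum((predicted[1:] - tdoas[1:]) ** 2)
            if err < best_err:
                best, best_err = (x, y), err
    return best
```
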

  • Trucco from Heriot-Watt University gave a very compact introduction to current trends in videoconferencing, and how they create a “natural” meeting of three people using a multi-camera setup. This involves morphing several live video streams to give the impression that a person is looking at you, rather than into a camera. This seems to be a large field of research! From a sound perspective I find it strange that more effort is not put into sound localization, for example giving the impression that the sound is coming from the mouth of the person talking rather than directly from a loudspeaker.