Ever since I started my PhD project I have been struggling with the word gesture. Now that I am working on a theory chapter for my dissertation, I have had to decide on some terminology, and this is my current approach:

I use movement as the general term to describe the act of changing the physical position of body parts in relation to music performance or perception. Action is used to denote goal-directed movements that form a separate unit. This involves perceptual chunking on the performer's and/or the perceiver's side.

Thus, what I am trying to say is that gesture is really a mental construct happening on the perceiver’s side (yes, I agree that sometimes the performer could also be the perceiver). A sketch of this could be:


The performer is trying to communicate an intention to the perceiver, but is doing this through an action. How the action is interpreted is a subjective process happening on the perceiver’s side, and sometimes the performer’s intention is perceived “correctly”, sometimes not.

I think many of the so-called “gesture recognition” systems are really focusing on low-level features, and so could (at best) be described as “action recognition” systems. But if we want to make systems that actually look for gestures, we need to focus on the interpretation of the intentions behind the actions.

  1. Well, we are doing a project on gesture recognition and I fully agree. Currently no one is trying to filter out the gestures from the other movements. The point is that no one knows how. I ran a perception experiment on sign detection with fidgeting movements as distractors. People were great at picking out signs (even if they knew no sign language). I repeated it with gestures and signs: still good. But I haven’t yet found out exactly how people do this trick. Still, it is nice to see I am not alone in thinking about such things.

