Ever since I started my PhD project I have been struggling with the word gesture. Now that I am working on a theory chapter for my dissertation, I have had to settle on some terminology, and this is my current approach:

I use movement as the general term to describe the act of changing the physical position of body parts in relation to music performance or perception. Action is used to denote goal-directed movements that form a separate unit. Identifying such a unit involves perceptual chunking on the performer’s and/or the perceiver’s side.

Thus, what I am trying to say is that gesture is really a mental construct happening on the perceiver’s side (yes, I agree that sometimes the performer could also be the perceiver). A sketch of this could be:

[Figure: performer-perceiver.png]

The performer is trying to communicate an intention to the perceiver, but is doing this through an action. How the action is interpreted is a subjective process happening on the perceiver’s side, and sometimes the performer’s intention is perceived “correctly”, sometimes not.

I think many so-called “gesture recognition” systems really focus on low-level features, and could (at best) be described as “action recognition” systems. But if we want to make systems that actually look for gestures, we need to focus on the interpretation of the intentions behind the actions.