I have discussed definitions of the terms motion/movement, action and gesture several times on this blog (for example here and here). Here is a summary of my current take on these three concepts:
Motion: displacement of an object in space over time. This object could be a hand, a foot, a mobile phone, a rod, whatever. Motion is an objective entity, and can be recorded with a motion capture system. A motion capture system could be anything from a simple slider (1-dimensional), to a mouse (2-dimensional), to a camera-based tracking system (3-dimensional) or an inertial system (6-dimensional: 3D position and 3D orientation). I have previously also discussed the difference between motion and movement. Since motion is a continuous phenomenon, it does not make sense to talk about it in the plural form: "motions". Instead, it makes more sense to talk about one or more motion sequences, and probably even more sense to talk about individual actions.
Action: a goal-directed motion (or force) sequence, for example picking up a stone from the ground or playing a piano tone. Actions may have a clear beginning and end, but they may also overlap due to coarticulation, such as when playing a series of tones on the piano. This uncertainty as to how actions should be segmented (or chunked) is what makes them subjective entities. As such, I do not think it is possible to measure an action directly, since there is no objective measure of when an action begins or ends, or of how it is organised in relation to other actions. But, based on knowledge about human cognition, it is possible to create systems that estimate various action features from measurements of motion.
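As a rough illustration of estimating action features from motion measurements, here is a minimal sketch, assuming position data sampled at a fixed rate, that marks candidate action boundaries wherever the speed rises above or falls below a threshold. The function names, the sample data and the threshold-based approach are illustrative assumptions, not an established method. The point is that the resulting segmentation depends on a choice made by the analyst, which reflects why actions are subjective entities.

```python
import math

def speeds(positions, fs):
    """Speed (units per second) between consecutive samples of an N-D position series."""
    return [math.dist(a, b) * fs for a, b in zip(positions, positions[1:])]

def segment_actions(positions, fs, threshold):
    """Hypothetical segmenter: return (start, end) sample indices of stretches
    where speed exceeds the threshold. The threshold is an arbitrary choice."""
    segments = []
    start = None
    for i, v in enumerate(speeds(positions, fs)):
        if v > threshold and start is None:
            start = i                      # motion starts at sample i
        elif v <= threshold and start is not None:
            segments.append((start, i))    # motion settles at sample i
            start = None
    if start is not None:                  # still moving at the last sample
        segments.append((start, len(positions) - 1))
    return segments

# Made-up 1-D positions sampled at 1 Hz: a quick movement, a pause, then a slow movement.
positions = [(0,), (0,), (1,), (2,), (2,), (2,), (2.5,), (3,), (3,)]
print(segment_actions(positions, fs=1, threshold=0.25))  # [(1, 3), (5, 7)]
print(segment_actions(positions, fs=1, threshold=0.75))  # [(1, 3)]
```

With a threshold of 0.25 the slow final movement counts as a second action; with 0.75 it disappears, even though the recorded motion is identical.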
Gesture: the meaning being expressed through an action or motion. A gesture is not the same as an action or motion, although it is related to both of them. As such, a gesture can be seen as a semiotic sign, in which the meaning is conveyed through an action, but it is highly subjective and dependent on the cultural context in which the action is carried out. Also, the same meaning can be conveyed through different types of physical actions. For example, the meaning you convey when you wave "good-bye" to someone may be independent of whether you do it with the left or the right arm, of the size of the action, etc.
Unfortunately, with the popularity of motion and gesture studies in recent years, I see that many people use the term gesture more or less synonymously with action or motion. This is particularly the case in the field of "gesture recognition" in various areas of human-computer interaction (HCI). I think this is unfortunate, because we lose the precision with which we can describe the three different phenomena. If we track continuous motion in time and space, it is "motion tracking". If we aim at recognising certain physical patterns in time and space, I would call it "action recognition", unless we are looking for some meaning attached to the actions. I would only use "gesture recognition" if we actually recognise the meaning attached to some actions or motion. An example here would be to recognise the emotional quality of a violinist's performance. That, however, is something very different from tracking the bowing style.