An object-action-context approach to writing alt text

I came across an interesting blog post by Alex Chen on how to write better image descriptions for web pages. They propose an “object-action-context” approach when writing image descriptions. I see that such an approach could also be helpful for my sound actions project.

Adding better descriptions

I am soon getting to the end of my year-long project of recording one sound action daily. A sound action is a multimodal entity consisting of body motion and its resultant sound. My starting point is that when we only see a sound action, we can imagine its sound. If we only hear a sound action, we can imagine the body motion and objects involved in the interaction. Here is the playlist with all the recorded sound actions:

When my data collection ends, I will focus on analyzing all the recordings. But first, I need to pre-process the data and add better descriptions. I wrote my current titles spontaneously, focusing on words that describe the main sound-producing object in the recording. For many recordings, I have also included verbs to explain the actions. But scrolling through all the recordings, I see that I haven’t been consistent.

The importance of the context

After reading Chen’s blog post, I realize that I should have also focused more on describing context. It matters whether the recording is done in my kitchen, living room, office, or elsewhere. The acoustic features of the rooms in which I have recorded heavily influence the final sound, and the light and visual framing change what one can see. Initially, I didn’t think so much about this, but my focus has shifted throughout the year. The project began as a way to test ideas from my Sound Actions book. There, I primarily investigate intentional sound-producing actions. Thus, my recordings have focused on capturing “foreground” activities. In fact, I have deliberately tried to “remove” information about the environment.

My new project AMBIENT focuses specifically on the environment, the “background” of our lives. This is what Chen calls the “context” in their model. However, to be consistent with the terminology I have developed for my project, I will stick to “environment”. Thus to describe any sound action, one could talk about the following:

object: the sound-producing object(s) involved in the interaction. Sometimes it is one object; other times, two or more equal objects (like two stones hitting each other); and other times, a tool interacting with an object (like a drumstick striking a drum).
action: the (inter)action that leads to sound production with the object(s).
environment: the surroundings in which the action occurs, such as a room.

This could be illustrated as follows:

Illustration of action, object, and environment

Moving on, I will rename my recordings with this system in mind. As always, it will be interesting to see if it works in practice. This year, I have learned that it is more complicated than I had thought to identify, record, and analyze sound actions.
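
As a rough sketch of what renaming with the object, action, and environment parts could look like in code, the Python snippet below builds a title from three fields. The class name, field names, word order, and example values are all assumptions made for illustration; they are not an established scheme for the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundActionDescription:
    """Hypothetical record with one field per part of the description."""

    obj: str          # sound-producing object(s), e.g. "two stones"
    action: str       # the (inter)action, e.g. "hitting each other"
    environment: str  # the surroundings, e.g. "kitchen"

    def title(self) -> str:
        # Join the three parts in a fixed order so that titles stay consistent.
        return f"{self.obj} {self.action} in the {self.environment}"


# Illustrative values only; they do not refer to an actual recording.
example = SoundActionDescription(
    obj="two stones",
    action="hitting each other",
    environment="kitchen",
)
print(example.title())  # two stones hitting each other in the kitchen
```

Keeping the three parts as separate fields, rather than one free-text title, would make it easy to spot recordings where the environment was never noted.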
