I have previously written about the differences between sound and audio, and between sound/light and audio/video. In this short blog post, I problematize the concept “audiovisual”.

Dictionary definitions

The term “audiovisual” is ubiquitous. Unfortunately, for those of us working on both technology and psychology, it causes a lot of confusion. For example, consider Wikipedia's definition:

Audiovisual (AV) is electronic media possessing both a sound and a visual component, such as slide-tape presentations, films, television programs, corporate conferencing, church services, and live theater productions.

This definition focuses solely on media. The Cambridge English Dictionary, on the other hand, defines audiovisual as

something that involves seeing and hearing

This definition thus relates to human perception. And that is the problem: when you work between technology and psychology, you have no idea which of the two meanings is intended.

To make matters worse, “audiovisual” has become popular among computer scientists working with deep learning models that use both audio and video as inputs while applying principles inspired by human perception and cognition. Such work draws on both the media and the perception meanings at once, which makes the term even harder to pin down.

A practical solution

As always, it helps to define how one uses a term. But in this case, I think it is better to scrap the term “audiovisual” altogether and instead specify what you mean:

  • Audio–video makes it clear that you are dealing with the combination of audio and video media, that is, the representation of sound and light in analog or electronic form.

  • Auditory–visual clearly describes that you are dealing with (human) perception, from sensing to cognition.

Note that I use an en-dash (not a hyphen) between the words in each pair above to indicate their relationship.

Note on the use of AI: I used Grammarly to spellcheck this post.