In the second compulsory assignment in MUS2640 Sensing Sound and Music, the students were tasked with a foundational exercise in audio analysis: creating spectrograms. Here I summarize how it went.

The Assignment

The core task was straightforward:

  1. Choose two distinct sound recordings. Students selected everything from musical instrument sounds and songs to environmental sounds like thunderstorms, boiling water, and animal calls.
  2. Generate visualizations in Sonic Visualiser. For each sound, they created a waveform (showing amplitude over time) and two types of spectrograms (showing frequency content over time).
  3. Compare linear and logarithmic scales. One spectrogram had to use a linear frequency axis and the other a logarithmic one.
  4. Write a reflection. Students described what they did and what they learned from comparing the sounds and the different visualizations.

They were asked to do the exercise in the excellent open-source program Sonic Visualiser, although those who preferred Python or MATLAB were free to use those instead.
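For anyone curious about the Python route, a minimal sketch of the assignment's three visualizations might look like the following. It uses NumPy, SciPy-free Matplotlib plotting, and a synthetic two-partial tone as a stand-in for an actual recording (in practice one would load a file with, e.g., scipy.io.wavfile.read); none of this is prescribed by the assignment itself.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so the script just saves a file
import matplotlib.pyplot as plt

# Synthetic stand-in for a recording: a 440 Hz tone with one overtone.
# Replace with a loaded sound file when doing the real exercise.
fs = 22050
t = np.arange(0, 2.0, 1 / fs)
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 880 * t)

fig, axes = plt.subplots(3, 1, figsize=(8, 9))

# 1. Waveform: amplitude over time.
axes[0].plot(t, x, linewidth=0.5)
axes[0].set(title="Waveform", xlabel="Time (s)", ylabel="Amplitude")

# 2. Spectrogram with a linear frequency axis.
axes[1].specgram(x, NFFT=2048, Fs=fs, noverlap=1024)
axes[1].set(title="Spectrogram (linear)", xlabel="Time (s)", ylabel="Frequency (Hz)")

# 3. The same spectrogram with a logarithmic frequency axis.
axes[2].specgram(x, NFFT=2048, Fs=fs, noverlap=1024)
axes[2].set_ylim(20, fs / 2)  # keep the axis above 0 Hz before going logarithmic
axes[2].set_yscale("log")
axes[2].set(title="Spectrogram (logarithmic)", xlabel="Time (s)", ylabel="Frequency (Hz)")

fig.tight_layout()
fig.savefig("spectrograms.png")
```

Sonic Visualiser remains the friendlier choice for most students, but a script like this makes the linear/log comparison reproducible with a single command.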

General observations

Many students quickly grasped the basics, including correctly identifying that tonal sounds appear as horizontal lines in a spectrogram, representing the fundamental frequency and its overtones. In contrast, percussive or noisy sounds appear as vertical stripes, indicating energy spread across a wide frequency range at a single moment.
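This horizontal-versus-vertical distinction can even be checked numerically. The sketch below (my own illustration, not part of the assignment) computes spectrograms of a steady tone and of a short click with SciPy and locates where each signal concentrates its energy:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000
t = np.arange(0, 1.0, 1 / fs)

tone = np.sin(2 * np.pi * 440 * t)     # steady tone
click = np.zeros_like(t)
click[fs // 2 : fs // 2 + 8] = 1.0     # 1 ms impulse at t = 0.5 s

f, times, S_tone = spectrogram(tone, fs=fs, nperseg=256)
_, _, S_click = spectrogram(click, fs=fs, nperseg=256)

# A tone puts energy in one frequency bin across all time frames
# (a horizontal line); a click puts energy in one time frame across
# all frequency bins (a vertical stripe).
tone_profile = S_tone.mean(axis=1)     # average over time: peaked in frequency
click_profile = S_click.mean(axis=0)   # average over frequency: peaked in time

print(f"tone energy peaks near {f[tone_profile.argmax()]:.0f} Hz")
print(f"click energy peaks near t = {times[click_profile.argmax()]:.2f} s")
```

Averaging over time pins the tone to a single frequency bin, while averaging over frequency pins the click to a single instant, which is exactly the horizontal/vertical contrast students see in the images.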

Most students understood that linear spectrograms allocate equal visual space to all frequencies, which often makes them better for seeing detail in high-frequency content. Logarithmic spectrograms, on the other hand, are generally better suited to musical sounds: the logarithmic axis resembles human pitch perception, and by giving more visual space to the lower frequencies, it better highlights bass tones, fundamental pitches, and harmonic structures.

[Figure: Explanation of spectrograms]

Challenges

While many succeeded, the assignment also highlighted several common challenges:

  1. Navigating New Software and Terminology: For students new to audio analysis, the initial hurdle was learning the software and its terminology. Some reported struggles with finding the correct settings or understanding what they were looking at. As one student honestly reflected, “I was not certain how to properly analyse my visuals.” This is a reminder that technical fluency comes with practice.

  2. Moving from Description to Interpretation: A common challenge is leaping from describing what is visible to interpreting why it looks that way. Many students could say, “The kick drum is a red spike at the bottom,” but fewer went on to explain that this shows a transient event with intense energy concentrated in the low-frequency domain. As one student astutely pointed out, a spectrogram requires interpretation based on prior understanding (e.g., horizontal lines indicate tones, vertical lines indicate transients).

  3. Understanding Analysis Parameters: A bonus task invited students to explore other settings, but few delved into parameters like FFT window size. Those who did experiment with window sizes discovered the fundamental time-frequency trade-off: a short window (e.g., 1024 samples) provides excellent time resolution, making it easy to pinpoint the exact moment of a drum hit, while a long window (e.g., 32768 samples) provides excellent frequency resolution, allowing individual overtones to appear as sharp, distinct lines.
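The trade-off in point 3 is easy to demonstrate in code. In this sketch (using SciPy and two seconds of synthetic noise as a stand-in for a recording), the bin spacings show directly how a longer window buys frequency resolution at the cost of time resolution:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 44100
x = np.random.default_rng(0).standard_normal(fs * 2)  # 2 s of noise as a stand-in

for nperseg in (1024, 32768):
    f, t, _ = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=0)
    # Frequency resolution is fs / nperseg; the time step between
    # frames is nperseg / fs (with no overlap). One improves exactly
    # as the other degrades.
    print(
        f"window {nperseg:>5} samples: "
        f"frequency resolution {f[1] - f[0]:7.2f} Hz, "
        f"time step {t[1] - t[0]:.4f} s"
    )
```

The 1024-sample window gives frames roughly every 23 ms but frequency bins about 43 Hz apart; the 32768-sample window sharpens the bins to about 1.3 Hz while smearing each frame over nearly three-quarters of a second.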

Final Reflections

Students are exposed to spectrograms in several settings, but few really understand what they mean. That is why I push them to create some themselves and reflect on what the result can tell them about the sound. Ultimately, they need to develop a sense of how to connect sound visualizations to what they hear.


Thanks to NotebookLM for summarizing findings and creating a nice visualization.