Yesterday, I wrote about some reflections I had during Olgerta Asko’s PhD defence. Today, while chopping up the video recording to put on the RITMO web page, I thought that it might help to use a videogram to assist with segmentation.
Videograms
A videogram is similar to a motiongram, the main difference being that the videogram uses the regular video image as input to the “compression” instead of a motion video. Both give an impression of what is in a video file over time. We have functions for creating both videograms and motiongrams in the Musical Gestures Toolbox, but they are optimised for using other functions in the toolbox.
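The core idea behind a videogram can be sketched in a few lines: each frame is collapsed into a single pixel column (here by averaging across the horizontal axis), and the columns are stacked left to right over time. This is only a minimal illustration of the principle, not the Musical Gestures Toolbox implementation, and the synthetic frames are made up for the example:

```python
import numpy as np

def videogram(frames):
    """Collapse each frame to a single pixel column by averaging
    across the horizontal axis, then stack the columns over time.

    frames: iterable of (height, width, 3) uint8 arrays.
    Returns a (height, n_frames, 3) image; time runs left to right.
    """
    columns = [frame.mean(axis=1) for frame in frames]  # each (height, 3)
    return np.stack(columns, axis=1).astype(np.uint8)

# Tiny synthetic example: ten 4x6 frames that brighten over time
frames = [np.full((4, 6, 3), i * 25, dtype=np.uint8) for i in range(10)]
vg = videogram(frames)
print(vg.shape)  # (4, 10, 3)
```

A motiongram would feed motion images (frame differences) into the same reduction instead of the raw frames.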
With the help of CoPilot, I created a standalone Python script to generate videograms from a folder of video files. While at it, I thought it would be useful to also have a waveform available for “multimodal” inspection of the content. However, that requires aligning the timelines, which is non-trivial when working with two different data types (audio and video). In theory, it is easy, but my rusty programming skills left me struggling with the nitty-gritty implementation details. Fortunately, after some iterations with CoPilot, we made it work.
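One way to align the two timelines is to give the waveform panel exactly as many columns as the videogram, taking the peak amplitude of the audio in each matching time window. This is a sketch of that idea, not necessarily how the script does it, and the function name and test signal are invented for the example:

```python
import numpy as np

def align_waveform(audio, sr, n_columns, duration_s):
    """Reduce an audio signal to n_columns peak values so the waveform
    panel lines up pixel-for-pixel with a videogram of the same width
    spanning the same duration_s seconds."""
    # Each videogram column covers duration_s / n_columns seconds;
    # take the peak amplitude in the corresponding audio window.
    samples_per_col = (sr * duration_s) / n_columns
    peaks = np.empty(n_columns)
    for i in range(n_columns):
        start = int(i * samples_per_col)
        end = max(start + 1, int((i + 1) * samples_per_col))
        peaks[i] = np.abs(audio[start:end]).max()
    return peaks

# Three seconds of a 440 Hz tone at 8 kHz, aligned to a 300-column videogram
sr, dur = 8000, 3
t = np.arange(sr * dur) / sr
audio = np.sin(2 * np.pi * 440 * t)
env = align_waveform(audio, sr, 300, dur)
print(env.shape)  # (300,)
```

With both panels sharing the same number of columns, a single HH:MM:SS axis can label them both.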
Features
The script is set up with the following features:
- Batch processing of multiple video formats (mp4, mov, avi, mkv, flv, wmv, webm, m4v)
- Automatic frame skip calculation based on target videogram width
- GPU acceleration support (CUDA, VAAPI, QSV)
- Automatic output path conflict resolution
- HH:MM:SS time formatting on both panels
- Handles videos with or without audio tracks
- Customisable waveform sampling and videogram constraints
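The automatic frame-skip calculation boils down to choosing how many frames to drop so the videogram lands near a target width. A minimal sketch of that logic (the script's actual heuristic and parameter names may differ, and the default `max_width` here is an assumption):

```python
import math

def frame_skip(total_frames, target_width, max_width=2000):
    """Choose how many frames to skip so the videogram ends up
    at roughly target_width columns, never exceeding max_width."""
    width = min(target_width, max_width)
    # Keep every n-th frame; at least 1 (i.e. keep all frames).
    return max(1, math.ceil(total_frames / width))

# A 30-minute video at 25 fps, aimed at a 1500-pixel-wide videogram
print(frame_skip(30 * 60 * 25, 1500))  # 30
```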
Here are some examples of how it can be used:
python3 video_timeline_plot.py ./videos --output ./plots
python3 video_timeline_plot.py video.mp4 --output timeline.png --skip 5
python3 video_timeline_plot.py ./videos -o ./plots --use-gpu --max-width 2000
Examples
Here are some examples based on four videos from Olga’s defence: the trial lecture, the introduction, and the first and second oppositions.
The waveform displays are not particularly helpful here, since there was talking throughout. The videograms, on the other hand, are better at revealing changes in the slides (in the first two examples) and in camera positions (in the last two examples). The visualisations are abstract, yet they can help in getting a sense of what happened.
Conclusion
My new script is in some ways more limited than the general-purpose functions available in MGT. However, since I didn’t have to worry about integrating with everything else in the toolbox, I could easily implement frame skipping, GPU optimisation, and better file handling. This makes it possible to integrate the script into a video processing pipeline that I hope to deploy on a server soon, rather than running it locally on my computer.
Thanks to CoPilot for assisting with making the script and Grammarly for checking the language.