In my ever-growing collection of FFmpeg-related blog posts, today I will show how to add subtitles to videos. These tricks grew out of the need to create a captioned version of a video I made to introduce the Workshop on NIME Archiving at the 2022 edition of the International Conference on New Interfaces for Musical Expression (NIME). This is the video I discuss in this blog post:
Note that YouTube supports turning on and off the subtitles (CC button).
Why add subtitles to videos?
I didn’t think much about subtitles previously but have become increasingly aware of their importance. Firstly, adding subtitles is essential from an accessibility point of view. In fact, at UiO, it is now mandatory to add subtitles to all videos we upload to the university web pages. The main reason is that people who have problems hearing the content can read it instead.
Subtitles are also helpful for people who can hear the sound in the video. On Twitter, for example, videos will play automatically when you hover over them. However, the sound will usually be off, so without subtitles, it is impossible to know what is said. There are also times when you may want to watch some content without listening, for example, if you don’t have headphones available in a public setting.
I guess that having subtitles also helps search engines find your content more efficiently, which may lead to better dissemination of the content.
There are numerous ways of creating subtitles, but these days I usually rely on some machine listening service as the first step. At UiO, we have a service called Autotekst that will create a subtitle text file from an audio recording. The nice thing about that service is that it supports both English and Norwegian, and it can handle two people talking. It is pretty OK but does require some manual cleanup. I typically do that in a text editor while checking the video.
YouTube offers a more streamlined approach for videos uploaded to their service. It has machine listening built in that works quite well for one person talking in English. It also has a friendly GUI where you can go through and check the text and align it with the audio.
I typically use YouTube for all my public videos in English and UiO’s service for material in Norwegian and with multiple people.
Closed Caption formats
Various services and platforms support at least 25 different subtitle formats. I have found that SRT (SubRip) and VTT (Web Video Text Tracks) are the two most common formats. Both are supported by YouTube, while Twitter prefers SRT (although I still haven’t been able to upload one such file successfully). The video player on UiO prefers VTT, so I often have to convert between these two formats.
Fortunately, FFmpeg comes to the rescue (again). Converting from one subtitle format to another is as simple as writing this one-liner:
ffmpeg -i caption.srt caption.vtt
This will convert an SRT file to VTT in a split second. As the screenshot below shows, there is not much difference between the two formats.
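To make the small difference between the formats concrete, here is a minimal sketch of a hand-rolled conversion. It assumes a simple, well-formed SRT file; the function name `srt_to_vtt` is made up for this example. It only adds the `WEBVTT` header and swaps the decimal comma in the timing lines for a dot. For real work, the FFmpeg one-liner above is the safer choice.

```shell
# Hypothetical helper: rough SRT-to-VTT conversion for simple files.
srt_to_vtt() {
    # A VTT file starts with a WEBVTT header followed by a blank line.
    printf 'WEBVTT\n\n'
    # Only touch timing lines: 00:00:01,000 becomes 00:00:01.000.
    sed -E '/-->/ s/([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})/\1.\2/g' "$1"
}
```

It could be used as `srt_to_vtt caption.srt > caption.vtt`. The cue numbers and the text lines pass through unchanged.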
Playing back videos with subtitles
Web players will play subtitles associated with their videos, but what about playback on a local machine? If the subtitle file has the same name as the video file (apart from the extension), most video players will use the subtitles when playing the file. Here is how it looks in VLC on Ubuntu:
It is also possible to go into the settings to turn the subtitles on and off. I guess it is also possible to have multiple files available to add additional language support, and that would be interesting to explore another time.
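The naming rule can be sketched in a few lines of shell. This assumes that the player looks for a subtitle file with the same base name as the video; the function name `make_sidecar` and the file names in the usage note are made up for this example.

```shell
# Hypothetical helper: copy a subtitle file next to a video under the same base name.
make_sidecar() {
    video=$1
    subs=$2
    # Strip the video extension and reuse the extension of the subtitle file.
    target="${video%.*}.${subs##*.}"
    cp -- "$subs" "$target"
    # Print the name of the new file so it can be checked.
    printf '%s\n' "$target"
}
```

Calling `make_sidecar workshop.mp4 caption.srt` would create `workshop.srt` next to the video.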
Embedding subtitles in video files
For the NIME conference, we are exploring using PubPub, a modern publication platform developed by The MIT Press. There are many good things to say about PubPub, but some features are still missing. Adding a subtitle file to uploaded videos is one missing feature. I, therefore, started exploring whether it is possible to embed the subtitles inside the video file.
A video file is built as a “container” that holds different content, of which video and audio are two (or more) “tracks” within the file. The nice thing about working with FFmpeg is that one quickly understands how such containers are constructed. And, just as I expected, it is possible to embed an SRT file (and probably others too) inside of a video container.
As discussed in this thread, many things can go wrong when you try to do this. I ended up with this one-liner:
The trick is to think of the subtitle file as just another input file that should be added to the container. The result is a video file with subtitles built in, as shown below:
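As a sketch of the "just another input file" idea, the function below shows one commonly used form of such a command. It is not necessarily the exact command I used. It assumes an MP4 output and English subtitles, and the function name `embed_subtitles` is made up for this example.

```shell
# Hypothetical helper: mux a subtitle file into an MP4 as a switchable subtitle track.
embed_subtitles() {
    # $1 = input video, $2 = SRT file, $3 = output MP4
    # -c copy keeps the existing video and audio streams untouched.
    # -c:s mov_text converts the subtitles to the text format used in MP4 files.
    # -metadata:s:s:0 language=eng labels the first subtitle stream as English (an assumption).
    ffmpeg -i "$1" -i "$2" -c copy -c:s mov_text -metadata:s:s:0 language=eng "$3"
}
```

It could be called as `embed_subtitles video.mp4 caption.srt video_subtitled.mp4`. Since nothing is re-encoded, this should run quickly even for long videos.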
It may have been a long shot, but the PubPub player didn’t support such embedded subtitles either. Then I started exploring a more old-school approach, “burning” the text into the video file. I feared that I had to do this within a video editor, but, again, it turned out that FFmpeg could do the trick:
This “burns” the text into the video, which is not the best way of doing it, I think. After all, the nice thing about having the subtitles in text form is that they can be turned on and off and adjusted in size. Still, having some subtitles may be better than nothing.
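For reference, burning in subtitles with FFmpeg is typically done with the `subtitles` video filter. The function below is a minimal sketch, not necessarily the exact command I used. It assumes an FFmpeg build with libass support and file names without special characters such as colons or quotes, which would need escaping inside the filter string. The function name `burn_subtitles` is made up for this example.

```shell
# Hypothetical helper: render subtitles permanently onto the video frames.
burn_subtitles() {
    # $1 = input video, $2 = subtitle file, $3 = output video
    # The subtitles filter draws the text on every frame, so the video is re-encoded.
    # -c:a copy leaves the audio stream as it is.
    ffmpeg -i "$1" -vf "subtitles=$2" -c:a copy "$3"
}
```

It could be called as `burn_subtitles video.mp4 caption.srt video_burned.mp4`. Unlike muxing a subtitle track, this takes a while, because every frame has to be re-encoded.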
Posting on Twitter
After having gone through all that trouble, I wanted to post the video on Twitter. This turned out to be more difficult than expected. Three problems arose.
First, Twitter does not support 4K videos, so I had to downscale to Full HD. Fair enough, that is easily done with FFmpeg. Second, Twitter only supports videos shorter than 2 minutes and 20 seconds; mine is 2:34. Fortunately, I could easily cut out the video’s first and last sentences, and it still made sense. However, this led to the third problem: the subtitles are based on the timing of the original video, so if I were to trim the video, I would also need to edit the subtitle file to adjust all the timings (happy to get input on tools for doing that!).
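One possible way of adjusting the timings is a small script that moves every timestamp in the SRT file earlier by the number of seconds cut from the start. The function below is a minimal sketch, assuming a well-formed SRT file with timing lines in the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` form. The function name `shift_srt` is made up for this example.

```shell
# Hypothetical helper: shift all SRT timestamps earlier by a number of seconds.
shift_srt() {
    # $1 = SRT file, $2 = seconds removed from the start of the video
    awk -v offset="$2" '
    # Convert HH:MM:SS,mmm to milliseconds.
    function to_ms(t,    p) {
        split(t, p, /[:,]/)
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # Convert milliseconds back to HH:MM:SS,mmm, clamping negative values to zero.
    function to_ts(ms,    h, m, s) {
        if (ms < 0) ms = 0
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    # Timing lines: shift both the start and the end time.
    / --> / {
        print to_ts(to_ms($1) - offset * 1000) " --> " to_ts(to_ms($3) - offset * 1000)
        next
    }
    # Cue numbers, text, and blank lines pass through unchanged.
    { print }
    ' "$1"
}
```

Running `shift_srt caption.srt 5 > caption_trimmed.srt` would move all cues five seconds earlier. Cues that fall entirely inside the removed part collapse to a zero-length cue at 00:00:00,000 and would still need to be deleted by hand.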
After spending too much time on this project, I reverted to the “burned” text approach. Writing the text into the video and then trimming it would at least ensure that some text accompanies the video. While preparing the one-liner, I wondered whether FFmpeg would be smart enough to also “trim” the subtitles when trimming the video file:
The command above does it all in one go: downscale from 4K to HD, add the subtitles, and trim the video to the desired duration. Unfortunately, the result kept text from the sentence that was trimmed out at the beginning:
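For reference, a combined command of this kind could look roughly like the sketch below. It is not necessarily the exact command I used, and the function name `trim_scale_burn` is made up for this example. One assumption worth flagging: as far as I understand, placing `-ss` and `-t` after `-i` makes FFmpeg trim the output after filtering, so the `subtitles` filter still sees the original timestamps. Whether that also removes leftover text from a cue that overlaps the cut point would need testing.

```shell
# Hypothetical helper: downscale, burn in subtitles, and trim in one pass.
trim_scale_burn() {
    # $1 = input video, $2 = subtitle file, $3 = start time, $4 = duration, $5 = output video
    # -ss and -t come after -i, so they act as output options
    # (assumption: the filters then run on the original timestamps).
    # scale=1920:-2 sets the width to 1920 and picks a matching even height.
    # The subtitles filter comes last, so the text is drawn at the final resolution.
    ffmpeg -i "$1" -ss "$3" -t "$4" -vf "scale=1920:-2,subtitles=$2" "$5"
}
```

It could be called as `trim_scale_burn video_4k.mp4 caption.srt 00:00:07 00:02:15 twitter.mp4`, where the start time and duration are made-up values.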
The extra words at the beginning of the video are perhaps not the biggest problem. I would still be interested to hear thoughts on how to avoid this in the future. After all, subtitles are here to stay.