Adding subtitles to videos

In my ever-growing collection of FFmpeg-related blog posts, today I will show how to add subtitles to videos. These tricks grew out of the need to create a captioned version of a video I made to introduce the Workshop on NIME Archiving for the 2022 edition of the International Conference on New Interfaces for Musical Expression (NIME).

The video I discuss in this blog post. YouTube supports turning the subtitles on and off (CC button).

Why add subtitles to videos?

I didn’t think much about subtitles previously but have become increasingly aware of their importance. Firstly, adding subtitles is essential from an accessibility point of view. In fact, at UiO, it is now mandatory to add subtitles to all videos we upload to the university web pages. The main reason is that people who have trouble hearing the content can read it instead.

Subtitles are also helpful for people who can hear the sound in a video. On Twitter, for example, videos play automatically when you hover over them, but the sound is usually off, so without subtitles it is impossible to follow what is said. There are also times when you may want to watch some content without listening, for example if you don’t have headphones available in a public setting.

I guess that having subtitles also helps search engines find your content more efficiently, which may lead to better dissemination of the content.

Creating subtitles

There are numerous ways of doing this, but these days I usually rely on a machine listening service as the first step. At UiO, we have a service called Autotekst that will create a subtitle text file from an audio recording. The nice thing about that service is that it supports both English and Norwegian, and recordings with two people talking. The result is pretty good but requires some manual cleanup, which I typically do in a text editor while checking the video.

YouTube offers a more streamlined approach for videos uploaded to their service. It has machine listening built in that works quite well for one person talking in English. It also has a friendly GUI where you can go through and check the text and align it with the audio.

YouTube has a nice window for adding/editing subtitles.

I typically use YouTube for all my public videos in English and UiO’s service for material in Norwegian and with multiple people.

Closed Caption formats

Various services and platforms support at least 25 different subtitle formats. I have found that SRT (SubRip) and VTT (Web Video Text Tracks) are the two most common formats. Both are supported by YouTube, while Twitter prefers SRT (although I still haven’t been able to upload one such file successfully). The video player on UiO prefers VTT, so I often have to convert between these two formats.

Fortunately, FFmpeg comes to the rescue (again). Converting from one subtitle format to another is as simple as writing this one-liner:

ffmpeg -i caption.srt caption.vtt

This will convert an SRT file to VTT in a split second. As the screenshot below shows, there is not much difference between the two formats.

A few lines of the same subtitle file in VTT format (left) and SRT (right).
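
For readers who cannot see the screenshot, here is a constructed example (a made-up cue, not from my actual subtitle file). In SRT, each cue is numbered and the timestamps use commas:

1
00:00:01,000 --> 00:00:04,000
Welcome to this short introduction video.

In VTT, the file starts with a WEBVTT header, the cue numbers are optional, and the timestamps use periods:

WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to this short introduction video.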

Playing back videos with subtitles

Web players will show subtitles associated with a video, but what about playback on a local machine? If the subtitle file has the same name as the video file, most video players will use the subtitles when playing the file. Here is how it looks in VLC on Ubuntu:

VLC will show the subtitles automatically if it finds an SRT file with the same name as a video file.

It is also possible to go into the settings to turn the subtitles on and off. I guess it is also possible to have multiple files available to add additional language support, and that would be interesting to explore another time.
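
It should also be possible to point VLC to a specific subtitle file from the command line, which could be one way of handling several languages. A minimal sketch, assuming a hypothetical Norwegian subtitle file called captions_no.srt:

vlc video.mp4 --sub-file=captions_no.srt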

The benefit of having subtitles as a separate text file is that they can be turned on and off.

Embedding subtitles in video files

We are exploring using PubPub for the NIME conference, a modern publication platform developed by The MIT Press. There are many good things to say about PubPub, but some features are still missing. One of them is the ability to add a subtitle file to uploaded videos. I therefore started exploring whether it is possible to embed the subtitles inside the video file.

A video file is built as a “container” that holds different types of content as “tracks” within the file, of which video and audio are the most common. The nice thing about working with FFmpeg is that one quickly understands how such containers are constructed. And, just as I expected, it is possible to embed an SRT file (and probably other subtitle formats too) inside a video container.

As discussed in this thread, many things can go wrong when you try to do this. I ended up with this one-liner:

ffmpeg -i video.mp4 -i captions.srt -c copy -c:s mov_text video_cc.mp4

The trick is to think of the subtitle file as just another input file that should be added to the container. The result is a video file with subtitles built in, as shown below:
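
A quick way to check that the subtitle stream actually ended up in the container (a small sanity check of my own, not part of the original recipe) is to list the streams with ffprobe, which should show a mov_text subtitle stream alongside the video and audio streams:

ffprobe video_cc.mp4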

It may have been a long shot, but the PubPub player didn’t support such embedded subtitles either. I then started exploring a more old-school approach: “burning” the text into the video file. I feared that I would have to do this in a video editor, but, again, it turned out that FFmpeg could do the trick:

ffmpeg -i video.mp4 -vf subtitles=captions.srt video_cc.mp4

This “burns” the text into the video, which is not the best way of doing it, I think. After all, the nice thing about having the subtitles in text form is that they can be turned on and off and adjusted in size. Still, having some subtitles may be better than nothing.

The video with the subtitle text “burned” into the video content.
The video with subtitles is in a separate text layer.
The video is embedded on the workshop page, with subtitles.

Posting on Twitter

After having gone through all that trouble, I wanted to post the video on Twitter. This turned out to be more difficult than expected. Three problems arose.

First, Twitter does not support 4K videos, so I had to downsample to Full HD. Fair enough, that is easily done with FFmpeg. Second, Twitter only supports videos shorter than 2:20 minutes; mine is 2:34. Fortunately, I could easily cut the video’s first and last sentences, and it still made sense. Third, trimming the video causes trouble with the subtitles, which are based on the timing of the original video. If I trim the video, I also need to edit the subtitle file to adjust all the timings (happy to get input on tools for doing that!).
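
One approach that might work, although I have not tested it properly, is to let FFmpeg trim and re-time the subtitle file itself, since it reads and writes SRT directly. A hedged sketch using the same in and out points as for the video:

# untested sketch: seek into the SRT and keep the same duration as the trimmed video
ffmpeg -ss 00:00:11 -i captions.srt -t 00:02:17 captions_trimmed.srt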

After spending too much time on this project, I reverted to the “burned” text approach. Writing the text into the video before trimming would at least ensure that some text followed the video. While preparing the one-liner, I wondered whether FFmpeg would be smart enough to also “trim” the subtitles when trimming the video file:

ffmpeg -i video.mp4 -vf subtitles=captions.srt,scale=1920:1080,fps=30 -ss 00:00:11 -t 00:02:17  video_hd_cc.mp4

The command above does it all in one go: downscale from 4K to HD, add the subtitles, and trim the video to the desired duration. Unfortunately, it kept text from the sentence that was trimmed away at the beginning:

When trimming a captioned video, you get some text that is not part of the video.

The extra words at the beginning of the video are perhaps not the biggest problem. I would still be interested to hear thoughts on how to avoid this in the future. After all, subtitles are here to stay.
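
One idea I might try next (untested, and building on the subtitle-trimming sketch above) is to seek in the input file instead and burn a pre-trimmed subtitle file, so that the subtitle timings match the trimmed video from its new start point:

# untested sketch: input seeking plus a subtitle file that already starts at the new zero point
ffmpeg -ss 00:00:11 -i video.mp4 -t 00:02:17 -vf subtitles=captions_trimmed.srt,scale=1920:1080,fps=30 video_hd_cc.mp4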

New publication: NIME and the Environment

This week I presented the paper NIME and the Environment: Toward a More Sustainable NIME Practice at the International Conference on New Interfaces for Musical Expression (NIME) in Shanghai/online with Raul Masu, Adam Pultz Melbye, and John Sullivan. Below is our 3-minute video summary of the paper.

And here is the abstract:

This paper addresses environmental issues around NIME research and practice. We discuss the formulation of an environmental statement for the conference as well as the initiation of a NIME Eco Wiki containing information on environmental concerns related to the creation of new musical instruments. We outline a number of these concerns and, by systematically reviewing the proceedings of all previous NIME conferences, identify a general lack of reflection on the environmental impact of the research undertaken. Finally, we propose a framework for addressing the making, testing, using, and disposal of NIMEs in the hope that sustainability may become a central concern to researchers.

Paper highlights

Our review of the NIME archive showed that only 12 out of 1867 NIME papers have explicitly mentioned environmental topics. This is remarkably low and calls for action.

My co-authors have launched the NIME Eco Wiki as a source of knowledge for the community. It is still quite empty, so we call on the community to help develop it further.

In our paper, we also present an environmental cost framework. The idea is that this matrix can be used as a tool to reflect on the resources used at various stages in the research process.

Our proposed NIME environmental cost framework.

The framework was first put into use during the workshop NIME Eco Wiki – a crash course on Monday. In the workshop, participants filled out a matrix each for one of their NIMEs. Even though the framework is a crude representation of a complex reality, many people commented that it was a useful starting point for reflection.

Hopefully, our paper can raise awareness about environmental topics and lead to a lasting change in the NIME community.

Strings On-Line installation

We presented the installation Strings On-Line at NIME 2020. It was supposed to be a physical installation at the conference to be held in Birmingham, UK.

Due to the corona crisis, the conference went online, and we decided to redesign the proposed physical installation into an online installation instead. The installation ran continuously from 21 to 25 July last year, and hundreds of people “came by” to interact with it.

I finally got around to editing a short (1-minute) video promo of the installation:

I have also made a short (10-minute) “behind the scenes” mini-documentary about the installation. Here, researchers from RITMO, University of Oslo, talk about the setup, featuring 6 self-playing guitars, 3 remote-controlled robots, and a 24/7 high-quality, low-latency audiovisual stream.

We are planning a new installation for the RPPW conference this year. So if you are interested in exploring such an online installation live, please stay tuned.

How long is a NIME paper?

Several people have argued that we should change from having a page limit (2/4/6 pages) for NIME paper submissions to a word limit instead. It has also been argued that references should not be counted as part of the text. However, what should the word limits be?

It is always good to look at history, so I decided to check how long previous NIME papers have been. I started by extracting the text from all of the PDF files with the pdftotext command-line utility:

for i in *.pdf; do name=`echo $i | cut -d'.' -f1`; pdftotext "$i" "${name}.txt"; done

Then I did a word count on these:

wc -w *.txt > wc.txt
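
To get the counts into a form a spreadsheet can read, one option (a sketch, not necessarily the exact steps I used) is to drop the “total” line that wc adds, sort the files by word count, and write comma-separated values:

# sort by word count and write filename,count pairs
grep -v ' total$' wc.txt | sort -n | awk '{print $2 "," $1}' > wordcounts.csv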

After a little bit of reformatting and sorting, the numbers end up looking like this in a spreadsheet:

From this, we can make a graphical representation of the number of words:

There are some outliers here. A couple of papers are (much) longer than the others, mainly because they contain long appendices. Some files have low word count numbers because the PDF files are protected from editing, and then pdftotext is not able to extract the text. The majority of files, however, are in the range 2500-5000 words.

The word count includes everything: headers/footers, titles, abstracts, acknowledgements, and references. These differ between papers, but subtracting them suggests that the main text of most papers is in the range of 2000-4500 words.

Improving the PDF files in the NIME archive

This blog post summarizes my experimentation with improving the quality of the PDF files in the proceedings of the annual International Conference on New Interfaces for Musical Expression (NIME).

Centralized archive

We have, over the last few years, worked hard on getting NIME adequately archived. Previously, the files were scattered across each year’s conference web site. The first step was to create a central archive on nime.org. The list there is automagically generated from a collection of publicly available BibTeX files that serve as the master document of the proceedings archive. The fact that the metadata is openly available on GitHub makes it possible for people to fix errors in the database. Yes, there are errors here and there, because the files were made by “scraping” the PDF content; it is just not possible to do this manually for more than 1000 PDF files.

The archive points to all the PDF files, some media files (more are coming), and DOIs to archived PDFs in Zenodo. Together, this has turned out to be a stable and (we believe) future-proof solution.

PDF problems

However, it has turned out that the PDF files in the archive have various issues. All of them work fine in regular PDF readers, but many of them have accessibility issues. There are (at least) three problems with this.

  1. Non-accessible PDFs do not work well for people using alternative readers. We need to strive for universal access at NIME, and this includes the archive.
  2. The files are not optimized for text mining tasks, which is something more and more people are interested in. Such an extensive collection of files is a great resource for understanding a community and how it has developed. This was something I tried myself in a NIME paper in which I analyzed the use of the word “gesture” in all NIME papers up until 2013.
  3. If machines have problems with the files, so do the Google crawlers and other robots looking at the content. This, again, has implications for how the files can be read and indexed in various academic databases.

It is not strange that there are issues with the files. After all, there are a total of 1728 of them, produced from 2001 until today on a myriad of different operating systems and software. During this time, the PDF standard itself has also evolved considerably. For that reason, we have found it necessary to do some optimization of the files.

Renaming

The first thing I did was to download the entire collection of PDFs. I quickly discovered that there were some inconsistencies in the file names. We did a large cleanup of the file names some years ago, so things were not entirely bad, but it was still necessary to settle on one naming convention. I ended up renaming everything to a pattern like:

nime2001_paper001.pdf

This makes it possible to sort by year first, then by submission type (currently only paper and music, but there could be more), and then by a three-digit unique number based on the submission number. Not all the numbers had leading 0s, so I added these for consistency. Since the conference year and ID are unique, it is easy to do a query-replace in the BibTeX database to correct the links there.
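
The zero-padding itself can be scripted. A hedged sketch, assuming hypothetical intermediate names like nime2001_paper1.pdf (the real files followed several different patterns):

# pad the trailing number to three digits, e.g. nime2001_paper1.pdf -> nime2001_paper001.pdf
for i in nime*_paper*.pdf; do
  num=${i##*paper}; num=${num%.pdf}
  printf -v new '%spaper%03d.pdf' "${i%%paper*}" "$((10#$num))"
  [ "$i" != "$new" ] && mv -n "$i" "$new"
done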

Acrobat testing

I usually don’t work much in Acrobat these days, but decided to start my testing there. I was able to get access to a copy of Acrobat XI on a university machine and started looking into different options. From the list of batch processes available, I found these to be particularly promising:

  • “Optimize scanned documents” (converting content into searchable text and reducing file size)
  • “Prepare for distribution” (removing hidden information and other oddities)
  • “Archive documents” (create PDF/A compliant documents)

I first tried to run a batch process using OCR. The aim here was to see if I could retrieve some text from files with images containing text. This did not work particularly well. It skipped most files and crashed on several. After the tenth crash, I gave up and moved on.

The “prepare for distribution” option worked better. It ran through the first 300 files or so with no problems and reduced the file sizes properly. But then the problems started. For many of the files, it just crashed. And when I came to the 2009 files, they turned out to be protected from editing. So I gave up again.

Finally, I tried the archiving function. Here it popped up a dialogue box asking me to fill in the title and authors for every single file. I agree that this would be nice to have, but I do not have time to do it manually for 1728 files.

All in all, my Acrobat exploration turned out quite unsuccessful. I therefore went back to my Ubuntu machine and decided to investigate what kind of command-line tools I could use to get the job done.

File integrity

After searching some forums about how to check whether PDF files are corrupted, I came across the useful qpdf application. Running it on the original NIME collection showed that the majority of files had issues:

find . -type f -iname '*.pdf' \( -exec sh -c 'qpdf --check "{}" > /dev/null && echo "{}": OK' \; -o -exec echo "{}": FAILED \; \)

The check showed that only 794 of the files were labeled as OK, while the other 934 failed. I looked at the failing files, trying to figure out what was wrong, but I have been unable to find any consistency among failing or passing files. Initially, I thought that there might be differences based on whether they were made in LaTeX or MS Word (or something else), the platform, and so on. But it turns out not to be that easy. This may also be because many of the files have been through several steps of updating along the way. For example, for many of the NIME editions, the paper chairs have added page numbers, watermarks, and so on.

Rather than trying to fix the myriad of different problems with the files, I hoped that a file compression step and saving with a newer (and common) PDF version could solve most of them.

File compression

Several of the files were unnecessarily large. Some were close to 100 MB, and too many were more than 2 MB, which should not be necessary for 4-6 page PDF files. Large files cause bandwidth issues on the server, which means extra costs for the organization and long download times for users. Although we don’t think about it much, saving space also saves energy and helps reduce our carbon footprint.

To compress the PDF files, I turned to Ghostscript’s gs command-line utility. I experimented with different settings but found that “screen” and “ebook” rendered the images pixelated, even on screen. So I went for the “printer” setting, which according to the Ghostscript manual means downsampling images to 300 DPI, so they should also print well. The script I used was this:

for i in *.pdf; do name=`echo $i | cut -d'.' -f1`; gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile="${name}_printer.pdf" "$i"; done

The result was that the folder shrank from 3.8 GB to 1.0 GB, quite a lovely saving. The image quality also appears to be more or less preserved, although this is only based on visual inspection of some of the files.

Re-running the file integrity check on the new files showed that all 1728 files now passed!

PDF/A

I have been working with PDF files for years but had not really read up on the details of the different versions. What turns out to be important for long-term preservation is that files comply with the PDF/A standard. The regular PDF format went through several versions (1.4, 1.5, 1.6) under Adobe’s control, whereas PDF/A is an ISO standard and appears to be what people use for archiving.

Unfortunately, it turns out that creating PDF/A files using Ghostscript is not entirely straightforward. So more exploration needs to be done there.
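
For anyone who wants to help out: as far as I understand, Ghostscript has PDF/A-related switches, but it also needs a PDFA_def.ps definition file pointing to an ICC colour profile, and getting the output to actually validate is where it gets tricky. An untested sketch of what a conversion might look like:

gs -dPDFA=2 -dPDFACompatibilityPolicy=1 -dNOPAUSE -dQUIET -dBATCH -sColorConversionStrategy=UseDeviceIndependentColor -sDEVICE=pdfwrite -sOutputFile=paper_pdfa.pdf PDFA_def.ps paper.pdf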

Metadata

Finally, one of the remaining challenges with the proceedings archive is getting it properly indexed by various search engines. For that, having PDF metadata is important. Again, I wish we had the capacity to do this properly for all 1728 files, but that is currently out of scope.

However, adding some general metadata is better than nothing, so I turned to ExifTool, which can be used to set metadata on PDF files:

for i in *.pdf; do exiftool -Title="Proceedings of the International Conference on New Interfaces for Musical Expression" -Author="International Conference on New Interfaces for Musical Expression" -Subject="A Peer Reviewed article presented at the International Conference on New Interfaces for Musical Expression" "$i"; done
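
Afterwards, the tags can be checked on a single file to confirm that they were written (just a quick sanity check of my own):

exiftool -Title -Author -Subject nime2001_paper001.pdf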

Conclusion

I still need to figure out the PDF/A issue (help wanted!), but the recipe above has already improved the quality of the PDF files considerably. It will save us bandwidth, improve accessibility, and, hopefully, also lead to better indexing of the files.