Running a successful Zoom Webinar

I have been involved in running some Zoom Webinars over the last year, culminating with the Rhythm Production and Perception Workshop 2021 this week. I have written a general blog post about the production. Here I will write a little more about some lessons learned on running large Zoom Webinars.

In previous Webinars, such as the RITMO Seminars by Rebecca Fiebrink and Sean Gallagher, I ran everything from my office. These were completely online events, based on each person sitting with their own laptop. This is where the Zoom Webinar solution shines.

Things become more complex once you try to run a hybrid event, where some people are online and others are on-site. Then you need to combine methods for video production and streaming with those of a streamlined video conferencing solution. It is doable but quite tricky.

RPPW 2021 chair Anne Danielsen introduces one of the panels from RITMO.

The production of the RPPW Webinar involved four people: myself as “director”, RITMO admin Marit Furunes as “Zoom captain”, and two MCT students, Thomas Anda and Wenbo Yi, as “Zoom station managers”. We could probably have done it with three people, but given that other things were going on simultaneously, I am happy we decided to have four people involved. In the following, I will describe each person’s role.

Zoom station 1

The first Zoom station, RITMO 1, was operated by Wenbo Yi. He sat behind the lecture desk in the hall, equipped with a desktop PC with two screens. The signal from the right screen was split so that it could also be shown on a projector on the wall. This was the screen that showed the break displays and so on.

MCT student Wenbo Yi in control of RITMO 1 during RPPW.

Three cameras were connected to the PC: a front-facing one that we used for the moderator of the keynote presentations, one next to the screen on the wall that showed a close-up of conference chair Anne Danielsen during the introductions, and one at the back of the hall that showed the whole space.

Four microphones were connected to the PC and the PA system in the hall. One on the desk was used for the keynote moderation. Of the wireless microphones, we only used one: a handheld that Anne used during her introductions.

The nice thing about Zoom is that it is easy for a person to turn cameras and microphones on and off. However, this is designed around the concept of sitting in front of your own computer. When you are standing in the middle of a room, someone else needs to do the clicking. That was Wenbo’s job: he switched between cameras and microphones and turned the slides on and off.

Zoom station 2

The second Zoom station, RITMO 2, was operated by Thomas Anda, who sat in the control room behind the hall. He controlled a station originally designed as a regular video streaming setup, including two remote-controlled PTZ cameras connected to a video mixer. For regular streams, we would have tapped the audio and video from the auditorium and made a mix to be streamed to, for example, YouTube. This time, we mainly used one of the PTZ cameras to show a general picture of the hall.

MCT student Thomas Anda sitting in the control room with the RITMO 2 station during RPPW.

Thomas’s main job was to play back all 100 pre-recorded videos. We had a separate, ethernet-cabled PC for this task, which was connected to the Zoom Webinar with its screen shared. The videos were cued up in VLC, with poster images inserted between them. During our testing of this setup, we discovered that the sound levels of the video files were quite uneven, which led to a normalization procedure for all of them.

In theory, we could have played the video files from RITMO station 1. However, both Wenbo and Thomas had plenty to think about, so it would have been hard for one person to do it all. Having two stations also gave us two camera views and added redundancy to the stream.

Zoom captain station

The third station was controlled by our “Zoom captain”, Marit Furunes. She served as the main host of the Webinar most of the time and was responsible for promoting people to panellists, and demoting them again, throughout the conference.

Marit Furunes was the “Zoom captain”. Here, with conference chair Anne Danielsen in the background.

It is possible to set up panels in advance, but that requires separate Zoom Webinars and individualized invitation e-mails. We have experienced in the past that people often forget about these e-mails, so we decided to have just one Zoom Webinar for the entire conference and rather move people in and out of the panels manually. That required some manual work from Marit, but it also meant that she was in full control of who could talk in each session.

She was also in charge of turning people’s video and sound on and off, and of ensuring that the final stream looked fine.

Director station

I was sitting next to Marit, controlling the “director station”. I was mainly checking that things were running as they should, but I also served as a backup for Marit when she took breaks. In between, I also tweeted some highlights, replied to e-mails that came in, and commented on things in Slack.

From the “director station” I controlled one laptop as host and had another laptop for watching the output stream.

Monitoring and control

Together, the four of us involved in the production managed to create a nice-looking result. There were some glitches, but in general, things went as planned. The most challenging part of working with a Webinar-based setup is the lack of control and monitoring. What we learned is that “what you see is not what you get”: it was never obvious what to click on to get the final result we wanted. For example, we often had to switch back and forth between the “gallery” and “speaker” views to get the desired layout.

Also, as a host, you can turn off other people’s cameras, but you cannot turn them on; you can only ask the person to turn them on. That makes sense in many ways. After all, you should not be allowed to turn on another person’s camera remotely. However, as a production tool in an auditorium, this was cumbersome. It happened that Marit and I wanted to turn on the main video camera in the hall (from RITMO 2) or the front-facing camera (from RITMO 1). But we were not allowed to do this. Instead, we had to ask Thomas or Wenbo to turn the cameras on.

Summing up

The Zoom Webinar function was clearly made for a traditional video-conferencing-like setup, and for that it works very well. As described above, we managed to make it work quite well in a hybrid setup too. However, this required a four-person team and five computers. The challenge was that we never really felt completely in control of things, and we could not properly monitor the different signals.

The alternative would be a regular video streaming solution, based on video and audio mixers. That would have given us much more control over the final stream, including proper monitoring. It would have required more equipment (which we have), but not necessarily more people. We would have lost some of the Zoom functionality, though, such as the Q&A feature, which works very well.

Next time I am doing something like this, I would try to run a stream-based setup instead. Panellists could then come in through a Zoom Room, which could be mixed into a stream using either our hardware video mixer or a software mixer like OBS. Time will tell if that ends up being better or worse.

Normalize audio in video files

We are organizing the Rhythm Production and Perception Workshop at RITMO next week. As mentioned in another blog post, we have asked presenters to send us pre-recorded videos. They are all available on the workshop page.

During the workshop, we will play sets of videos in sequence. When doing a test run today, we discovered that the sound levels differed wildly between files. There is clearly a need to normalize the sound levels to create a good listening experience.

Batch normalization

How does one normalize around 100 video files without too much pain and effort? As always, I turn to my go-to video companion, FFmpeg. Here is a small script I made to do the job:

#!/bin/bash

# Loop over the common video file types (nullglob makes patterns
# without matches expand to nothing instead of a literal string).
shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do
    name="${i%.*}"    # file name without the extension
    ffmpeg -i "$i" -c:v copy -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"
done
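
To run it, save the script to a file (for example normalize.sh, a name I am just making up here), make it executable, and run it from the folder containing the videos:

chmod +x normalize.sh
./normalize.sh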

This was the result of some searching around for a smart solution (in Qwant, btw, my new preferred search engine). For example, I use the “nullglob” trick to list multiple file types in the for loop.

The most important part of the script is the normalization, which I found in this blog post. The settings are described as:

  • loudnorm: the name of the normalization filter
  • I: the integrated loudness target (from -70.0 to -5.0)
  • LRA: the loudness range (from 1.0 to 20.0)
  • TP: the maximum true peak (from -9.0 to 0.0)

The settings in the script normalize to a high but not maximum signal, which leaves some headroom.
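
If you want to check what the filter actually does to a given file, loudnorm can also print a measurement summary without writing a new file. A minimal sketch, where input.mp4 is a placeholder and “-f null -” tells FFmpeg to discard the output:

ffmpeg -i input.mp4 -af loudnorm=I=-16:LRA=11:TP=-1.5:print_format=summary -f null -

This prints the measured and target loudness values, which is a handy sanity check before running the whole batch.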

To compress or not

To save processing time and avoid recompressing the video, I have included “-c:v copy” in the script above. FFmpeg then copies over the video content directly. This is fine for videos with “normal” H.264 compression, which is the case for most .MP4 files. However, when getting 100 files made on all sorts of platforms, there are bound to be some oddities. There were a couple of cases with unusual compression formats that, for some reason, failed with the above script. One also had interlacing issues. For those, I modified the script to re-compress the files.

#!/bin/bash

# Same loop as above, but re-compress the video and de-interlace
# it with the yadif filter instead of copying the video stream.
shopt -s nullglob
for i in *.mp4 *.MP4 *.mov *.MOV *.flv *.webm *.m4v; do
    name="${i%.*}"
    ffmpeg -i "$i" -vf yadif -af loudnorm=I=-16:LRA=11:TP=-1.5 "${name}_norm.mp4"
done

In this script, the copy part is removed. I have also added “-vf yadif”, which is a de-interlacing video filter.
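
If you are unsure whether a file is interlaced in the first place, FFmpeg’s idet filter can give an estimate. A minimal sketch, where input.mp4 is a placeholder and the frame count is just an arbitrary sample size:

ffmpeg -i input.mp4 -vf idet -frames:v 200 -an -f null -

At the end of the output, idet prints counts of top-field-first, bottom-field-first, and progressive frames, which gives a good indication of whether de-interlacing is needed.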

Summing up

With the first script, I managed to normalize all 100 files in only a few minutes. Some of the files turned up with 0 bytes due to issues with copying the video data. So I ran through these with the second script. That took longer, of course, due to the need for compressing the video.
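
A quick way to spot such failed files, assuming the “_norm.mp4” naming from the scripts above, is to search for empty files:

find . -maxdepth 1 -name '*_norm.mp4' -size 0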

All in all, the processing took around half an hour. I cannot even imagine how long it would have taken to do this manually in a video editor. I haven’t really thought about the need for normalizing the audio in videos like this before. Next time I will do it right away!

Combining audio and video files with FFmpeg

When working with various types of video analysis, I often end up with video files without audio. So I need to add the audio track by copying it either from the source video file or from a separate audio file. There are many ways of doing this. Many people would probably reach for a video editor, but the problem is that you would most likely end up re-compressing both the audio and the video. A better solution is to use FFmpeg, the Swiss Army knife of video processing.

As long as you know that the audio and video files you want to combine are the same duration, this is an easy task. Say that you have two video files:

  • input1.mp4 = original video with audio
  • input2.avi = analysis video without audio

Then you can use this one-liner to copy the audio from one file to the other:

ffmpeg -i input1.mp4 -i input2.avi -c copy -map 1:v:0 -map 0:a:0 -shortest output.avi

The output.avi file will have the same video content as input2.avi, but with the audio from input1.mp4. The -map options select streams by input index (inputs are numbered from zero): -map 1:v:0 takes the first video stream from the second input, and -map 0:a:0 takes the first audio stream from the first input. Note that this is a lossless (and fast) procedure; it just copies the content from the source files.
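
To double-check which streams ended up in the output file, ffprobe (which comes bundled with FFmpeg) can list them. For example:

ffprobe -v error -show_entries stream=index,codec_type,codec_name output.avi

This should show one video stream and one audio stream with the expected codecs.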

If you want to convert (and compress) the file in one operation, you can use this one-liner to export an MP4 file with H.264 video and AAC audio compression:

ffmpeg -i input1.mp4 -i input2.avi -map 1:v:0 -map 0:a:0 -shortest -c:v libx264 -c:a aac output.mp4

Since this involves re-compressing the content, it will take much longer than the first method.
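
If you want more control over the trade-off between file size and quality, libx264’s -crf and -preset options can be added. Here is a variant of the same command with what I would consider reasonable starting values (lower CRF means higher quality and larger files):

ffmpeg -i input1.mp4 -i input2.avi -map 1:v:0 -map 0:a:0 -shortest -c:v libx264 -crf 23 -preset medium -c:a aac output.mp4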

Some thoughts on microphones for streaming and recording

Many people have asked me about what types of microphones to use for streaming and recording. This is really a jungle, with lots of devices and things to think about. I have written some blog posts about such things previously, such as tips for doing Skype job interviews, testing simple camera/mic solutions, running a Hybrid Disputation, and how to work with plug-in-power microphones.

Earlier today I held a short presentation about microphones at RITMO. This was during our informal Food & Paper lunch seminar, where people eat their lunch while listening to presentations about different topics (usually something academic, but sometimes also other things). Here is a cut-down version of the presentation:

The presentation starts by drawing up the main things to think about: microphones, speakers, and the environments in which people use these devices. When we stream or record, we don’t really control other people’s speakers and environments. So the two things we should think about are (1) the microphone we use and (2) the environment we are in.

A very brief summary of microphones, speakers, and room acoustics.

To make a long story short, here is my general advice:

  • Place yourself in a “dry” and quiet space, if possible. A small room with carpets and curtains is much better than a big and empty space.
  • A headset with a boom microphone will usually give the best sound overall, without feedback, and allow you to move your head around. I have many USB headsets from Logitech, Jabra, and Poly, and all of them are fine. The more expensive ones are more comfortable to wear, but the sound quality doesn’t really differ that much. I generally try to avoid Bluetooth headsets since they need to be charged and paired to function. If you can live with a cable, you will get better sound for a lower price.
  • A “podcast-style” condenser microphone will give a more pleasant and radio-like sound. You can also avoid sitting with headphones on all the time, which is very tiresome after some hours. However, condenser microphones are usually relatively large, need a stand, and you may get into feedback problems. There are many options here, but I have been very positively surprised by this cheap Marantz USB microphone.
  • A lavalier microphone is the best choice for making video recordings. They are small, pick up sounds nicely, and some (like the Røde Smartlav+) can be connected directly to a mobile phone or laptop.

There are always better, more expensive, and more complicated solutions out there. However, I am very impressed by some of the newest products that have arrived on the market. The products highlighted above are reasonably priced and will greatly improve the audio of both streaming and recording.

Convert between video containers with FFmpeg

In my ever-growing collection of smart FFmpeg tricks, here is a way of converting from one container format to another. Here I will convert from a QuickTime (.mov) file to a standard MPEG-4 (.mp4), but the recipe should work between other formats too.

If you came here to just see the solution, here you go:

ffmpeg -i infile.mov -acodec copy -vcodec copy outfile.mp4

In the following I will explain everything in a little more detail.

Container formats

One of the confusing things about video files is that they have both a container and a compression format. The container is usually what the file suffix denotes. Apple introduced the .mov format for QuickTime files, and Microsoft used to use .avi files.

Nowadays, there seems to be a convergence towards MPEG containers and .mp4 files. However, software from both Apple and Microsoft (and others) still outputs other formats. This is confusing and can also lead to various playback issues. For example, many web browsers are not able to play these formats natively.

Compression formats

The compression format denotes how the video data is organized inside a container. Here, too, there are many different formats. The most common today is H.264 for video and AAC for audio. These are both part of the MPEG-4 standard and can be embedded in .mp4 containers. However, both H.264 and AAC can also be embedded in other containers, such as .mov and .avi files.

The important thing to notice is that both .mov and .avi files may contain H.264 video and AAC audio. In those cases, the inside of such files is identical to the content of a .mp4 file. But since the container is different, it may still be unplayable in certain software. That is why I would like to convert from one container format to another. In practice that means converting from .mov or .avi to .mp4 files.

Lossless conversion

There are many ways of converting video files. In most cases, you would end up with a lossy conversion. That means that the video content will be altered. The file size may be smaller, but the quality may also be worse. The general rule is that you want to compress a file as few times as possible.

For all sorts of video conversion/compression jobs, I have ended up turning to FFmpeg. If you haven’t tried it already, FFmpeg is a collection of tools for doing all sorts of audio/video manipulations in the terminal. Working in the terminal may be intimidating at first, but you will never look back once you get the hang of it.

Converting a file from .mov to .mp4 is as simple as typing this little command in a terminal:

ffmpeg -i infile.mov outfile.mp4

This will change from a .mov container to a .mp4 container, which is what we want. But it will also (probably) re-compress the video. That is why it is always smart to look at the content of your original file before converting it. You can do this by typing:

ffmpeg -i infile.mov

For my example file, this returns the following metadata:

  Metadata:
    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    creation_time   : 2016-08-10T10:47:30.000000Z
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: MacBookPro11,1
    com.apple.quicktime.software: Mac OS X 10.11.6 (15G31)
    com.apple.quicktime.creationdate: 2016-08-10T12:45:43+0200
  Duration: 00:00:12.76, start: 0.000000, bitrate: 5780 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1844x1160 [SAR 1:1 DAR 461:290], 5243 kb/s, 58.66 fps, 60 tbr, 6k tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2016-08-10T10:47:30.000000Z
      handler_name    : Core Media Video
      encoder         : H.264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 269 kb/s (default)
    Metadata:
      creation_time   : 2016-08-10T10:47:30.000000Z
      handler_name    : Core Media Audio

There is quite a lot of information there, so we need to look for the important stuff. The first line we want to look for is the one with information about the video content:

Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1844x1160 [SAR 1:1 DAR 461:290], 5243 kb/s, 58.66 fps, 60 tbr, 6k tbn, 50 tbc (default)

Here we can see that this .mov file contains video that is already compressed with H.264. Another thing we can see is that it has a non-standard resolution (1844×1160). The bit rate of the file is 5243 kb/s, which says something about how large the final file will be. It is also interesting to see that it uses a frame rate of 58.66 fps, which is also a bit odd.

Similarly, we can look at the content of the audio stream of the file:

Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 269 kb/s (default)

Here we can see that the audio is already compressed with AAC at the standard sampling rate of 44.1 kHz, and at a somewhat non-standard bit rate of 269 kb/s.

The main point of investigating the file before we do the conversion is to avoid re-compressing the content of the file. After all, the content is already in the right formats (H.264 and AAC) even though it is in an unwanted container (.mov).

Today’s little trick is how to convert from one container format to another without modifying the content of the file, only the container. That can be achieved with the command shown at the top:

ffmpeg -i infile.mov -acodec copy -vcodec copy outfile.mp4

There are several benefits of doing it this way:

  1. Quality. We avoid an unnecessary re-compression, which would only degrade the content.
  2. Preserve the pixel size, sampling rates, etc. of the originals. Most video software will use standard settings for these. I often work with various types of non-standard video files, so it is nice to preserve this information.
  3. Save time. Since no re-compression is needed, we only copy content from one container to another. This is much, much faster than re-compressing the content.
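
If you have a whole folder of files to rewrap, the nullglob loop from the normalization scripts above can be reused. A minimal sketch, assuming that the streams inside the .mov files are already H.264 and AAC:

#!/bin/bash

# Losslessly rewrap every .mov file in the current folder as .mp4.
shopt -s nullglob
for i in *.mov *.MOV; do
    ffmpeg -i "$i" -acodec copy -vcodec copy "${i%.*}.mp4"
done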

All in all, this long explanation of a short command may help to improve your workflows and save some time.