Simple tips for better video conferencing

Many people are currently moving to video-based meetings, so I have written up some quick advice on how to improve your setup. It is based on my interview advice, but grouped differently.

Network

The first important thing is to have as good a network connection as you can. Video conferencing requires a lot of bandwidth, so even though your e-mail and regular browsing work fine, the connection may still not be sufficient for good video transmission.

  • Cabled network: If you are able to connect to your router with an Ethernet cable, that is usually the best and most solid solution.
  • Wireless network: If a cable won’t work for you (it is also difficult logistically in my own apartment), try to get as close as possible to your Wi-Fi router.

Audio

I would argue that improving the audio is more important than the video for video conferencing. Most video conferencing systems (Skype, Zoom, etc.) will prioritize the audio channel, which means that the video may stutter while the audio is passing through fine.

The main trick is to separate the “foreground” as much as possible from the “background”. There are some very basic audio principles to follow:

  • Use a headset: The best way to get decent sound for video conferencing is to move the microphone as close as possible to your mouth. Headsets with a microphone boom in front of your face are the best, but a regular mobile phone headset (the one that came with your mobile phone, for example) is still better than nothing.
  • Use headphones: If you for some reason do not have a headset with built-in microphone, using a regular pair of headphones is still better than using the speakers on your computer. With this setup you use the microphone on the computer, which may not be ideal, but at least you won’t get feedback problems.
  • Avoid reverberant rooms: If you aim for clarity in conversation, it is typically better to sit in a smaller and more damped room than a large one. That means that a bedroom is typically better than a larger living room. If you use a headset this is less important, but particularly if you only use the built-in microphone and speakers on a laptop, this could make a huge difference in how your voice gets through.
  • Mute yourself: In most systems there is a button to mute yourself. If you are not talking all the time, it helps to mute yourself. Just remember to unmute when you want to say something!

Video

The same principle of separating “foreground” from “background” applies to the video.

  • Lighting: To obtain the best possible video image, think about your placement with respect to lighting. It is, for example, not ideal to sit in front of a window, since a bright light in the background will make it difficult to see your face.
  • Background: The best is to sit in front of a plain wall. If that is not possible, consider whether the background of your image is what you want to show to your fellow students/colleagues.
  • Video angle: If you are using the built-in camera on your computer you may not have too many options for how to place the camera. But you may still consider shifting the camera position so that you and your surroundings look as good as possible.

Summing up

There are, of course, many ways to improve your video conferencing setup. Many people believe that you need to invest in expensive equipment to get good results. But even cheap consumer products are very capable of producing decent results these days. So it is more a matter of optimizing what you have. Good luck!

“Flattening” Ricoh Theta 360-degree videos using FFmpeg

Ricoh Theta 360-degree camera.

I am continuing my explorations of the great terminal-based video tool FFmpeg. Now I wanted to see if I could “flatten” a 360-degree video recorded with a Ricoh Theta camera. These cameras contain two fisheye lenses, capturing two 180-degree videos next to each other. This results in video files like the one shown in the screenshot below.

Screenshot from a video recorded with a Ricoh Theta.

These files are not very useful to watch or work with, so we need to somehow “flatten” them into more meaningful video files. I find it cumbersome to do this in the Ricoh mobile phone apps, so I have been looking for a simple solution to do it on my computer.

I see that the FFmpeg developers are working on native support for various 360-degree video files. This is implemented in the filter v360, but since it is not in the stable version of FFmpeg yet, I decided to look for something that works right now. Then I came across this blog post, which shows how to do the flattening based on two so-called PGM files that contain information about how the video should be mapped:

ffmpeg -i ricoh_input.mp4 -i xmap_thetaS_1920x960v3.pgm -i ymap_ThetaS_1920x960v3.pgm -q 0 -lavfi "format=pix_fmts=rgb24,remap" remapped.mp4

The end result is a flattened video file, as shown below:

Screenshot from a “flattened” 360 degree video.

As for where to split up the video (it is a continuous 360-degree video after all) I will have to investigate later.
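
For readers whose FFmpeg build already includes the v360 filter mentioned above, the conversion can probably be done without the map files. The following is only a sketch and not the method from the blog post: the helper name theta_v360_filter is hypothetical, the 190-degree field of view is a guess that would need tuning for a particular camera, and the file names in the usage comment are placeholders. The yaw value rotates the panorama horizontally, which is one way of choosing where the image is split.

```shell
# Build a v360 filter string that maps dual-fisheye input to an
# equirectangular ("flattened") output.
# $1: assumed field of view of each lens in degrees (default 190, a guess)
# $2: yaw rotation in degrees (default 0); shifts where the panorama is cut
theta_v360_filter() {
  fov="${1:-190}"
  yaw="${2:-0}"
  printf 'v360=input=dfisheye:output=equirect:ih_fov=%s:iv_fov=%s:yaw=%s\n' \
    "$fov" "$fov" "$yaw"
}

# Usage (placeholder file names):
# ffmpeg -i ricoh_input.mp4 -vf "$(theta_v360_filter 190 90)" flattened.mp4
```

Trying a few yaw values on a short clip should show quickly whether the split ends up in a sensible place.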

VideoAnalysis v2.0

I am happy to announce a new version of VideoAnalysis, a standalone application for OSX and Windows for creating visualizations and extracting motion features from video files.

The GUI of VideoAnalysis v2.0

VideoAnalysis was developed as a standalone version of the Musical Gestures Toolbox. I began working on the toolbox back in 2004, as a collection of modules for Max/MSP/Jitter. Then some people asked me to make a standalone version with some of the core functionality. This version was primarily developed for music researchers, but is also used for sports, dance, healthcare, architecture, and interaction design.

I have less time for development myself these days, so most of the work on the new release has been done by Bálint Laczkó and Aleksander Tidemann. Thanks!

So for anyone reading this: please try out the new version. And if you have problems and/or find any bugs, please report them in the tracker.

Creating different types of keyframe displays with FFmpeg

In some recent posts I have explored the creation of motiongrams and average images, multi-exposure displays, and image masks. In this blog post I will explore different ways of generating keyframe displays using the very handy command line tool FFmpeg.

As in the previous posts, I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first attempt is to create a 3×3 grid image by just sampling frames from the original video. I spent some time exploring different ways of doing this. It is possible to do it with a one-liner:

ffmpeg -ss 00:00:05 -i dance.mp4 -frames 1 -vf "select=not(mod(n\,200)),scale=495:256,tile=3x3" tile.jpg

The problem with this approach, and many similar ones that I found by googling around, is that it samples frames at a fixed interval. The above code picks every 200th frame, which gives this image:

The problem is that the image only contains information about roughly the first 1600 frames, or more specifically frames 0, 200, 400, 600, 800, 1000, 1200, 1400, 1600. I want to include frames that represent the whole video.
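
One way around the fixed interval is to derive the step from the total number of frames, so that the nine tiles span the whole file. This is a minimal sketch under my own assumptions: the helper name grid_frame_step is hypothetical, and the frame count is assumed to come from a tool such as ffprobe.

```shell
# Print the sampling step that spreads a given number of tiles over a video.
# $1: total number of frames in the video
# $2: number of tiles in the grid (default 9 for a 3x3 grid)
grid_frame_step() {
  total="$1"
  tiles="${2:-9}"
  step=$(( total / tiles ))   # integer division: frames 0, step, 2*step, ...
  if [ "$step" -lt 1 ]; then
    step=1                    # guard for clips shorter than the grid
  fi
  echo "$step"
}

# Usage (counting frames with ffprobe can take a while on long files):
# total=$(ffprobe -v error -select_streams v:0 -count_frames \
#   -show_entries stream=nb_read_frames -of csv=p=0 dance.mp4)
# step=$(grid_frame_step "$total" 9)
# ffmpeg -i dance.mp4 -frames:v 1 \
#   -vf "select=not(mod(n\,$step)),scale=495:256,tile=3x3" tile.jpg
```

For a 3600-frame video this gives a step of 400, so the grid would hold frames 0, 400, 800, and so on up to 3200.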

I see that many people create such displays by sampling based on scene changes in the video. There are two problems with this. First, it requires that there are scene changes in the video. This is usually not the case in the videos that I study, which are primarily recorded with a stationary camera in which only the “foreground” changes. The second problem with sampling only “salient” frames is that we lose information about the temporal unfolding of the video file. From an analysis point of view, it is actually quite useful to know more or less when things happened in the video. That is not so easy if the sampling is uneven.

I was therefore happy to find a nice script made by Martin Sikora, which is based on looking up the duration of the file and using this to calculate the frames to export from the file. Running this script on the original video gives this image:

The 9 frames in the display above reveal that there is little dance in the first third of the video file (the arm of the dancer can be seen entering in the third image). It also shows how the dancer moved around in the space. It is possible to get some idea about her spatial distribution, but there is little information about her actual motion throughout the sequence. I was therefore curious to try out making such a grid-based display from a history video, which actually shows some more of the actual motion.
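
The duration-based idea can be sketched roughly as follows. This is not Martin Sikora's script, just an illustration of the principle: the helper name even_timestamps is hypothetical, and taking the midpoint of each of nine equal segments is my own choice.

```shell
# Print evenly spaced timestamps (in seconds), one per line.
# Each timestamp is the midpoint of one of N equal segments of the file.
# $1: duration of the video in seconds
# $2: number of frames wanted (default 9)
even_timestamps() {
  awk -v d="$1" -v n="${2:-9}" \
    'BEGIN { for (i = 0; i < n; i++) printf "%.2f\n", (i + 0.5) * d / n }'
}

# Usage: look up the duration, then export one frame per timestamp.
# dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 dance.mp4)
# k=0
# for t in $(even_timestamps "$dur" 9); do
#   k=$((k + 1))
#   ffmpeg -ss "$t" -i dance.mp4 -frames:v 1 "frame_0$k.png"
# done
```

For a 90-second file this yields 5.00, 15.00 and so on up to 85.00, so the timing of each tile is easy to read off the grid.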

It is possible to make (motion) history videos in both the Matlab and Python versions of the Musical Gestures Toolbox, but today I was curious as to whether it could be done simply with FFmpeg. And it turns out to be quite simple using a filter called tmix:

ffmpeg -i dance.mp4 -filter:v tmix=frames=30:weights="10 1 1" dance_tmix30.mp4

I played around for a while with the settings before ending up with these. Here I average over 30 frames (which is half a second for this 60 fps video). I also use the weights option to give preference to the current frame. This makes it easier to follow the dancer, as the trajectories of past motion become more blurred.

Running the above grid-script on this video results in a keyframe display that shows more of the motion happening in the frames in question. This is useful to see, for example, when she moved more than in other frames.
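
The number of frames given to tmix depends on the frame rate, so it can help to think in seconds instead. A small sketch, assuming a hypothetical helper called tmix_frames that converts a window length to a frame count:

```shell
# Print the tmix "frames" value for a history window of a given length.
# $1: window length in seconds
# $2: frame rate of the video in frames per second
tmix_frames() {
  awk -v s="$1" -v fps="$2" \
    'BEGIN { n = int(s * fps + 0.5); if (n < 1) n = 1; print n }'
}

# Usage: half a second of history for a 60 fps video gives frames=30.
# ffmpeg -i dance.mp4 \
#   -filter:v "tmix=frames=$(tmix_frames 0.5 60):weights='10 1 1'" dance_tmix.mp4
```

With this, the same half-second window can be reused for videos recorded at other frame rates.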

I am quite happy with the result above, but it is not particularly fast. Creating the history video is time-consuming, since it has to process all the frames in the entire video. I therefore tested speeding up the video 8 times, using this command (the -an flag is used to remove the audio):

ffmpeg -i dance.mp4 -filter:v "setpts=0.125*PTS" -an output8x.mp4

Running the history video function on this then runs quite a bit faster, and results in this high-speed history video:

Running this through the grid-script gives a keyframe display that is both similar and different to the one above:

It is quite a lot quicker to generate, and also gives more information about the motion sequence.
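
The factor passed to setpts in the speed-up command is the inverse of the speed-up, so 0.125 corresponds to 1/8. A small sketch of that relationship, using a hypothetical helper called setpts_filter:

```shell
# Print the setpts filter string for a given speed-up factor.
# $1: speed-up factor (must be greater than zero), e.g. 8 for eight times faster
setpts_filter() {
  awk -v k="$1" 'BEGIN { printf "setpts=%g*PTS\n", 1 / k }'
}

# Usage: the same 8x speed-up as in the command above.
# ffmpeg -i dance.mp4 -filter:v "$(setpts_filter 8)" -an output8x.mp4
```

This makes it easy to test other speed-ups, for example 4x, and compare the resulting keyframe displays.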

The conclusion is that it is, indeed, possible to make a lot of interesting video visualizations using “only” FFmpeg. Several of these scripts are also much faster than the scripts I have previously used in Matlab and Python. So I will definitely continue to explore FFmpeg, and look at how it can be integrated with the other toolboxes.

Creating image masks from video file

As part of my exploration in creating multi-exposure keyframe image displays with FFmpeg and ImageMagick, I tried out a number of things that did not help solve the initial problem but still could be interesting for other things. Most interesting was the automagic creation of image masks from a video file.

I will use a contemporary dance video from the AIST Dance Video Database as an example:

The first step is to extract keyframes from the video file using this one-liner ffmpeg command:

ffmpeg -skip_frame nokey -i *.mp4 -vsync 0 -r 30 -f image2 t%02d.tiff

This will use the keyframes from the MP4 file, which should be faster than doing a new analysis of the file. It could, of course, also be possible to sample the video at regular intervals, but the keyframes seem to work fine for my usage. I also choose to save the exported keyframes as TIFF files to avoid running multiple rounds of compression on the files. The end result is a bunch of keyframe images that can be used for further processing.

Here we are lucky, because the first frame actually contains the background of the scene. So we can use that frame to create a “foreground” image by subtracting the background image like this:

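# The first keyframe (t01.tiff) is used as the background image
for i in *.tiff; do
  name="${i%.*}"  # file name without the extension
  # Difference against the background, thresholded and blurred into a soft mask
  convert t01.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
  # Multiply the frame with its mask so that only the foreground remains
  convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done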

The end result is a series with the foreground masks:

And then the final result is a series of images in which only the foreground is shown. The “glow” around the images is because of the blur effect used when creating the mask:

Adaptive background

There may also be cases in which there is no readily available background image as we used above, such as in this hip-hop AIST dance video:

Then it is possible to create a background image by combining all the images, and hope that this could “remove” the foreground. Here is a one-liner that does this (assuming that you have exported the individual keyframes as mentioned in the beginning of this post). Note that the lighten compose method keeps the lightest value found at each pixel across the images, so it is not a true average:

convert *.tiff -background black -compose lighten -flatten background.tiff

This works quite well, although we can see that the camera right behind the dancer is a little more faint than the two others:

Background image created by combining all the keyframes.

This background image can then be used to subtract from the other images like we did above:

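for i in *.tiff; do
  # Skip the background image itself, which also matches *.tiff
  [ "$i" = "background.tiff" ] && continue
  name="${i%.*}"  # file name without the extension
  # Difference against the background, thresholded and blurred into a soft mask
  convert background.tiff "$i" -compose difference -composite -threshold 5% -blur 0x3 -threshold 20% -blur 0x3 "$name-mask.tiff"
  # Multiply the frame with its mask so that only the foreground remains
  convert "$i" "$name-mask.tiff" -compose multiply -flatten "$name-clean.jpg"
done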

It works very well, except that the camera behind the performer (which wasn’t masked properly) also shows up in the masked foreground images:

This method works quite well and has the benefit of being very fast. It is possible to get a better result by creating an average image from the entire video (and not only the keyframes), but this would also take much longer.