Some hours of recordings
The starting point was a bunch of recordings from our recent MusicLab Copenhagen featuring the amazing Danish String Quartet. A team of RITMO researchers went to Copenhagen and captured the quartet in both rehearsal and performance. We have data and media from motion capture, eye tracking, physiological sensing, audio, video, and more. The plan is to make it all available on OSF.
When it comes to video, we have many different recordings, ranging from small GoPro cameras hanging around the space to professional streaming cameras operated by a camera crew. In addition, we have one recording from a Garmin VIRB 360 camera hanging in the chandelier close to the musicians. Those recordings are what I will explore in this post.
An upside-down 360 recording
The Garmin VIRB camera records a 360-degree video using two 180-degree lenses. Unlike Ricoh Theta’s stereo-spherical videos, the Garmin stores the recording with an equirectangular projection. Here is a screenshot from the original recording:
There are some obvious problems with this recording. First, it is upside down, since the camera was hanging upside down from a chandelier above the musicians. The panning and tilting of the camera are also slightly off relative to the placement of the musicians. So some pre-processing is necessary before analysing the files.
Most 360-degree cameras come with software for adjusting the image. The Garmin app can do it, but I already have all the files on a computer. It could also be done in video editing software, although I haven’t explored that. In any case, I was looking for an option that would let me batch process many videos (yes, we have hours of recordings, and they are split into different files).
Since working on the Ricoh files last year, I have learned that FFmpeg’s new v360 filter is part of the regular release, so I wanted to give it a spin. Along the way, I learned more about different image projection types, which I outline in the following.
The starting point was the equirectangular projection coming out of the Garmin VIRB. The first step in making it more useful is to flip the video around and place the musicians in the centre of the image.
The different options of the v360 filter in FFmpeg are documented but not explained very well, so it took me quite some time to figure out how to make the adjustments. This is the one-liner I ended up with to create the image above:
ffmpeg -i input.mp4 -vf "v360=input=e:output=e:yaw=100:pitch=-50:v_flip=1:h_flip=1" output.mp4
There are some tricks I had to figure out to make this work. First, I use the v360 filter with equirectangular (shortened to e) as both the input and output of the filter. The upside-down image was corrected using both the v_flip and h_flip options, which flip the image vertically and horizontally; together, they amount to a 180-degree rotation. In the original image, the cellist was on the edge, so I also had to turn the whole image horizontally using yaw and move the entire image down a bit using pitch. It took me some manual testing to figure out the correct numbers here.
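A little geometry shows why flipping both axes fixes an upside-down camera, and why yaw is easier to tune than pitch. In an equirectangular frame of width W and height H, a pixel at (x, y) corresponds to longitude λ and latitude φ. Flipping both axes negates both angles, which on the unit sphere is a 180-degree rotation about the forward (viewing) axis. A yaw rotation by ψ only changes longitude, so it is a circular horizontal shift of the frame; the sign convention FFmpeg uses for yaw is not assumed here. Pitch, by contrast, mixes latitude and longitude, so it bends the image rather than shifting it.

```latex
\lambda = 2\pi\left(\frac{x}{W} - \frac{1}{2}\right), \qquad
\varphi = \pi\left(\frac{1}{2} - \frac{y}{H}\right)

% Flipping both axes:
x \mapsto W - x,\quad y \mapsto H - y
\;\Longrightarrow\; (\lambda, \varphi) \mapsto (-\lambda, -\varphi)

(\cos\varphi\cos\lambda,\; \cos\varphi\sin\lambda,\; \sin\varphi)
\mapsto
(\cos\varphi\cos\lambda,\; -\cos\varphi\sin\lambda,\; -\sin\varphi)
\quad \text{(a } 180^\circ \text{ rotation about the forward axis)}

% Yaw by \psi degrees:
\lambda \mapsto \lambda + \psi
\;\Longrightarrow\; \Delta x = W \cdot \frac{\psi}{360^\circ} \ \text{pixels (with wrap-around)}
```

Pixel-centre offsets (W − 1 − x instead of W − x) are ignored for readability.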
Since the analysis will focus on the musicians, I have also cropped the image using the general crop filter (note that multiple filters are chained with a comma inside a single -vf argument; if you add a second -vf option instead, only the last one will be used):
ffmpeg -i input.mp4 -vf "v360=input=e:output=e:yaw=100:pitch=-50:v_flip=1:h_flip=1, crop=1700:1000:1000:550" output_crop.mp4
This gives us a nicely cropped video of the musicians:
This video already looks quite good and could be used for analysis (for example, with one of the versions of the Musical Gestures Toolbox). But I wanted to explore whether other projections might work better.
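FFmpeg’s crop filter takes its arguments as crop=w:h:x:y, i.e. a w×h rectangle whose top-left corner sits at (x, y) in the frame produced by the previous filter (here, the v360 output). Since these runs are long, it can be worth checking that the rectangle fits before starting. The sketch below is a hypothetical pair of helpers: crop_fits does the arithmetic, and frame_size reads the frame dimensions with ffprobe.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that a crop=w:h:x:y rectangle fits inside a
# video frame before launching a long FFmpeg job.

# crop_fits FRAME_W FRAME_H CROP_W CROP_H X Y
# Succeeds (exit status 0) only if the rectangle lies fully inside the frame.
crop_fits() {
  local fw=$1 fh=$2 w=$3 h=$4 x=$5 y=$6
  (( w > 0 && h > 0 && x >= 0 && y >= 0 && x + w <= fw && y + h <= fh ))
}

# frame_size FILE
# Prints "WIDTHxHEIGHT" for the first video stream of FILE using ffprobe.
frame_size() {
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=width,height -of csv=s=x:p=0 "$1"
}

# Usage, with output.mp4 being the uncropped v360 result:
#   IFS=x read -r W H <<< "$(frame_size output.mp4)"
#   crop_fits "$W" "$H" 1700 1000 1000 550 && echo "crop fits"
```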
An alternative projection is called gnomonic in fancy terminology and “flat” in plainer language. It looks like this:
ffmpeg -i input.mp4 -vf "v360=input=e:output=flat:v_flip=1:h_flip=1:yaw=90:pitch=-30:h_fov=150:v_fov=150" output_flat.mp4
Here I used the flat output type in FFmpeg and did the same flipping, panning and tilting as above. I had to use slightly different numbers for yaw and pitch to make it work, though. Also, here I added some cropping to focus on the musicians:
ffmpeg -i input.mp4 -vf "v360=input=e:output=flat:v_flip=1:h_flip=1:yaw=90:pitch=-30:h_fov=150:v_fov=150, crop=3800:1100:0:800" output_flat_crop.mp4
This left me with the final video:
There are many problems with this projection, and the most obvious is the vast size difference between the musicians. So I won’t use this version for anything, but it was still interesting to explore.
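A short calculation shows how strongly the flat (gnomonic) projection stretches the periphery at the 150-degree field of view used above, which plausibly contributes to the size differences. A ray at angle θ from the view centre lands at distance r = f tan θ from the image centre, where the focal scale f is set so that half the field of view reaches the image edge. The local radial magnification then grows as 1/cos²θ:

```latex
r(\theta) = f\tan\theta, \qquad \frac{W}{2} = f\tan\frac{\mathrm{h\_fov}}{2}

\frac{1}{f}\,\frac{dr}{d\theta} = \frac{1}{\cos^2\theta}:
\qquad 1 \ (\theta = 0^\circ), \qquad 2 \ (\theta = 45^\circ), \qquad
\approx 14.9 \ (\theta = 75^\circ = 150^\circ / 2)
```

So content near the left and right edges is stretched roughly fifteen times more than content in the centre, and since tan θ diverges at 90°, a flat projection cannot cover a field of view of 180° or more.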
A different projection is the cube map. Here is an illustration of how it relates to the equirectangular projection:
The v360 filter also allows for creating such projections and supports several cubemap layouts. I found a nice blog post by Anders Jirås that helped me understand how this works.
First, I tested the c6x1 output format:
ffmpeg -i input.mp4 -vf "v360=input=e:output=c6x1:out_forder=frblud:yaw=50:pitch=-30:roll=50:v_flip=1:h_flip=1" output_c6x1.mp4
I changed the order of the cube faces using out_forder (as described in the FFmpeg documentation) and (again) played around with the yaw, pitch, and roll values to make something that worked well. This resulted in an image like this:
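The out_forder string assigns one direction to each of the six cube positions, using the letters from the FFmpeg v360 documentation: r (right), l (left), u (up), d (down), f (forward) and b (back). A typo such as a repeated letter is easy to make when experimenting, so here is a hypothetical validator, valid_forder, that checks the string is a permutation of those six letters before it is passed to FFmpeg.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a v360 cubemap face-order string
# (in_forder/out_forder). Letters: r=right, l=left, u=up, d=down,
# f=forward, b=back.

# valid_forder ORDER
# Succeeds if ORDER has exactly six characters and uses each of
# r, l, u, d, f, b exactly once.
valid_forder() {
  local order=$1 face rest
  (( ${#order} == 6 )) || return 1
  for face in r l u d f b; do
    # Delete every occurrence of this letter; exactly one must disappear.
    rest=${order//"$face"/}
    (( ${#order} - ${#rest} == 1 )) || return 1
  done
  return 0
}

# Usage:
#   valid_forder frblud && echo "ok to use as out_forder"
```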
There is also an output format called c3x2, which generates an image like this:
Adding some cropping to the 3×2 projection:
ffmpeg -i input.mp4 -vf "v360=input=e:output=c3x2:out_forder=frblud:yaw=50:pitch=-30:roll=50:v_flip=1:h_flip=1, crop=1500:1080:150:0" output_c3x2_crop.mp4
Then we end up with an image like this:
This looks quite weird, mainly because the cellist ends up on a different cube face than the others.
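When a musician lands on another face, it can help to inspect the faces one at a time, and the crop filter can cut a single cell out of the 3×2 grid. The hypothetical face_crop helper below computes the crop=w:h:x:y values for one position. It assumes the six out_forder positions fill the grid row by row (positions 0 to 2 on the top row, 3 to 5 on the bottom row), which is an assumption to verify against your own output rather than documented FFmpeg behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper: crop string for one face of a 3x2 cubemap frame.
# ASSUMPTION: out_forder positions fill the grid row-major
# (0-2 = top row, 3-5 = bottom row). Check this on your own output.

# face_crop FRAME_W FRAME_H POSITION    (POSITION in 0..5)
# Prints "w:h:x:y" suitable for FFmpeg's crop filter.
face_crop() {
  local fw=$1 fh=$2 pos=$3
  (( pos >= 0 && pos <= 5 )) || return 1
  local w=$(( fw / 3 )) h=$(( fh / 2 ))
  local col=$(( pos % 3 )) row=$(( pos / 3 ))
  printf '%d:%d:%d:%d\n' "$w" "$h" $(( col * w )) $(( row * h ))
}

# Usage (frame size is an example value):
#   ffmpeg -i output_c3x2.mp4 -vf "crop=$(face_crop 2880 1920 0)" face0.mp4
```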
Finally, I wanted to test a new projection invented by Google a couple of years ago: the Equi-Angular Cubemap. The idea is to create a projection with fewer artefacts at the edges:
In FFmpeg, this can be achieved with the eac output format:
ffmpeg -i input.mp4 -vf "v360=input=e:output=eac:yaw=100:pitch=-50:roll=0:v_flip=1:h_flip=1" output_eac.mp4
The resultant image looks like this:
Only the top part of the image is useful for my analysis, which can be cropped out like this:
ffmpeg -i input.mp4 -vf "v360=input=e:output=eac:yaw=100:pitch=-50:roll=0:v_flip=1:h_flip=1, crop=2200:1200:750:0" output_eac_crop.mp4
The final image looks like this:
The equi-angular cubemap should give a better projection overall because it avoids excessive distortion at the edges. However, this comes at the cost of more artefacts in the central parts of the image. So when cropping into the centre of the image, as I did above, the equirectangular projection may actually work best.
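The difference between a regular cubemap and the equi-angular one can be written down compactly. On a standard cube face, a coordinate p ∈ [−1, 1] corresponds to the angle θ = arctan p from the face centre, so equal pixel steps cover unequal angles. As I understand Google’s description of EAC, each face is resampled so that pixel position becomes linear in angle:

```latex
% Standard cubemap face
\theta = \arctan p, \qquad \frac{d\theta}{dp} = \frac{1}{1 + p^2}
\quad\Longrightarrow\quad
\left.\frac{d\theta}{dp}\right|_{p = 0} = 1, \qquad
\left.\frac{d\theta}{dp}\right|_{p = \pm 1} = \frac{1}{2}

% Equi-angular cubemap face
p' = \frac{4}{\pi}\arctan p, \qquad p = \tan\!\left(\frac{\pi}{4}\,p'\right),
\qquad \theta = \frac{\pi}{4}\,p' \;\Longrightarrow\; \frac{d\theta}{dp'} = \frac{\pi}{4} \ \text{(constant)}
```

A standard cube face thus spends twice as many pixels per degree at its edges as at its centre, while EAC spreads them evenly across the face.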
After quite some time fiddling around with FFmpeg and trying to understand the various parts of the new v360 filter, I can conclude that the original equirectangular projection is probably the best one to use for my analysis. The other projections are probably better suited to other kinds of 3D use. Still, it was useful to learn how to run these processes using FFmpeg. This will surely come in handy when I process a bunch of these files in the near future.
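Since the recordings are split into many files, the equirectangular one-liner with cropping can be wrapped in a loop. The following is a minimal sketch, assuming all input files are .mp4 files in one folder; batch_v360 and out_name are hypothetical helper names, and the filter values are the ones from the equirectangular example above.

```shell
#!/usr/bin/env bash
# Sketch: apply the same v360 + crop filter chain to every .mp4 in a folder.
# Helper names and folder names are hypothetical.

FILTER="v360=input=e:output=e:yaw=100:pitch=-50:v_flip=1:h_flip=1,crop=1700:1000:1000:550"

# out_name INPUT_PATH OUT_DIR -> prints OUT_DIR/<basename>_crop.mp4
out_name() {
  local base
  base=$(basename "$1" .mp4)
  printf '%s/%s_crop.mp4\n' "$2" "$base"
}

# batch_v360 IN_DIR OUT_DIR
batch_v360() {
  local in_dir=$1 out_dir=$2 f
  mkdir -p "$out_dir"
  shopt -s nullglob   # an empty folder yields no iterations instead of a literal "*.mp4"
  for f in "$in_dir"/*.mp4; do
    # -n: never overwrite existing outputs; </dev/null stops ffmpeg reading the loop's stdin
    ffmpeg -n -i "$f" -vf "$FILTER" "$(out_name "$f" "$out_dir")" < /dev/null
  done
}

# Usage (hypothetical folders):
#   batch_v360 raw_360 processed
```

Keeping the filter chain in one variable means the other projections can be batch processed by swapping in their -vf strings.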