The Million Song Dataset

I guess I was too caught up in organizing NIME back in March to notice the launch of The Million Song Dataset. It contains no audio, but 300 GB of metadata about one million popular songs. This sounds like hours of great fun for music researchers around the world, and it will probably also be a great resource for music students working on MIR applications. I would also expect it to be useful for a number of creative applications.

Here is a quote from the press release:

For far too long, researchers and engineers working on Music Information Retrieval (MIR) have been forced to pay a hefty ante before being able to conduct their research: namely, they’ve had to build a set of data on which to test their theories and hone their algorithms.

It may have started as a flippant suggestion for how to solve that problem, but The Million Song Dataset is now real, and anyone can download it. A collaboration between The Echo Nest and Columbia University’s LabROSA department (Laboratory for the Recognition and Organization of Speech and Audio), The Million Song Dataset has four main objectives:

  • To encourage research on algorithms that scale to commercial sizes
  • To provide a reference dataset for evaluating research
  • As a shortcut alternative to creating a large dataset with The Echo Nest’s API
  • To help new researchers get started in the MIR field.

What could you say but thanks!

Published by Alexander Refsum Jensenius, a music researcher and research musician living in Oslo, Norway.