New article: Best versus Good Enough Practices for Open Music Research

After a fairly long publication process, I am happy to finally announce a new paper: Best versus Good Enough Practices for Open Music Research, published in Empirical Musicology Review.

Summary

The abstract reads:

Music researchers work with increasingly large and complex data sets. There are few established data handling practices in the field and several conceptual, technological, and practical challenges. Furthermore, many music researchers are not equipped for (or interested in) the craft of data storage, curation, and archiving. This paper discusses some of the particular challenges that empirical music researchers face when working towards Open Research practices: handling (1) (multi)media files, (2) privacy, and (3) copyright issues. These are exemplified through MusicLab, an event series focused on fostering openness in music research. It is argued that the “best practice” suggested by the FAIR principles is too demanding in many cases, but “good enough practice” may be within reach for many. A four-layer data handling “recipe” is suggested as concrete advice for achieving “good enough practice” in empirical music research.

The article is written based on challenges we have faced with adhering to Open Research principles within music research. I mention our experiences with MusicLab in particular.

Perhaps the most important take-home message from the article is my set of recommendations at the end.

1. DATA COLLECTION (“RAW”)

1a. Create analysis-friendly data. Planning what to record will save time afterward, and will probably lead to better results in the long run. Write a data management plan (DMP).

1b. Plan for mistakes. Something will always go wrong. Ensure redundancy in critical parts of the data collection chain.

1c. Save the raw data. In most cases, the raw data will be processed in different ways, and it may be necessary to go back to the start.

1d. Agree on a naming convention before recording. Cleaning up the names of files and folders after recording can be tedious. Get it right from the start instead. Use unique identifiers for all equipment (camera1, etc.), procedures (pre-questionnaire1, etc.) and participants (a001, etc.).

1e. Make backups of everything as quickly as possible. Losing data is never fun, and particularly not the raw data.
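As a sketch of how a naming convention like the one in 1d might be enforced, here is a small Python helper. The underscore-separated pattern and the identifier rules are my own assumptions for illustration, not part of the article's recipe:

```python
import re

def make_filename(participant: str, procedure: str, device: str, ext: str) -> str:
    """Build a file name from pre-agreed identifiers (see 1d)."""
    for part in (participant, procedure, device):
        # Only lowercase letters, digits, and hyphens keep names portable.
        if not re.fullmatch(r"[a-z0-9-]+", part):
            raise ValueError(f"Invalid identifier: {part!r}")
    return f"{participant}_{procedure}_{device}.{ext}"

print(make_filename("a001", "pre-questionnaire1", "camera1", "mp4"))
# a001_pre-questionnaire1_camera1.mp4
```

Validating identifiers at recording time is cheaper than renaming hundreds of files afterwards.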

2. DATA PRE-PROCESSING (“PROCESSED”)

2a. Separate raw from processed data. Nothing is as problematic as over-writing the original data in the pre-processing phase. Make the raw data folder read-only once it is organized.

2b. Use open and interoperable file formats. Often the raw data will be based on closed or proprietary formats. The data should be converted to interoperable file formats as early as possible.

2c. Give everything meaningful names. Nothing is as cryptic as 8-character abbreviations that nobody will understand. Document your naming convention.
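Point 2a can be automated with a few lines of Python. This is a minimal sketch assuming a POSIX-style file system; the folder name in the commented call is hypothetical:

```python
from pathlib import Path

def protect_raw(folder: str) -> None:
    """Make every file in the raw data folder read-only (see 2a)."""
    for path in Path(folder).rglob("*"):
        if path.is_file():
            path.chmod(0o444)  # read-only for everyone

# protect_raw("concert_raw")  # hypothetical folder name
```

Running this once, right after organizing the raw data, removes the risk of accidental overwrites during pre-processing.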

3. DATA STORAGE (“COOKED”)

3a. Organize files into folders. Creating a nested and hierarchical folder structure with meaningful names is a basic, but system-independent and future-proof solution. Even though search engines and machine learning improve, it helps to have a structured organizational approach in the first place.

3b. Make incremental changes. It may be tempting to save only the last processed version of your data, but then it may be impossible to go back to make corrections or verify the process.

3c. Record all the steps used to process data. This can be in a text file describing the steps taken. If working with GUI-based software, be careful to note down details about the software version, and possibly include screenshots of settings. If working with scripts, document the scripts carefully, so that others can understand them several years from now. If using a code repository (recommended), store current snapshots of the scripts with the data. This makes it possible to validate the analysis.
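One lightweight way to record processing steps (3c) is to append each step, together with software details, to a plain-text log. This is just an illustrative sketch; the log file name and the tool details are hypothetical:

```python
import json
import platform
import sys
from datetime import datetime, timezone

def log_step(logfile: str, description: str, **details) -> None:
    """Append one processing step, with context, to a plain-text log (see 3c)."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "step": description,
        **details,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_step("processing-log.jsonl", "trimmed silence", tool="sox", version="14.4.2")
```

A log like this costs one line of code per step and makes it possible to reconstruct, years later, exactly what was done and with which software versions.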

4. DATA ARCHIVE (“PRESERVED”)

4a. Always submit data with manuscripts. Publications based on data should be considered incomplete if the data is not accessible in such a way that it is possible to evaluate the analysis and claims in the paper.

4b. Submit the data to a repository. To ensure the long-term preservation of your data, also independently of publications, it should be uploaded to a reputable DOI-issuing repository so that others can access and cite it.

4c. Let people know about the data. Data collection is time-consuming, and in general, most data sets are under-analyzed. Data deserves to be analyzed more than once.

4d. Put a license on the data. This should ideally be an open and permissive license (such as those suggested by Creative Commons). However, even if using a closed license, it is important to label the data clearly so that others understand how they may be used.

MusicLab Copenhagen

After nearly three years of planning, we can finally welcome people to MusicLab Copenhagen. This is a unique “science concert” involving the Danish String Quartet, one of the world’s leading classical ensembles. Tonight, they will perform pieces by Bach, Beethoven, and Schnittke, as well as folk music, in a normal concert setting at Musikhuset in Copenhagen. However, the concert is anything but normal.

Live music research

During the concert, about twenty researchers from RITMO and partner institutions will conduct investigations and experiments informed by phenomenology, music psychology, complex systems analysis, and music technology. The aim is to answer some big research questions, like:

  • What is musical complexity?
  • What is the relation between musical absorption and empathy?
  • Is there such a thing as a shared zone of absorption, and is it measurable?
  • How can musical texture be rendered visually?

The concert will be live-streamed (on YouTube and Facebook) and aired on Danish radio. There will also be a short film documenting the whole process.

Researchers and staff from RITMO (and friends) in front of the concert venue.

Real-world Open Research

This concert will be the biggest and most complex MusicLab event to date. Still, all the normal “ingredients” of a MusicLab will be in place. The core is a spectacular performance. We will capture a lot of data using state-of-the-art technologies, but in a way that is as unobtrusive as possible for the performers and the audience. After the concert, both performers and researchers will talk about the experience.

Of course, being a flagship Open Research project, all the collected data will be shared openly. The researchers will show glimpses of the data processing procedures as part of the “data jockeying” at the end of the event. However, the actual analysis can only begin once all the data has been properly uploaded and pre-processed. All the involved researchers will then dig into their respective data. And since everything is openly available, anyone can work on the data as they wish.

Proper preparation

Due to the corona situation, the event has been postponed several times. That has been unfortunate and stressful for everyone involved. On the positive side, it has also meant that we have been able to rehearse and prepare very well. Already a year ago we ran a full rehearsal of the technical setup of the concert. We even live-streamed the whole preparation event, in the spirit of “slow TV”:

I am quite confident that things will run smoothly during the concert. Of course, there are always obstacles. For example, one of our eye-trackers broke in one of the last tests. And it is always exciting to wait for Apple and Google to approve updates of our MusicLab app in their respective app stores.

Want to see how it went? Have a look here.

From Open Research to Science 2.0

Earlier today, I presented at the national open research conference Hvordan endres forskningshverdagen når åpen forskning blir den nye normalen? (How does everyday research change when open research becomes the new normal?). The conference is organized by the Norwegian Forum for Open Research and coordinated by Universities Norway. It has been great to follow the various discussions at the conference. One observation is that very few question the transition to Open Research. We have, finally, come to a point where openness is the new normal. Instead, the discussions have focused on how we can move forward. Having many active researchers on the panels also led to a focus on solutions instead of policy.

Openness leads to better research

In my presentation, I began by explaining why I believe opening the research process leads to better research:

  • Opening the process makes researchers document everything more carefully. For example, nobody wants to make messy data or code available. Adding metadata and descriptions also helps improve the quality of what is made available, and helps remove irrelevant content.
  • Making the different parts openly available is important for ensuring transparency in the research process. This allows reviewers (and others) to check claims in published papers. It also allows others to replicate results or use data and methods in other research.
  • This openness and accessibility will ultimately lead to better quality control. Some people complain that we make available lots of irrelevant information. True, not everything that is made available will be checked or used. The same is the case for most other things on the web. That does not mean that nobody will ever be interested. We also need to remember that research is a slow activity. It may take years for research results to be used.

Of course, we face many challenges when trying to work openly. As I have described previously, we particularly struggle with privacy and copyright issues. We also don’t have the technical solutions we need. That led me to my main point in the talk.

Connecting the blocks

The main argument in my presentation was that we need to think about connecting the various blocks in the Open Research puzzle. Over the last few years, there has been a lot of focus on individual blocks. First came making publications openly available (Open Access). Nowadays, there is a lot of discussion about Open Data and how to make data FAIR (Findable, Accessible, Interoperable, Reusable). There is also some development in the other building blocks. What is lacking today is a focus on how the different blocks are connected.

There is now a need to connect the different blocks. Dark blue blocks are part of the research process, while the light blue blocks focus on applications and assessment.

By developing individual blocks without thinking sufficiently about their interconnectedness, I fear that we lose out on some of the main points of opening everything. Moving towards Open Research is not only about making things open; it is about rethinking the way we research. That is the idea of the concept of Science 2.0 (or Research 2.0, as I would prefer to call it).

There is much to do before we can properly connect the blocks. But some elements are essential:

  • Persistent identifiers (PIDs): Unique and permanent digital references are essential for making digital material findable and reusable. Examples include DOIs for data, ORCIDs for researchers, and so on.
  • Timestamping: Many researchers are concerned about who did something first. For example, many people wait to release their data because they want to publish an article first. That is because the data (currently) does not have any “value” in itself. In my thinking, if data had PIDs and timestamps, they would also be citable. This should be combined with proper recognition of such contributions.
  • Version control: It has been common to archive research results once the research is done. This is based on pre-digital workflows. Today, it is much better to provide solutions for proper version control of everything we do.
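As a toy illustration of how timestamping could work at the file level: a checksum pins down exactly which version of a data file is being referenced, and a timestamp records when. Real PID systems such as DOIs of course involve registries and resolvers, not just hashes; the function and file names here are hypothetical:

```python
import hashlib
from datetime import datetime, timezone

def fingerprint(path: str) -> dict:
    """Return a checksum and timestamp identifying one version of a data file."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "file": path,
        "sha256": digest,  # pins the exact content version
        "recorded": datetime.now(timezone.utc).isoformat(),
    }
```

A record like this, published alongside the data, lets anyone later verify that a cited file is bit-for-bit the same as the one originally released.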

Fortunately, things are moving in the right direction. It is great to see more researchers trying to work openly. That also exposes the current “holes” in infrastructures and policies.

More research should be solid instead of novel

Novelty is often highlighted as the most important criterion for getting research funding. That a manuscript is novel is also a major concern for many conference/journal reviewers. While novelty may be good in some contexts, I find it more important that research is solid.

I started thinking about novelty versus solidity when I read through the (excellent) blog posts about the ISMIR 2021 Reviewing Experience. These blog posts deal with many topics, but the question about novelty caught my attention. Even though the numbers are small, it turned out that the majority of the survey respondents listed novelty as the most important selection criterion for the conference. This is not unique to ISMIR; I think many journals and conferences ask about novelty.

Defining novelty

Even though novelty is a criterion “everyone” considers all the time, few people discuss what it actually means. What does it mean for something to be novel? Merriam-Webster suggests that it is “something new or unusual.” But what should be new or unusual? The questions? The answers? The methods?

Research is about contributing new knowledge to humankind. After all, there is not really any point in reinventing the wheel. Still, most research is incremental. We all stand on the shoulders of giants. New research questions spring out of the “future work” sections of our colleagues’ articles. Our methods are based on the refinement of disciplinary developments. Even so-called “groundbreaking” projects are incremental in nature if you scrutinize the details. Still, we have an idea that “something unheard of before” is ideal.

Research needs to be solid

My research is creative in both form and content. As such, many people think that my projects are novel in the sense of being new. My work is also both multi- and interdisciplinary, which means that I don’t really fit well anywhere. That could also be considered novel in the sense of being unusual. Still, what I am doing is not particularly new or unusual. From my perspective, I am working incrementally; everything I am doing builds on other people’s work. True, I combine theories and methods from different fields. This makes it look novel.

I can illustrate this with a research project I just finished: MICRO. Over the last few years, we have studied human music-related micromotion, the smallest actions it is possible to produce and perceive. This is new because no one has studied such motion in a musical context before. It is also unusual because the team comprised researchers from musicology, psychology, human movement science, and computer science.

The MICRO project can be considered novel. However, does that mean that everything we did in the project was novel? Some parts were, I guess. For example, we collected data by running the Norwegian Championship of Standstill annually. This was new and unusual the first time we did it. We even got quite a lot of media interest (it is not so often that music research is featured in the sports news on national TV).

However, collecting data once does not make for outstanding science. Research is about asking questions, finding answers, and verifying those answers. Repeating experiments, making slight modifications to the research design, improving the methods, refining the analyses. That is what solid research is about.

I have researched human music-related micromotion for nearly ten years now. We have some answers, but there are many open questions. Many of these questions are neither new nor unusual any longer. But if we want to understand more about what is actually going on inside our bodies when we experience music, we need to continue researching what is no longer new and unusual. That is about doing solid research, not novel research.

Open Research is better research

I believe that open research is better research. Opening the research process makes researchers think more carefully about what they do and how they document it. This takes (some) more time than working closed. But it also makes it easier for others to understand what has been done. This is important from a peer review perspective. It also facilitates incremental research.

The MICRO project has been an open research flagship project. I began by sharing the funding application openly. Throughout the project, we have continuously described how we have worked. The data has been released in the Oslo Standstill Database, and source code has been shared on GitHub. All of this has taken time “away” from publishing journal articles. However, I think it is time for researchers to publish fewer articles and focus more on making data, code, and other materials available.

Opening the research process is part of solidifying the research. As researchers, we cannot hide behind a “black box” any longer. Everyone can scrutinize what we have done. In fact, I hope that more people will analyze our data and develop our code. That is part of the incremental nature of science.

Summing up

I am not against novel research. However, I think we have gotten to a point where there is too much focus on novelty. If you are applying for a large research grant, it may make sense to do something new. But it must be possible to submit a presentation to a conference or a manuscript to a journal based on plain, solid research. That may, in fact, be novel in itself! Hopefully, the transition to open research may actually help shift the focus towards solidity instead of novelty.

Open Research puzzle illustration

It is challenging to find good illustrations to use in presentations and papers. For that reason, I hope to help others by sharing some of the illustrations I have made myself. I share them with a permissive license (CC-BY) so that they can easily be reused for various purposes.

I start with the “puzzle” that I often use in presentations about Open Research. It outlines some of the various parts of the research process and how they can be made (more) open. I often think about the blocks as placed on a timeline from left to right.

An English-language version of my Open Research “puzzle.”

We will have a Norwegian conference on Open Research later this year, so I decided to make a version in Norwegian too for the conference page:

A Norwegian-language version of the Open Research “puzzle.”

Feel free to grab the images above. If you are interested in better versions, I have posted PDFs of both versions and the source presentation files (in PPTX and ODP) on GitHub. So head over there to download either the Norwegian or English files.