I am honoured to have been named European Open Data Champion by SPARC Europe. Below is an interview they have published about my research, written by Margaret Louise Fotland.

Why are you keen to share your data?

First of all, because it is impossible to validate research findings without access to the data, methods and tools that were used to generate those findings. As such, a research paper without accompanying data is, in my opinion, incomplete. Another important reason is to make data available for reuse. A lot of data is only analysed from one particular angle and could benefit from being treated with other methodologies or coupled with other data. The increasing use of machine learning methods is a positive driving force here, since such methods require large data sets to work well. Few researchers are able to generate massive amounts of (meaningful) data by themselves, but together we can build really exciting databases.

How are you involved with Open Data?

I am trying to share data, but it is not easy. As a music researcher, I work with fairly large and complex datasets: multichannel audio and video recordings, full-body motion capture and physiological sensor data, as well as both qualitative and quantitative questionnaires and interviews. We are struggling to find good ways of synchronising and storing all this data for our own usage, particularly since a lot of it comes from proprietary systems. This makes it hard to share our datasets in a coherent, consistent and well-documented manner. Putting lots of files on a server is not sufficient; the data need proper metadata to be useful for others. I am delighted to see the growing number of data-sharing repositories popping up these days, but I think we should also question the data quality in some of these repositories. We also risk ending up in the same situation that we have with publications – a few commercial players dominating the market.
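To make the metadata point a little more concrete, here is a minimal, hypothetical sketch of the kind of machine-readable description that could sit next to the files of one multimodal recording session. The field names and the `write_session_metadata` helper are purely illustrative assumptions, not part of any particular repository standard or of our own pipeline:

```python
import json
from pathlib import Path

def write_session_metadata(session_dir: Path) -> Path:
    """Write a minimal JSON sidecar describing one recording session.

    The schema is illustrative only: a real deposit would follow the
    metadata profile of the chosen repository.
    """
    metadata = {
        "title": "Example motion-capture and audio session",        # hypothetical
        "creators": ["A. Researcher"],                               # hypothetical
        "date_recorded": "2018-05-14",                               # hypothetical
        "license": "CC-BY-4.0",
        "modalities": [
            {"type": "audio", "channels": 8, "sample_rate_hz": 48000},
            {"type": "video", "cameras": 2, "frame_rate_fps": 50},
            {"type": "motion_capture", "marker_count": 34, "frame_rate_fps": 240},
        ],
        "synchronisation": "common timecode on all streams",         # hypothetical
        # List the actual files present in the session folder.
        "files": sorted(p.name for p in session_dir.iterdir() if p.is_file()),
    }
    out_path = session_dir / "session_metadata.json"
    out_path.write_text(json.dumps(metadata, indent=2))
    return out_path

# Example usage (assuming such a folder exists):
# write_session_metadata(Path("recordings/session_001"))
```

Even a small sidecar file like this makes a folder of recordings far easier for others to interpret than the files alone.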

Acknowledging these problems, I am currently involved in several initiatives aimed at structuring multimodal/multimedia data and related metadata. There are numerous technical challenges, but we are also dealing with questions related to privacy and copyright. It is at times frustrating, I have to admit: I am primarily interested in discovering new things about how we experience music, but I spend a lot of time on technical and legal issues. Looking back 10 years, though, I am happy to see that things have progressed a lot!

To me it does not make sense to talk about Open Data without also mentioning the other parts of Open Science. If someone wants to check my research findings, they need access to both the data and the analysis software (and hence the methodology). That is why I share my software code on GitHub (https://github.com/alexarje/). I also choose gold Open Access journals as often as possible, and self-archive manuscripts in the digital library of the University of Oslo (DUO – https://www.duo.uio.no/).

How do you get others to share their data?

I am constantly talking (positively!) about Open Science to colleagues and students, and I can see that focusing on the topic has an effect. Even people who used to be sceptical are now interested in hearing about my experiences. I also focus on Open Science in the various organisations I am involved in. For example, as Chair of the Steering Committee for the International Conference on New Interfaces for Musical Expression (www.nime.org), I am pushing for new ways to publish data alongside papers and code. This is particularly important in this community, since many of the independent researchers and artists involved do not have any institutional support.

What still needs to be done to get more to share, and what makes you optimistic about the future of Open Science?

While much progress has been made, there are still numerous problems to solve before Open Data becomes the first choice for many researchers. We need storage solutions that are as simple to use as popular commercial applications, but that meet the privacy and copyright requirements of research. Standardising on open data formats is also critical, since proprietary and/or closed formats still flourish in the research community. Developing better solutions for efficiently adding metadata is also vital, and here we need more collaboration between librarians, IT engineers and research administrators. Fortunately, there are currently a number of initiatives focussed on addressing these issues. My wish is that we end up with a set of standardised yet flexible solutions that are open enough to allow for all sorts of weird things in the future.

Finally, what do you think would happen if public research data remained closed, and what do you think a world with far more Open Data would look like?

At best, closed research is unfortunate. At worst, it could be directly harmful. It also slows down and hinders innovation. I truly believe that open research is the future!

Is there anything else you’d like to add?

While the political fight for Open Access publishing has more or less been won, we are currently at a critical point when it comes to more widespread implementation. The economic side of Open Access publishing is still in transition, and there is some resistance among individuals and at the departmental level. It is a paradox that many young researchers are among those most willing to adopt Open Access, yet they are also the most vulnerable.

I find the conservatism towards Open Science displayed in many committee reports and anonymous peer reviews troubling. That is why I also advocate more openness in such processes. At UiO we make all appointment processes as transparent as possible: all job candidates can read each other's evaluations, so they can check that the assessment is not biased. I have also suggested a system of “Open Funding” to the Research Council of Norway, in which both the applications and the reviews would be made available for everyone to read. People worry about someone stealing their ideas, but this is not a problem if all published documents have DOIs, time-stamping and version control. I am sure that such a system would lead to better applications, better evaluation processes and, ultimately, better projects.