As part of my duties as a Norwegian member of EUA’s Open Science Expert Group, I was asked to write an “expert voice” on how to think about FAIR data from an educational perspective. Below is a copy of my short article.

How Findable, Accessible, Interoperable and Reusable data enables research-led education

FAIR data is an essential component of the open research ecosystem. In this article, Alexander Refsum Jensenius argues that “FAIRification” can also benefit research-based and research-led education, providing opportunities to bring together different university missions.

Research-based and research-led education

Education is one of the core activities of universities, together with research, innovation, and dissemination. Research-based education is one example of how these missions are brought together. Students are taught by active researchers and learn by putting theoretical knowledge into practice in their respective disciplines.

In the best of cases, educational activities are also research-led. Students do not only “passively” learn about research carried out by others; they actively participate in their teacher’s ongoing research or get help to initiate their own projects. This may be more common in graduate-level courses, but the shift towards open research makes it possible to bring research-led educational components to undergraduate students as well.

Four levels of involvement

Here are four examples of how FAIR (Findable, Accessible, Interoperable and Reusable) data can help develop research-led educational activities.

Explanation - Openly documenting the research process allows students to better understand how the research has been conducted, from the initial research question to the conclusion. An open research practice includes making data available using the FAIR principles. It could also involve sharing notebooks or lab journals used when collecting the data and the annotation schema or source code used in the analysis. Teachers can ask their students to examine the research process in greater detail than has previously been possible. This helps students learn and reflect upon the methods used.

Validation - Instead of only reading results in research articles, students can be asked to validate the methods by checking existing data and related analyses. This is valuable for students but also hugely benefits science at large. Imagine the potential of having hundreds of students check that data is actually understandable by others and that the files in question are interoperable with multiple operating systems and software packages. It is human to make errors, and if a student does indeed find an error, this could be reported to the authors for correction. Such validation could improve science in the long term and help uncover problems not picked up in the peer review process.

Replication - Students can also be asked to replicate findings by performing observation studies or experiments similar to those they have read about in the literature. In some cases, other data may be available to replicate findings. If not, students can be asked to collect and analyse new data to validate the conclusions reported in an article. It may not be possible to replicate with the same level of rigour as the original study. However, that would not be the main point of such a task. The most important thing is that students learn about research methods through practice. Still, the original authors may appreciate hearing about the outcome of replications; it could even inspire follow-up studies. Moreover, if done well, student-based replication studies may, in some cases, be independently publishable.

Repurpose - Another educational activity could be exploring how an openly available dataset can be reused for research purposes other than those for which it was initially collected. Many datasets are only analysed once, yet have the potential to serve as a resource for other research questions. Students typically do not have time for extensive data collection of their own, so relying on existing datasets is often the more practical way to develop new research questions. Teachers can give their students a FAIR dataset and ask them to analyse it differently from how it was originally used. This may inspire new publishable studies.

Legitimisation of the research process

These are some ideas on how educators can actively use openly available research assets in their teaching. The specific use cases need to be developed within each discipline. There are differences between media-based musicology, corpus-based linguistics, behavioural psychology experiments, and meteorological observations. Still, all these examples are based on data and related analyses that researchers increasingly share openly. Universities should make more and better use of such resources in their educational activities.

Many researchers primarily think about opening their data and other resources for the benefit of their colleagues and peers. If they know that the resources should also be accessible to students, they may need to pay more attention to how everything is explained. This is not a drawback; explaining something so that a wider audience than a researcher’s closest peers understands it is crucial for the general legitimisation of the research process. Ultimately, this is a key argument for opening the research process in the first place.