How long is a NIME paper?

Several people have argued that we should change from having a page limit (2/4/6 pages) for NIME paper submissions to a word limit instead. It has also been argued that references should not be counted as part of the text. However, what should the word limits be?

It is always good to look at the history, so I decided to check how long previous NIME papers have been. I started by exporting the text from all of the PDF files with the pdftotext command-line utility:

# Extract the text from every PDF in the folder
for i in *.pdf; do pdftotext "$i" "${i%.pdf}.txt"; done

Then I did a word count on these:

wc -w *.txt > wc.txt

After a little bit of reformatting and sorting, the word counts can be imported into a spreadsheet.
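Something like this one-liner can do that reformatting (the CSV layout and file name are my own choices here, not necessarily what was used):

# Drop wc's "total" line, sort by word count, and write a filename,words CSV
grep -v ' total$' wc.txt | sort -n | awk '{print $2 "," $1}' > wc_sorted.csv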

From this, we can make a graphical representation of the number of words per paper.

There are some outliers here. A couple of papers are (much) longer than the others, mainly because they contain long appendices. Some files have low word counts because the PDFs are protected from editing, in which case pdftotext cannot extract the text. The majority of files, however, are in the range of 2500-5000 words.

The word count includes everything: headers/footers, titles, abstracts, acknowledgements, and references. These vary from paper to paper, but typically add up to around 500 words in total. So the main text of most papers could be said to be in the range of 2000-4500 words.

Improving the PDF files in the NIME archive

This blog post summarizes my experimentation with improving the quality of the PDF files in the proceedings of the annual International Conference on New Interfaces for Musical Expression (NIME).

Centralized archive

We have, over the last few years, worked hard on getting the NIME proceedings adequately archived. Previously, the files were scattered across each year’s conference web site. The first step was to create a central archive on nime.org. The list there is automagically generated from a collection of publicly available BibTeX files that serve as the master document of the proceedings archive. The fact that the metadata is openly available on GitHub makes it possible for people to fix errors in the database. Yes, there are errors here and there, because the files were made by “scraping” the PDF content. It is just not possible to do this manually for more than 1000 PDF files.
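To give an idea of the format, an entry in the master BibTeX files looks roughly like this (the key and field values here are made up for illustration):

@inproceedings{nime2001-001,
  author    = {Doe, Jane and Smith, John},
  title     = {An Example NIME Paper},
  booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
  year      = {2001},
  url       = {http://www.nime.org/proceedings/2001/nime2001_paper001.pdf}
}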

The archive points to all the PDF files, some media files (more are coming), and DOIs to archived PDFs in Zenodo. Together, this has turned out to be a stable and (we believe) future-proof solution.

PDF problems

However, as it turns out, the PDF files in the archive have various issues. All of them work fine in regular PDF readers, but many have accessibility issues. There are (at least) three problems with this.

  1. Non-accessible PDFs do not work well for people using alternative readers. We need to strive for universal access at NIME, and this includes the archive.
  2. The files are not optimized for text mining, something a growing number of people are interested in. Such an extensive collection of files is a great resource for understanding a community and how it has developed. I tried this myself in a NIME paper in which I analyzed the use of the word “gesture” in all NIME papers up until 2013.
  3. If machines have problems with the files, so do the Google crawlers and other robots looking at the content of the files. This, again, has implications for how the files can be read and indexed in various academic databases.

It is not strange that there are issues with the files. After all, there are a total of 1728 of them, produced from 2001 until today on a myriad of different OSes and pieces of software. During this time, the PDF standard itself has also evolved considerably. For that reason, we have found it necessary to do some optimization of the files.

Renaming

The first thing I did was to download the entire collection of PDFs. I quickly discovered that there were some inconsistencies in the file names. We did a large cleanup of the file names some years ago, so things were not entirely bad. But it was still necessary to clean them up further to arrive at a single convention. I ended up renaming everything to a pattern like:

nime2001_paper001.pdf

This makes it possible to sort by year first, then by submission type (currently only paper and music, but there could be more), and then by a unique three-digit number based on the submission number. Not all the numbers had leading 0s, so I added these as well for consistency. Since the conference year and ID are unique, it is easy to do a query-replace in the BibTeX database to correct the links there. A sketch of how the zero-padding could be scripted is shown below.
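This is a minimal sketch of such a renaming pass, assuming files that already follow a nimeYYYY_paperN.pdf pattern (the actual cleanup involved more cases than this):

# Zero-pad the submission number to three digits, never overwriting (-n)
for f in nime*_paper*.pdf; do
  num="${f##*paper}"; num="${num%.pdf}"
  new="$(printf '%spaper%03d.pdf' "${f%paper*}" "$((10#$num))")"
  [ "$f" = "$new" ] || mv -n "$f" "$new"
done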

Acrobat testing

I usually don’t work much in Acrobat these days, but decided to start my testing there. I was able to get access to a copy of Acrobat XI on a university machine and started looking into different options. From the list of batch processes available, I found these to be particularly promising:

  • “Optimize scanned documents” (converting content into searchable text and reducing file size)
  • “Prepare for distribution” (removing hidden information and other oddities)
  • “Archive documents” (create PDF/A compliant documents)

I first tried to run a batch process using OCR. The aim here was to see if I could retrieve some text from files with images containing text. This did not work particularly well. It skipped most files and crashed on several. After the tenth crash, I gave up and moved on.

The “Prepare for distribution” option worked better. It ran through the first 300 files or so with no problems and reduced the file sizes nicely. But then the problems started. For many of the files, it just crashed. And when I came to the 2009 files, they turned out to be protected from editing. So I gave up again.

Finally, I tried the archiving function. Here it popped up a dialogue box asking me to fill in title and authors for every single file. I agree that this would be nice to have, but I do not have time to do this manually for 1728 files.

All in all, my Acrobat exploration turned out quite unsuccessful. Therefore, I went back to my Ubuntu machine and decided to investigate what type of command-line tools I could use to get the job done.

File integrity

After searching some forums about how to check whether PDF files are corrupted, I came across the useful qpdf application. Running it on the original NIME collection showed that the majority of files had issues.

# Run qpdf's integrity check on every PDF, printing OK or FAILED for each file
find . -type f -iname '*.pdf' \( -exec sh -c 'qpdf --check "$1" > /dev/null && echo "$1: OK"' sh {} \; -o -exec sh -c 'echo "$1: FAILED"' sh {} \; \)

The check showed that only 794 of the files were labeled as OK, while the rest (934) failed. I looked at the failing files, trying to figure out what was wrong, but I have been unable to find any consistency among failing or passing files. Initially, I thought that there might be differences based on whether they were made in LaTeX or MS Word (or something else), the platform, and so on. But it turns out not to be that easy. This may also be because many of the files have been through several steps of updating along the way. For example, for many of the NIME editions, the paper chairs have added page numbers, watermarks, and so on.

Rather than trying to fix the myriad of different problems with the files, I hoped that a compression step, saving with a newer (and common) PDF format version, could solve most of them.

File compression

Several of the files were unnecessarily large. Some files were close to 100 MB, and too many were more than 2 MB. This should not be necessary for 4-6 page PDF files. Large files cause bandwidth issues on the server, which means extra cost for the organization and long download time for the user. Although we don’t think about it much, saving space also saves energy and helps reduce our carbon footprint on the planet.

To compress the PDF files, I turned to Ghostscript’s gs command-line utility. I experimented with different settings, but found that /screen and /ebook rendered the images pixelated, even on screen. So I went for /printer, which according to the Ghostscript manual means downsampling images to 300 DPI. This means that they should also print well. The script I used was this:

# Recompress each PDF with /printer quality (300 DPI images), saved as PDF 1.6
for i in *.pdf; do gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.6 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile="${i%.pdf}_printer.pdf" "$i"; done

The result was that the folder shrank from 3.8 GB to 1.0 GB, a quite lovely saving. The image quality also appears to be more or less preserved, although this is only based on visual inspection of some of the files.
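The size difference can be checked with du (the folder names here are placeholders for wherever the original and compressed files live):

# Compare the total size of the original and compressed folders
du -sh nime_pdfs_original nime_pdfs_printer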

Re-running the file integrity check on the new files showed that all 1728 of them now passed!

PDF/A

I have been working with PDF files for years but had not really read up on the details of the different versions. What turns out to be important for long-term preservation is that the files comply with the PDF/A standard. The regular PDF format has gone through several versions (1.4, 1.5, 1.6) controlled by Adobe, whereas PDF/A is an ISO standard and appears to be what people use for archiving.

Unfortunately, it turns out that creating PDF/A files using Ghostscript is not entirely straightforward, so more exploration needs to be done there. A starting point is sketched below.
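For future reference, this is roughly the kind of invocation the Ghostscript documentation describes for PDF/A output. Treat it as a sketch: the exact flags vary between Ghostscript versions, and it assumes a PDFA_def.ps file (shipped with Ghostscript) that has been edited to point at a valid ICC colour profile.

# Attempt a PDF/A-2 conversion of a single file (assumes an edited PDFA_def.ps)
gs -dPDFA=2 -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=UseDeviceIndependentColor -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output_pdfa.pdf PDFA_def.ps input.pdf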

Metadata

Finally, one of the problems with the proceedings archive is getting it properly indexed by various search engines. For that, having PDF metadata is important. Again, I wish we had the capacity to do this properly for all 1728 files, but that is currently out of scope.

However, adding some general metadata is better than nothing, so I turned to ExifTool, which can be used to set the metadata of PDF files:

# Set generic Title/Author/Subject metadata on every PDF
for i in *.pdf; do exiftool -Title="Proceedings of the International Conference on New Interfaces for Musical Expression" -Author="International Conference on New Interfaces for Musical Expression" -Subject="A Peer Reviewed article presented at the International Conference on New Interfaces for Musical Expression" "$i"; done
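Note that ExifTool by default keeps a backup of each modified file with an _original suffix; adding the -overwrite_original flag avoids filling the folder with duplicates.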

Conclusion

I still need to figure out the PDF/A issue (help wanted!), but the above recipe has helped improve the quality of the PDF files considerably. It will save us bandwidth, improve accessibility, and, hopefully, also lead to better indexing of the files.

NIME Publication Ecosystem Workshop

During the NIME conference this year (which was run entirely online due to the coronavirus crisis), I led a workshop called the NIME Publication Ecosystem Workshop. In this post, I will explain the background of the workshop, describe how it was run in a combined asynchronous and synchronous mode, and reflect on the results.

If you don’t want to read everything below, here is a short introduction video I made to explain the background (shot at my “summer office” up in the Hardangervidda mountain range in Norway):

Background for the workshop

The idea of the NIME Publication Ecosystem Workshop was to continue the community discussions started in the successful NIMEHub workshop in Brisbane in 2016 and the Open NIME workshop in Porto Alegre in 2019. Added to this are ongoing discussions about establishing a NIME journal, as well as about better solutions for archiving various types of NIME-related activities.

The term “publication” should in this context be understood in a broad sense, meaning different types of output of the community, including but not limited to textual productions. This is particularly important at NIME since this community consists of people designing, building, and performing new musical interfaces.

When I gathered a workshop team and proposed the topic back in January, this was mainly motivated by the increasing focus on Open Research. Please note that I use “open research” here, not “open science”, a significant difference that I have written about in a previous blog post. The focus on more openness in research has recently received a lot of political attention through the Plan S initiative, the Declaration on Research Assessment (DORA), the EU’s Horizon Europe, funders’ requirements of FAIR data principles, and so on.

Of course, the recent coronavirus crisis has made it even more necessary to develop open research strategies, as well as to find alternative ways of communicating our research. This also includes rethinking the format of conferences. The need to travel less is not something that will go away after the coronavirus crisis calms down; long-term change is necessary to reduce problems with climate change. While such a move may feel limiting to those of us who could travel to international conferences every year, it also opens possibilities for many others to participate. The topic of this year’s NIME conference was “accessibility,” and it turned out that the virtual conference format was, indeed, one that promoted accessibility in many ways. I will write more on that in another blog post.

When it comes to openness in research, this is something the NIME community has embraced since the beginning. The paper proceedings, for example, have been freely available online the whole time. The database behind the archive has also been made available as a collection of BibTeX files. Some people don’t understand why we do this, but opening up the metadata as well makes the archive much more flexible to integrate with other data sources. It also makes it much easier to research the community’s output.

Despite these efforts, there are also several things about the NIME conference that we have not been able to make openly available, such as hardware designs, code, empirical data, music performances, installations, and so on. This is not because we don’t want to, but because it has proven hard to find long-term solutions that are maintainable by a volunteer-driven community. People in the community have different interests and skills, so it is essential to find solutions that are both innovative and user-friendly at the same time. The longevity of chosen solutions is also important, since NIME is central to an increasing number of people’s careers. Hence, we need to balance the exploration of new solutions with the need for preservation and stability.

In addition to finding solutions for the NIME conference itself, the establishment of a NIME journal has been discussed for several years. This discussion has surfaced again during the testing of a new paper template for the conference. But rather than thinking about the conference proceedings and a journal as two separate projects, one could imagine a broader NIME publication ecosystem that could cover everything from draft manuscripts to complete papers, peer-reviewed proceedings papers, and peer-reviewed journal papers. This could be thought of as a more “Science 2.0”-like system in which the entire research process is open from the beginning.

The aims of the workshop were therefore to:

  1. discuss how a broader publication ecosystem built around (but not limited to) the annual conference could work
  2. brainstorm and sketch concrete (technical) solutions to support such an idea
  3. agree on some concrete steps on how to proceed with the development of such ideas the coming year

Workshop format

We had initially planned to have a physical workshop in Birmingham but ended up with an online event. To make it as accessible as possible, we decided to run it using a combination of asynchronous and synchronous delivery. This included the preparation of various types of introductory material by the organizing committee and some participants. All of this material was gathered into a pre-workshop padlet, which was sent to the participants some days before the online workshop.

The synchronous part of the workshop was split over two one-hour time slots. We did it this way to allow people from all time zones to participate in at least one of them. Since most of the organizers were located in Europe, and the conference itself was scheduled around UK time, we ended up with one slot in the morning (9-10 UK time) and one in the afternoon (17-18 UK time). The program for the two slots was the same, so that everyone would feel that they participated equally in the event.

Around 30 people showed up for each time slot, with only a few participating in both. Since preparatory material was distributed beforehand, most of the online workshop time consisted of discussions in breakout rooms with 5-6 people in each group. The groups wrote their feedback into separate padlets and reported back in a short plenary discussion at the end of each hour-long slot.

A post-workshop padlet was created to gather links after the workshop. The topic was also actively discussed in separate threads on the Slack channel used during the conference. After the conference, and as a result of the workshop, we have established a forum on nime.org, with a separate ecosystem thread.

All the pre- and post-workshop material from the workshop has been archived in Zenodo.

Conclusions

It is, of course, impossible to conclude on such a vast topic after one workshop. But what is clear is that there is an interest in the community in establishing a more extensive ecosystem around the NIME conference. The establishment of a forum to continue the discussion is one concrete step ahead. So is the knowledge gained from running a very successful online conference this year, which included pre-recorded talks, written Q&A in Slack channels, plenary sessions, and breakout rooms. A lot of this can also be archived and become part of an extended ecosystem. All in all, things are moving in the right direction, and I am very excited to see where we end up!

Workshop: Open NIME

This week I led the workshop “Open Research Strategies and Tools in the NIME Community” at NIME 2019 in Porto Alegre, Brazil. We had a very good discussion, which I hope can lead to more developments in the community in the years to come. Below is the material that we wrote for the workshop.

Workshop organisers

  • Alexander Refsum Jensenius, University of Oslo
  • Andrew McPherson, Queen Mary University of London
  • Anna Xambó, NTNU Norwegian University of Science and Technology
  • Dan Overholt, Aalborg University Copenhagen
  • Guillaume Pellerin, IRCAM
  • Ivica Ico Bukvic, Virginia Tech
  • Rebecca Fiebrink, Goldsmiths, University of London
  • Rodrigo Schramm, Federal University of Rio Grande do Sul

Workshop description

The development of more openness in research has been in progress for a fairly long time and has recently received a lot more political attention through the Plan S initiative, the Declaration on Research Assessment (DORA), the EU’s Horizon Europe, and so on. The NIME community has been positive towards openness since the beginning, but has still not fully explored it within the community. We call for a workshop to discuss how we can move forward in making the NIME community (even) more open throughout all its activities.

The Workshop

The aim of the workshop is to:

  1. Agree on some goals as a community.
  2. Showcase best practice examples as a motivation for others.
  3. Promote existing solutions for NIME researchers’ needs.
  4. Consider developing new solutions, where needed.
  5. Agree on a set of recommendations for future conferences, to be piloted in 2020.

Workshop Programme

Time   Title (Responsible)
11:30  Welcome, introduction of participants, and introduction to the topic (Alexander Refsum Jensenius)
11:45  Open Publication perspectives (Alexander Refsum Jensenius, Dan Overholt, Rodrigo Schramm)
12:15  Group-based discussion: How can we improve the NIME publication template? Should we think anew about the reviewing process (open review?) Should we open for a “lean publishing” model? How do we handle the international nature of NIME?
12:45  Plenary discussion
13:00  Lunch break
14:30  Open Research perspectives (Guillaume Pellerin, Anna Xambó, Andrew McPherson, Ivica Ico Bukvic)
15:00  Group-based discussion: What are some best practice Open NIME examples? What tools/solutions/systems should be promoted at NIME? Who should do the job?
15:30  Final discussion
16:00  End of workshop

Background information

The following sections present some more information about the topic, including the current state of affairs in the field.

What is Open Research?

There are numerous definitions of what Open Research constitutes. The FOSTER initiative has made a taxonomy with these overarching branches:

  • Open Access: online, free of cost access to peer reviewed scientific content with limited copyright and licensing restrictions.
  • Open Data: online, free of cost, accessible data that can be used, reused and distributed provided that the data source is attributed.
  • Open Reproducible Research: the act of practicing Open Science and the provision of offering to users free access to experimental elements for research reproduction.
  • Open Science Evaluation: an open assessment of research results, not limited to peer-reviewers, but requiring the community’s contribution.
  • Open Science Policies: best practice guidelines for applying Open Science and achieving its fundamental goals.
  • Open Science Tools: refers to the tools that can assist in the process of delivering and building on Open Science.

Not all of these are equally relevant to the NIME community, while other aspects that matter to NIME are missing from the taxonomy.

Openness in the NIME Community

The only aspect that has been institutionalized in the NIME community is the conference proceedings repository. This has been publicly available at nime.org from the start, and in recent years CC-BY licensing has also been enforced for all publications.

Other approaches to openness are also encouraged, and NIME community members are using various types of open platforms and tools (see the appendix for details):

  • Source code repositories
  • Experiment data repositories
  • Music performance repositories
  • MIR-type repositories
  • Hardware repositories

The question is how we can proceed in making the NIME community more open. This includes the conferences themselves, but also other activities in the community. A workshop on making hardware designs openly available was held in connection with NIME 2016, and the current project proposal may be seen as a natural extension of that discussion.

The Problem with the Term “Open Science”

Many of the initiatives driving the development of more openness in research refer to this as “Open Science”. In a European context, this is particularly driven by some of the key players, including the European Union (EU), the European Research Council (ERC), and the European University Association (EUA). Consequently, a number of smaller institutions and individuals are also using the term, often without thinking much about the wording.

The main problem with using Open Science as a general term is that it sounds like it is not something for researchers working in the arts and humanities. This was never the intention, of course, but rather a result of the movement developing from the sciences, and it is difficult to change a term once it has gained momentum.

NIME is, and is striving to continue to be, an inclusive community of researchers and practitioners coming from a variety of backgrounds. Many people at NIME would not consider that they work (only) in “science”, but would perhaps feel more comfortable under the umbrella of “research”. This term can embrace “scientific research”, but also “artistic research” and the R&D found outside of academic institutions. Thus, the term “Open Research” fits the NIME community better than “Open Science”.

Free

The question of freedom is also connected to that of openness. In the world of software development, one often talks about “free as in speech” (libre) versus “free as in beer” (gratis). This also relates to issues of licensing, copyright, and reuse. Many people in the community are not affiliated with institutions and depend on payment for their work. Open research is also closely connected to open source, open hardware, and open patents. This modern context for the research and development of new musical technologies extends beyond academia and must be well planned in order to attract industry partners. How can this be balanced with the need for openness?

FAIR Principles

Another term that is increasingly used in the community is the FAIR principles: Findable, Accessible, Interoperable, and Reusable. It is important to point out that FAIR is not the same as Open. Even though openness is an overarching aim, there is an understanding that privacy matters and copyright issues prevent general openness of everything. Still, the aim is to make data as open as possible and as closed as necessary. By applying the FAIR principles, it is possible to make metadata available so that it is openly known what types of data exist, and how to ask for access, even when the data themselves have to remain closed.

General Repositories

There are various “bucket-based” repositories that may be used, such as Zenodo.

What is positive about such repositories is that you can store anything of (more or less) any size. The challenge, however, is the lack of specific metadata, specialized tools (such as visualization methods), and a community.

There are also domain-specific solutions, such as GitHub for code sharing.

In 2018, a new repository titled COMPEL was introduced, aiming to couple the benefits of the aforementioned “bucket-based” approach with a robust metadata framework. It seeks to provide a convergence point for the diverse NIME-related communities and a means of linking their research output.

Openness in the Music Technology community

Compared to many other disciplines, the music technology community has embraced open perspectives for many years, and a number of the conferences make their proceedings archives publicly available.

There are also various types of open repositories and tools.

Best Practice Examples

  • CompMusic, a best practice project in the music technology field
  • COMPEL, which focuses on the preservation of reproducible interactive art, and more specifically interactive music
  • The Bela platform

NIME publication: “NIME Prototyping in Teams: A Participatory Approach to Teaching Physical Computing”

The MCT master’s programme has been running for a year now, and everyone involved has learned a lot. In parallel to developing and teaching the programme, we are also running the research project SALTO. The idea here is to systematically reflect on our educational practice, which in turn feeds back into the development of the MCT programme.

One outcome of the SALTO project is a paper that we presented at the NIME conference in Porto Alegre this week:

Xambó, Anna, Sigurd Saue, Alexander Refsum Jensenius, Robin Støckert, and Øyvind Brandtsegg. “NIME Prototyping in Teams: A Participatory Approach to Teaching Physical Computing.” In Proceedings of the International Conference on New Interfaces for Musical Expression. Porto Alegre, 2019.

Anna Xambó presents the paper “NIME Prototyping in Teams: A Participatory Approach to Teaching Physical Computing” at NIME 2019.

Abstract:

In this paper, we present a workshop of physical computing applied to NIME design based on science, technology, engineering, arts, and mathematics (STEAM) education. The workshop is designed for master students with multidisciplinary backgrounds. They are encouraged to work in teams from two university campuses remotely connected through a portal space. The components of the workshop are prototyping, music improvisation and reflective practice. We report the results of this course, which show a positive impact on the students on their intention to continue in STEM fields. We also present the challenges and lessons learned on how to improve the teaching and delivery of hybrid technologies in an interdisciplinary context across two locations, with the aim of satisfying both beginners and experts. We conclude with a broader discussion on how these new pedagogical perspectives can improve NIME-related courses.