After concluding that institutional person pages are not a viable basis for a “Who’s Who” directory for MishMash, I found yesterday that NVA can be a good solution. However, it only covers researchers affiliated with Norwegian institutions, which may be too restrictive for MishMash, where we also want to list non-academic, non-affiliated, and international researchers. ORCID may therefore be a better solution. It is an international registry where researchers can register themselves (check my ORCID profile). But what information is available there, and how can it be retrieved?

Retrieving ORCID data

Like NVA, the ORCID website is a JavaScript single-page application, so there is no simple metadata to collect from the HTML header. Fortunately, ORCID supports proper content negotiation. Sending an Accept: application/json header to the profile URL redirects you to the public API:

GET https://orcid.org/0000-0001-6171-8743
Accept: application/json
→ 302 → https://pub.orcid.org/v3.0/0000-0001-6171-8743

And the nice thing is that no API key is required for public data. The /person and /record endpoints return well-documented JSON.
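As a minimal sketch, the request above could be made from Python with only the standard library. The helper names build_request and fetch_json are made up for this illustration, and the code goes straight to pub.orcid.org (the target of the redirect shown above) rather than following the redirect from orcid.org.

```python
import json
import urllib.request

# Public API base URL, taken from the redirect target shown above.
PUBLIC_API = "https://pub.orcid.org/v3.0"


def build_request(orcid_id: str, endpoint: str = "person") -> urllib.request.Request:
    """Build a GET request for one endpoint (e.g. "person" or "record")."""
    url = f"{PUBLIC_API}/{orcid_id}/{endpoint}"
    # The Accept header asks the server for a JSON representation.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(orcid_id: str, endpoint: str = "person", timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_request(orcid_id, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_json("0000-0001-6171-8743") should then return the /person data as a Python dictionary, and fetch_json("0000-0001-6171-8743", "record") the full record.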

From the /person endpoint, the following fields are publicly available (for my own record):

Field                   Example value
given-names             Alexander Refsum
family-name             Jensenius
biography               Full biography text (~200 words)
keywords                4 entries
researcher-urls         4 entries (personal website, lab, etc.)
addresses               1 entry (country)
emails                  0 public entries
external-identifiers    3 entries (Scopus, ResearcherID, etc.)

From the /record endpoint, the activities-summary adds:

Activity type           Count in my record
Works (publications)    311 groups
Employments             1
Educations              1
Fundings                0
Peer reviews            0

All of this is fetched with a single HTTP request per person, since the /record response also contains the /person data.
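The counts in the table above could be derived from a decoded /record response roughly as follows. This is a sketch under assumptions: the key names ("group", "affiliation-group") reflect my reading of the v3.0 JSON and should be checked against an actual response, and summarize_activities is a hypothetical helper name.

```python
def _count(section, key):
    """Length of the list stored under `key`, tolerating missing or null sections."""
    return len((section or {}).get(key) or [])


def summarize_activities(record: dict) -> dict:
    """Count the entries per activity type in a decoded /record response."""
    summary = record.get("activities-summary") or {}
    return {
        # Works are grouped, so this counts groups rather than individual entries.
        "works": _count(summary.get("works"), "group"),
        # Assumed v3.0 layout: affiliations are wrapped in "affiliation-group".
        "employments": _count(summary.get("employments"), "affiliation-group"),
        "educations": _count(summary.get("educations"), "affiliation-group"),
        "fundings": _count(summary.get("fundings"), "group"),
        "peer-reviews": _count(summary.get("peer-reviews"), "group"),
    }
```

Missing or null sections are counted as zero, so the function should not fail on sparse records.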

What this means for the MishMash directory

For our planned “who’s who” directory, we could use the ORCID API as a foundation for member pages. Given an ORCID iD, we can automatically pull in:

  • Full name
  • Biography text
  • Keywords/research interests
  • Institutional affiliations (from employment)
  • Links to personal/lab websites
  • External identifier links (Scopus, ResearcherID, Google Scholar if added)
  • Potentially a list of selected works

The things we cannot get from ORCID alone, and will need to add manually, are a photo, project-specific roles, and any tags or categories we want to use for filtering in the directory. However, that information could be collected from NVA, which is a richer (and perhaps more up-to-date?) registry.
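To illustrate how the automatically available fields could be mapped onto a member page, here is a rough sketch that flattens a decoded /record response into a small dictionary. The nested key names (such as "researcher-url" and "employment-summary") are my assumptions about the v3.0 JSON layout, and extract_profile and its output keys are hypothetical names chosen for this example.

```python
def _value(node):
    """Return node["value"], or None when the node is missing or null."""
    return (node or {}).get("value")


def _items(section, key):
    """Return the list stored under `key`, or an empty list if absent."""
    return (section or {}).get(key) or []


def extract_profile(record: dict) -> dict:
    """Reduce a /record response to the fields needed for a directory entry."""
    person = record.get("person") or {}
    name = person.get("name") or {}
    given = _value(name.get("given-names")) or ""
    family = _value(name.get("family-name")) or ""

    # Collect organization names from the employment summaries.
    activities = record.get("activities-summary") or {}
    affiliations = []
    for group in _items(activities.get("employments"), "affiliation-group"):
        for item in group.get("summaries") or []:
            org = (item.get("employment-summary") or {}).get("organization") or {}
            if org.get("name"):
                affiliations.append(org["name"])

    return {
        "name": f"{given} {family}".strip(),
        "biography": (person.get("biography") or {}).get("content"),
        "keywords": [k.get("content") for k in _items(person.get("keywords"), "keyword")],
        "affiliations": affiliations,
        "urls": [
            {"name": u.get("url-name"), "url": _value(u.get("url"))}
            for u in _items(person.get("researcher-urls"), "researcher-url")
        ],
        "identifiers": [
            {"type": e.get("external-id-type"), "url": _value(e.get("external-id-url"))}
            for e in _items(person.get("external-identifiers"), "external-identifier")
        ],
    }
```

The manually maintained fields (photo, roles, tags) could then be merged into the same dictionary before it is handed to the site template.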

How to build

An important consideration is when and how to fetch data. For a static site like the MishMash web page (built in Jekyll), build time makes the most sense. A small Python script can then query pub.orcid.org and populate a JSON file, which a template uses when building the site. But should the script run on every build, once in a while, or manually? And how should we handle conflicts between the data sources? More on that in a future post.
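One possible shape for such a build-time script is sketched below. It is not a decision on the open questions, only one option: the data file is refreshed when it is older than a chosen number of days. The function names, the max_age_days parameter, and the output path are assumptions made for illustration. Jekyll reads JSON files placed in its _data folder, so a path such as _data/members.json would be a natural target.

```python
import json
import time
import urllib.request
from pathlib import Path


def fetch_record(orcid_id: str, timeout: float = 10.0) -> dict:
    """Fetch the public /record JSON for one ORCID iD (performs a network call)."""
    request = urllib.request.Request(
        f"https://pub.orcid.org/v3.0/{orcid_id}/record",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def build_data_file(orcid_ids, out_path, fetch=fetch_record, max_age_days=7.0) -> bool:
    """Write one JSON file with an entry per iD; return True if it was rewritten.

    The file is left untouched when it is younger than `max_age_days`,
    which is one way to avoid querying the API on every build.
    """
    out = Path(out_path)
    if out.exists():
        age_days = (time.time() - out.stat().st_mtime) / 86400
        if age_days < max_age_days:
            return False

    # In practice the full record would likely be trimmed to the needed fields.
    members = {orcid_id: fetch(orcid_id) for orcid_id in orcid_ids}
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(members, indent=2, ensure_ascii=False), encoding="utf-8")
    return True
```

The fetch argument is injectable so that the script can be tested without network access, or pointed at a different source later.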