The Open Graph standard has helped “automagically” collect information about partner events on MishMash.no. Now, we have started building a “who’s who” directory, and I have begun looking into how we can pre-populate pages from existing academic identity sources rather than asking everyone to fill out web forms.
My first inclination was to look at what is available on institutional websites. Most researchers have a personal page on at least one institutional website. In this blog post, I look at what can be retrieved from my UiO page.
What can be extracted from a UiO personal page
Unfortunately, unlike UiO’s event pages, my personal page has no relevant Open Graph metadata. Of course, there is a lot of information in the main body of the page, including:
- Identity and role (Name, Roles, Profile image)
- Contact and location (Email, Phone, Office)
- Profile content (Biography, Teaching topics, Highlighted publications)
- Links and tags
However, extracting this information would require writing a parser for the page’s HTML. Such a parser might work across all UiO pages, and for other institutions that use the Vortex content management system, but it is not a viable solution for a general-purpose content retrieval system for MishMash.
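By contrast, checking whether a page offers Open Graph metadata at all, as UiO’s event pages do, needs no page-specific logic, because the tags always take the form `<meta property="og:..." content="...">`. Below is a minimal sketch using only Python’s standard `html.parser` module. The class and function names are my own, fetching the page is left out, and it only looks at `property` attributes. On a page like my personal page, it would simply return an empty dictionary.

```python
from __future__ import annotations

from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        # Keep only Open Graph properties that carry a value;
        # if a property is repeated (e.g. several og:image tags), the first wins.
        if prop.startswith("og:") and content is not None:
            self.properties.setdefault(prop, content)


def extract_open_graph(html: str) -> dict[str, str]:
    """Return all og:* properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical snippet in the style of an event page's <head>.
sample = '<meta property="og:title" content="Concert"><meta property="og:type" content="website">'
print(extract_open_graph(sample))  # {'og:title': 'Concert', 'og:type': 'website'}
```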
Comparing UiO pages with NVA and ORCID
There are, however, two other good sources of information: NVA and ORCID. When I asked Copilot to compare them with UiO’s employee pages, it suggested the following differences:
| Capability | UiO employee page | NVA | ORCID |
|---|---|---|---|
| Human-readable biography | Strong | Medium | Strong |
| Machine-readable structure | Low to medium | Strong | Strong |
| Stable public API | No (HTML scraping) | Yes | Yes |
| Local contact details (room/phone) | Strong | Medium | Weak |
| Institutional roles/affiliations | Strong | Strong | Medium |
| Projects (local relevance) | Strong | Medium to strong | Weak |
| Publication integration | Medium (selected/curated) | Strong | Strong |
| Global portability | Medium | Medium | Strong |
| Parser maintenance burden | High | Low | Low |
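The table suggests that no single source is strongest for every kind of information. One possible way to act on that, sketched here under my own assumptions, is a field-level merge: each source is first converted to a flat dictionary, and each field is taken from the first source in a preference list that has a value. The source names, field names, and orderings below are hypothetical and only loosely read off the table.

```python
from __future__ import annotations

# Hypothetical per-field source preferences, loosely based on the comparison
# table: structured sources first, the UiO page for local contact details.
FIELD_PRIORITY: dict[str, list[str]] = {
    "biography": ["orcid", "uio", "nva"],
    "affiliations": ["nva", "uio", "orcid"],
    "projects": ["nva", "uio", "orcid"],
    "publications": ["nva", "orcid", "uio"],
    "office": ["uio", "nva", "orcid"],
    "phone": ["uio", "nva", "orcid"],
}


def merge_profile(records: dict[str, dict[str, object]]) -> dict[str, object]:
    """Pick each field from the highest-priority source that has a value.

    `records` maps a source name ("uio", "nva", "orcid") to a flat dict of
    fields already extracted from that source. Missing sources are allowed.
    """
    merged: dict[str, object] = {}
    for field, sources in FIELD_PRIORITY.items():
        for source in sources:
            value = records.get(source, {}).get(field)
            if value:  # skip missing, None and empty values
                merged[field] = value
                break
    return merged
```

With this design, a field only falls back to a less preferred source when the preferred one is empty, so the UiO page would mainly be needed for the few fields where the table rates it clearly stronger, such as office and phone.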
All in all, it looks like NVA and ORCID are better backbone sources for reliable, scalable automation due to their stable, structured APIs. I will explore that in the coming days.
Thanks to Copilot for helping with the research and with drafting this comparison.
