The document discusses the Names Project in the UK, which aims to create unique identifiers for UK researchers. It began in 2007 using data from a research assessment exercise. The Names Project takes a hybrid approach, using automated matching and manual disambiguation. It also allows researchers to directly input information. The project seeks to improve data quality and integrate with other national and international identifier systems like ISNI. Key challenges include gaining agreement on national researcher identifier services.
How dinosaurs broke our system: challenges in building national researcher identifier services
1. How dinosaurs broke our system
Challenges in building national researcher
identifier services
Amanda Hill
Names Project
JISC Conference, 2010
2. Hoping that…
…Simeon has explained all about the name
authority problem
I'd like to talk about some of the work
that we've done as part of the Names
Project recently…
…and how that fits into today's researcher
identification landscape
3. Gross generalisation about past
approaches to author identifiers
Libraries | Publishers
Book-level data | Article-level data
Labour intensive: disambiguation first | Automatically generated: disambiguation later
Authors not involved | Authors can edit
Open | Proprietary
4. Current international activity
ISNI | ORCID
Library-instigated | Publisher-instigated
Disambiguation first | Disambiguation later
Authors not involved | Authors can submit/edit
Broad scope | Current researchers
5. Signs of convergence?
Knowledge Exchange meeting on Digital
Author Identifiers in March 2012
encouraged alignment of ISNI and ORCID
approaches
ISNI has reserved a block of identifiers for
use by ORCID
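A reserved block only makes sense if both schemes share one identifier format: 16 characters, the last of which is a check character. The sketch below shows how such an identifier could be validated. The check-character scheme (ISO 7064 MOD 11-2) is not stated in these slides, and the function names are illustrative only.

```python
def check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_well_formed(identifier: str) -> bool:
    """Return True if a 16-character ISNI/ORCID-style identifier has a valid check character."""
    compact = identifier.replace("-", "").replace(" ", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return check_character(base) == compact[15]
```

A call such as `is_well_formed("0000-0002-1825-0097")` checks structure only. It says nothing about whether the identifier has actually been assigned to anyone.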
6. Sources of information
Both ORCID and ISNI will use existing pools
of information to populate their systems
ISNI: “Leveraging high confidence data from
different domains”
“ORCID will link to other name identifier
systems”
7. National author ID systems
2011: JISC-funded survey and report on
national author/researcher identifier
systems around the world
Report published November 2011
http://ie-repository.jisc.ac.uk/567/
8. Maturity of systems (late 2011)
System | In development since | Number of identities
Lattes (Brazil) | 1999 | 1,600,000
Frida/Cristin (Norway) | 2003 | 31,000 researchers at 160 institutions
VIVO | 2003 | 24,400 faculty with profiles; 150,000 total IDs including undisambiguated co-authors
Digital Author Identifier (Netherlands) | 2005 (1980s for National Thesaurus of Author Names) | 40,000 in the NTA; 15,000 researchers with Digital Author IDs
Names Project (UK) | 2007 | 46,000
New Zealand Electronic Text Centre | 2007 | 2,000
Trove People and Organisations/NLA Party Infrastructure (Australia) | 2007 | 900,000 people and organisations
AuthorClaim | 2008 | 200
Researcher Name Resolver (Japan) | 2008 | 190,000
9. Populating identifier systems
System | Records created by cataloguers | Records imported from other systems | Records generated by data subjects
AuthorClaim
Digital Author Identifier (Netherlands)
Frida/Cristin (Norway)
Lattes (Brazil)
Names Project (UK)
New Zealand Electronic Text Centre
Researcher Name Resolver (Japan)
Trove People and Organisations/NLA Party Infrastructure (Australia)
VIVO
10. Good sources of data for some
nations
National system | Existing unique identifiers
Japan | Researcher identifiers from national researcher databases
Netherlands | Number from National Thesaurus of Author Names is converted into Digital Author Identifier
Norway | Human resources data: social security numbers
Other national systems assign new identifiers as new identities are established.
11. Features of mature national
identifier systems
With more mature systems:
A national organisation generally has oversight: e.g. in
Brazil, Norway, Netherlands
Integration with research funders, reporting agencies
and institutional repositories
Individual institutions also have defined roles
relating to managing information about their own
staff
13. Work to investigate unique IDs
for UK researchers
Identified in 2006 as part of the call for
proposals for the JISC-funded Repositories
and Preservation Programme
Mimas and the British Library proposed a two-
year project to:
Investigate requirements for a UK name authority
service
Build a pilot system to demonstrate potential
14. The Names Project
The Chang Project
'From the Annals of the Onomastic
Society'
Ian Watson (1990)
15. Names (not an acronym…)
Name Authorities Make Everything Simpler
Names: Ambiguous, Meaningful (or
Meaningless?), Essential, Symbolic
…nearly everyone has a name-related
story
17. Original plan
Use data from the British Library's Zetoc service to
create author IDs
Journal article information from 1993 onwards
Last names, initials, paper titles, subject
classifications
But…
International in scope
Lack of information on affiliations and first names to
help with making matches
Huge dataset -> processing issues
18. Revised plan
Used 2008 Research Assessment Exercise
data (as cleaned up by JISC Merit project)
to pre-populate the Names system
Identify unique individuals and assign
identifiers
Data quality good, included institutional
information: high accuracy, despite only
having initials, not full first names
Except for…
21. Building on Merit…
Merit data covers around 20% of active UK
researchers
Working to enhance records and create
new ones with information from other
sources
Institutional repositories
British Library data sets (Zetoc)
Direct input from researchers
24. Quality matters
Automatic matching can only achieve so
much
Dependent on data source
British Library team perform manual check of
results of matching new data sources
Allows for separation/merging of records
Plan to allow people to update their own
information
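The hybrid workflow on this slide (automatic matching first, then a manual check) might be sketched as follows. This is an illustration under stated assumptions, not the Names Project's actual algorithm: the record fields, the matching rule (surname plus initials, with institution as the deciding evidence) and the three outcome labels are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    surname: str
    initials: str
    institution: str


def _name_key(rec: Record) -> tuple[str, str]:
    # Normalise case and punctuation so "Hill, A." and "hill, A" compare equal.
    initials = rec.initials.replace(".", "").replace(" ", "").upper()
    return (rec.surname.strip().lower(), initials)


def classify(incoming: Record, existing: list[Record]) -> str:
    """Return 'match', 'review' or 'new' for an incoming record.

    'match'  : same name key and same institution (assumed safe to link automatically)
    'review' : same name key, different institution (queued for a manual check)
    'new'    : no record with the same name key (a new identity is created)
    """
    candidates = [r for r in existing if _name_key(r) == _name_key(incoming)]
    if not candidates:
        return "new"
    target = incoming.institution.strip().lower()
    if any(r.institution.strip().lower() == target for r in candidates):
        return "match"
    return "review"
```

Sending ambiguous cases to a 'review' queue rather than forcing a decision leaves room for the manual check, where records can be merged or separated.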
25. Ultimate aim
High-quality set of unique identifiers for UK
researchers and research institutions
Available to other systems (national and
international)
e.g. Names records exported to ISNI in 2011
Possible additional services
Disambiguation of existing data sets
Identification of external researchers
26. Access to Names
API allows for flexible searching of Names
data
EPrints plugin released in 2011: allows
repository users to choose from a list of
Names identities
…and to create a Names record if none exists
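The choose-or-create behaviour described for the plugin can be illustrated with a small in-memory stand-in. Everything here is hypothetical: `NamesRegistry`, `choose_or_create` and the `pick` callback are invented for the sketch and do not reflect the real Names API or the EPrints plugin code.

```python
from __future__ import annotations

import itertools
from typing import Callable, Optional

Match = tuple[int, str]


class NamesRegistry:
    """In-memory stand-in for a name authority service (hypothetical)."""

    def __init__(self) -> None:
        self._records: dict[int, str] = {}
        self._ids = itertools.count(1)

    def search(self, query: str) -> list[Match]:
        # Case-insensitive substring search over stored names.
        q = query.strip().lower()
        return [(i, name) for i, name in self._records.items() if q in name.lower()]

    def create(self, name: str) -> int:
        new_id = next(self._ids)
        self._records[new_id] = name
        return new_id


def choose_or_create(
    registry: NamesRegistry,
    name: str,
    pick: Callable[[list[Match]], Optional[int]],
) -> int:
    """Offer existing identities to the user; create a record if none is chosen."""
    matches = registry.search(name)
    if matches:
        chosen = pick(matches)  # the user selects from the list, or declines with None
        if chosen is not None:
            return chosen
    return registry.create(name)
```

In a repository deposit form, `pick` would be backed by the user's selection from the displayed list. Here it is a plain callable so the flow can be exercised without a user interface.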
29. Next steps…
JISC-convened Researcher ID group – final
meeting in September > recommendations
Options Appraisal Report for UK national
researcher identifier service > December
Improving data and adding new records
30. Summing up
Names is a hybrid of library/publisher
approaches
Automated matching/disambiguation
Human quality checks
Data immediately available for re-use in other
systems
Researchers can supply information
31. An evolving area
Main challenges are cultural and political
rather than technical
National author/researcher ID services can be
important parts of research infrastructure
Getting agreement and co-ordination at
national level is vital
…and, I would say, are all very jealous of those countries with ready-made data sources like this…
Namey anecdote here? Dicky Moore & Robin Armstrong Viner?
Known in name authority circles as ‘the Siveter problem’
Every time we add a new data set, the quality of the data within the Names pilot improves. We recently added information from the University of the West of England, and the QA process highlighted a previously unnoticed problem with the original Merit data.