10-min presentation on the author name problem, the ORCID initiative and two key applications of digital identity for researchers: i) attribution for data publication and ii) access control for sensitive research data
Who’s who on the Web? - Digital identity for tracking contributions and managing data access
1. G. A. Thorisson www.gen2phen.org
Who’s who on the Web?
Digital identity for tracking contributions
and managing data access
Gudmundur A. Thorisson <gt50@le.ac.uk>
Brookes lab
Department of Genetics
University of Leicester, UK
-- Outline --
• The author name problem and the challenge of attribution
• Unique identifiers for authors - the ORCID initiative
• A digital identity on the Internet - IDs for researchers?
• Introduce several identity-based projects in our group
• Attribution for data publications
• Access control for sensitive data
3rd Human Variome Project Meeting, Paris, 10-14 May, 2010 1
2. G. A. Thorisson www.gen2phen.org
Non-unique names are a major
problem in the scholarly literature
Are these authors all the same person?
G. Thorisson, Univ. Leicester
G. A. Thorisson, Univ. Leicester
G. A. Thorisson, Cold Spring Harbor Laboratory
How about these? Or these?
J. Smith
J. Smith
J. Smith
J. Smith
J. Smith
[etc.]
∼2/3 of the ∼6 million authors in MEDLINE share a last name and first
initial with at least one other author, and an ambiguous name refers
to ∼8 persons on average.
Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on
Knowledge Discovery from Data (2009) vol. 3 (3)
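A minimal sketch of why "last name + first initial" is ambiguous: collapsing each name to that key puts different records into one bucket. The helper names (`name_key`, `ambiguous_groups`) are hypothetical, and the affiliations are illustrative mock values, not data from MEDLINE.

```python
from collections import defaultdict


def name_key(author: str) -> str:
    """Reduce 'G. A. Thorisson' to the key 'thorisson g' (last name + first initial)."""
    parts = author.replace(".", " ").split()
    return f"{parts[-1].lower()} {parts[0][0].lower()}"


def ambiguous_groups(records):
    """Group (author, affiliation) records whose names collapse to the same key."""
    groups = defaultdict(list)
    for author, affiliation in records:
        groups[name_key(author)].append((author, affiliation))
    # Keep only keys shared by more than one record
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Illustrative records only (assumed for this sketch)
records = [
    ("G. Thorisson", "Univ. Leicester"),
    ("G. A. Thorisson", "Univ. Leicester"),
    ("G. A. Thorisson", "Cold Spring Harbor Laboratory"),
    ("J. Smith", "Univ. North Pole"),
    ("J. Smith", "Luthor Corporation"),
]
groups = ambiguous_groups(records)
```

The key alone cannot tell whether a bucket holds one person under several name variants or several people with the same name, which is the gap a unique identifier is meant to close.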
3. G. A. Thorisson www.gen2phen.org
Unique identifiers for authors/contributors
automated author disambiguation
+
author involvement
Dec’09: launch of the Open Researcher Contributor Identification Initiative - ORCID
ORCID ID: B-1242-2010
G. Thorisson, Univ. Leicester
G. A. Thorisson, Univ. Leicester
G. A. Thorisson, Cold Spring Harbor
ORCID ID: G-1442-2009
J. Smith, Univ. North Pole
ORCID ID: D-2400-2010
J. Smith, Luthor Corporation
Informatics infrastructure:
i) for researchers to manage their profiles
ii) for interaction with other systems
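A rough sketch of the idea on this slide: each identifier owns a set of name variants, so another system can resolve an ambiguous byline to one contributor. The registry below is a hypothetical in-memory stand-in that uses only the mock identifiers shown on the slide; it is not the ORCID service or its API.

```python
# Hypothetical registry: identifier -> set of (name, affiliation) variants.
# All values are the mock examples from the slide, not real ORCID records.
PROFILES = {
    "B-1242-2010": {
        ("G. Thorisson", "Univ. Leicester"),
        ("G. A. Thorisson", "Univ. Leicester"),
        ("G. A. Thorisson", "Cold Spring Harbor"),
    },
    "G-1442-2009": {("J. Smith", "Univ. North Pole")},
    "D-2400-2010": {("J. Smith", "Luthor Corporation")},
}


def resolve(name, affiliation):
    """Return the identifier whose profile claims this byline, or None."""
    for contributor_id, variants in PROFILES.items():
        if (name, affiliation) in variants:
            return contributor_id
    return None
```

Here the two J. Smith bylines resolve to different identifiers, while the three Thorisson variants resolve to the same one. A byline that no profile claims returns `None`, which is where automated disambiguation or author involvement would have to step in.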
5. G. A. Thorisson www.gen2phen.org
Digital identity - an online ‘presence’
Identity = The collective aspect of the set of characteristics by which a
thing is definitively recognizable or known (dictionary.com)
Self-asserted / Asserted by others
[Journal logos: Nature Rev Genet, Nature Biotech]
G. A. Thorisson <authored> Nat Biotechnol (2009) vol. 27 (11)
G. A. Thorisson <authored> Nat Rev Genet (2009) vol. 10 (1)
[...]
B-1242-2010 <authored> Nat Biotechnol (2009) vol. 27 (11)
B-1242-2010 <authored> Nat Rev Genet (2009) vol. 10 (1)
[...]
Problem: the Web ‘me’ is fragmented
Many usernames/passwords
across many identity silos
- password fatigue -
6. G. A. Thorisson www.gen2phen.org
Bridging the silos - distributed identity systems
- Federated authentication -
Institution-centric identity
user gt50 at University of Leicester
Closed-world: Single sign-on (SSO) across the federation
User-centric identity
Open-world: SSO across the entire Web
Web 2.0 social networking
- reuse profile information
- username/pwd problem
Open, lightweight, decentralised authentication protocol
7. G. A. Thorisson www.gen2phen.org
>50,000 websites now accepting
3rd party IDs (from www.janrain.com)
8. G. A. Thorisson www.gen2phen.org
A digital identity for
researchers centred on
scholarly profile?
ORCID ID: B-1242-2010
G. Thorisson, Univ. Leicester
G. A. Thorisson, Univ. Leicester
G. A. Thorisson, Cold Spring Harbor Lab.
http://mummi.myopenid.com
9. G. A. Thorisson www.gen2phen.org
Data-related applications for digital
identity (a.k.a. ‘researcher IDs’)
• Identity-enabled access management
– Controlling access to protected resources on the Web
• Tracking contributions
– data submissions to central repositories
– data curation / micro-attribution
– crucial to bio-resource impact factor + nanopublications
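For the access-management bullet, a minimal sketch of an identity-based check. It assumes authentication has already happened elsewhere (e.g. via a third-party identity provider) and that the caller passes in the verified identifier. `Resource`, `can_access` and the example resources are hypothetical names for illustration, not part of any of the systems mentioned in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A Web resource that is either open or restricted to listed identities."""
    name: str
    is_open: bool = False
    allowed_ids: set = field(default_factory=set)


def can_access(resource, verified_id=None):
    """Decide access from a verified identity; None means an anonymous visitor."""
    if resource.is_open:
        return True
    return verified_id is not None and verified_id in resource.allowed_ids


# Illustrative resources (assumed for this sketch)
open_data = Resource("open summary data", is_open=True)
protected = Resource("sensitive dataset", allowed_ids={"http://mummi.myopenid.com"})
```

The point of the design is that the data provider keeps only a list of identifiers per resource; usernames and passwords stay with the identity provider.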
10. G. A. Thorisson www.gen2phen.org
Cafe RouGE for mutation data exchange
Owen Lancaster, Session 10, Thu 10:10am
1. Diagnostic laboratories: publish data
2. Central mutation depot
3. End-users (e.g. LSDB curators): retrieve RSS feeds
• Security
– Log in via OpenID and other 3rd party identity (aka ‘outsourced’ security)
– Identity-based management of access to non-open data
• Attribution
– Link ORCID ID with variation data submissions => publication credit
– B-1242-2010 <authored> doi:10.9354/caferouge0005 (Cafe RouGE entry 0005, 19/12/2009)
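The attribution statements used in this talk have a simple subject / predicate / object shape, so papers and data submissions can sit in the same contribution list. Below is a small sketch that parses that notation and collects everything attributed to one identifier. The `Statement` type and both helper functions are hypothetical, and the notation is the slide's own shorthand rather than a formal standard.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # contributor identifier
    predicate: str  # e.g. 'authored'
    obj: str        # publication citation or dataset DOI


def parse_statement(line):
    """Parse 'SUBJECT <predicate> OBJECT' as written on the slides."""
    subject, rest = line.split(" <", 1)
    predicate, obj = rest.split("> ", 1)
    return Statement(subject.strip(), predicate, obj.strip())


def contributions(contributor_id, statements):
    """List everything the given identifier is credited with authoring."""
    return [
        s.obj for s in statements
        if s.subject == contributor_id and s.predicate == "authored"
    ]


statements = [parse_statement(line) for line in [
    "B-1242-2010 <authored> Nat Biotechnol (2009) vol. 27 (11)",
    "B-1242-2010 <authored> Nat Rev Genet (2009) vol. 10 (1)",
    "B-1242-2010 <authored> doi:10.9354/caferouge0005",
]]
credited = contributions("B-1242-2010", statements)
```

Because the data submission is keyed on the same identifier as the journal articles, it shows up in `credited` alongside them, which is the "publication credit" idea above.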
DOIs for scholarly publications vs DOIs for datasets
“Our long term vision is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.”
http://www.datacite.org
11. G. A. Thorisson www.gen2phen.org
HGVbaseG2P - a central
genetic association database
• Controlled access to aggregate GWAS datasets
http://www.hgvbaseg2p.org
Robert Free, Session 10, Thu 11:30am
12. G. A. Thorisson www.gen2phen.org
Molgenis-based distributed
G2P data infrastructure
• Identity-based access control + attribution
http://www.molgenis.org
Morris Swertz, Session 10, Thu 9:40am
13. G. A. Thorisson www.gen2phen.org
The GEN2PHEN Strategy
A federated network of databases, not
a central database
Data import systems
Data standards
Database solutions
GEN2PHEN Knowledge Centre
Locus-specific databases
‘Genomics’ G2P databases
Data exchange mechanisms
The GEN2PHEN Knowledge Centre will
fully leverage this network
http://www.gen2phen.org
Adam Webb, Session 10, Thu 9:40am
14. G. A. Thorisson www.gen2phen.org
Acknowledgements
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
GEN2PHEN Consortium
http://www.gen2phen.org/about-gen2phen/partners
University of Leicester
Anthony J. Brookes Bioinformatics Group
Tim Beck
Rob Free
Sirisha Gollapudi
Rob Hastings
Owen Lancaster
Adam Webb
Raymond Dalgleish
Contact me! Gudmundur A. Thorisson <gt50@le.ac.uk>