RDA Fourth Plenary Keynote - Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and Head of Node of ELIXIR-NL - Keynote "Bringing Data to Broadway" - Monday 22nd Sept 2014, Amsterdam, the Netherlands
https://rd-alliance.org/plenary-meetings/fourth-plenary/plenary4-programme.html
7. 2005: Text mining?
Why bury it first and then mine it again?
10. Part II
The Explicitome
and the Elusive Part
(our own fault)
The Explicitome: everything we already asserted
11. The Elusive Explicitome Phenomenon
example from: Yepes & Verspoor, 2013
narrative
Tables/figures
abstract
# of assertions
Supplementary data
5 500* 1000 50K-1M+
# of SNP-Phen: 2% 4% 50%*
The Elusive Explicitome: what escapes us (95%)
Hurdle 1:
Paywalls
Hurdle 2:
‘TIF’walls
Hurdle 3:
The Wall of Broken Links
12. Data loss is real and significant, while data growth is staggering
Nature news, 19 December 2013
‘Oops, that link was the laptop of my PhD student’
• Computer speed and storage capacity are doubling every 18 months and this rate is steady
• DNA sequence data has been doubling every 6-8 months over the last 3 years and looks set to continue doing so for this decade
13. The trends in e-Science
Computer Analytics (takes charge)
Enormity of datasets (beyond narrative)
Collaborative Intelligence (calls for million minds)
Irreversible movement (towards OA)
FAIR
?
Data
Publishing &
Stewardship
22. FAIR for computers FAIR for people
AERIAL SURVEY: pattern recognition in Ridiculograms
HUMAN EXCAVATION: rationalisation and ‘confirmational reading’
X
‘Why would I believe this association’???
23. For KD we need each association only once
Cardinal Assertion (<10^11)
n identical assertions
‘n’ different provenances
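The idea on this slide, that n identical assertions with n different provenances collapse into one cardinal assertion, can be sketched as follows. This is a minimal illustration only, not the BioSemantics implementation: the `Assertion` type, the `to_cardinal` helper, and all gene, disease, and source names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    subject: str
    predicate: str
    obj: str
    provenance: str  # hypothetical source identifier


def to_cardinal(assertions):
    """Group identical (subject, predicate, object) triples.

    Each distinct triple is kept once (the cardinal assertion),
    together with the list of every provenance that asserted it.
    """
    cardinal = defaultdict(list)
    for a in assertions:
        cardinal[(a.subject, a.predicate, a.obj)].append(a.provenance)
    return dict(cardinal)


# Hypothetical example data: the same association asserted three times.
raw = [
    Assertion("gene_A", "associated_with", "disease_X", "paper_1"),
    Assertion("gene_A", "associated_with", "disease_X", "database_2"),
    Assertion("gene_A", "associated_with", "disease_X", "paper_3"),
    Assertion("gene_B", "associated_with", "disease_Y", "paper_1"),
]

cardinal = to_cardinal(raw)
for triple, sources in cardinal.items():
    print(triple, "asserted", len(sources), "time(s) by", sources)
```

Knowledge discovery would then run over the keys only, while the provenance lists remain available to answer ‘Why would I believe this association’.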
24. We publish about fewer than a million LS Concepts!
10^6 concept clusters (Knowlets)
25. www.biosemantics.org LUMC - LIACS
BioSemantics Knowledge Discovery Pipeline
[Diagram labels: data sources; ‘coordinated’ data; nanopub cache; cardinal assertion store; semantic data; indexing; modelling; reasoning algorithms; trends; phase transitions; ‘new’ data; alerts; differentials; funding priorities; semantic query (gene, disease)]
27. Part 3
Unavoidable: some science of ‘our own’
Part IV
Towards Solutions
Bigger is not Better
Zipping the Explicitome
but…..as examples, sorry
28. Electronic
Health
Databases
The Rescued Explicitome
Value
Added
Databases
narrative
Tables/figures
Supplementary data
abstract
PROVENANCE
Total Explicitome: an estimated 10^14 asserted associations in 2,500 data sources
ETL to FAIR
FAIR to read
29. Assertions
Concepts
10^14
10^11
10^6
Semantic MedLine
U+C+CT+EG+GO = 36 M
80%
20%
Cardinal
Zipping the Explicitome
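As a rough back-of-the-envelope check on what "zipping" means here, the orders of magnitude quoted in the talk (an estimated 10^14 asserted associations, fewer than 10^11 cardinal assertions, about 10^6 concepts) can be compared directly. The variable names are illustrative, and because the cardinal figure is an upper bound, the first ratio is a lower bound.

```python
# Orders of magnitude quoted on the slides.
total_assertions = 10**14      # estimated asserted associations
cardinal_assertions = 10**11   # upper bound on unique (cardinal) assertions
concepts = 10**6               # concept clusters (Knowlets)

# How many times, on average, each unique assertion is repeated (at least).
redundancy = total_assertions // cardinal_assertions

# Upper bound on the average number of cardinal assertions per concept.
per_concept = cardinal_assertions // concepts

print("redundancy factor (at least):", redundancy)
print("cardinal assertions per concept (at most):", per_concept)
```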
30. Part 3
Unavoidable: some science of ‘our own’
Part V
(FAIR) data should take
CENTER STAGE
but…..as examples, sorry
32. PID
Metadata (intrinsic)
'provenance' (user defined)
Data (elements)
A simplified diagram of a Digital (data) Object irrespective of technological choices and naming
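The four layers of the diagram can be written down as a small data structure. This is only a sketch of the slide, under the assumption that each layer is a simple container. The class name, field types, and the example PID and values are hypothetical and, as the slide says, independent of any technological choice.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DigitalObject:
    pid: str                                                   # persistent identifier
    metadata: Dict[str, Any] = field(default_factory=dict)     # intrinsic metadata
    provenance: Dict[str, Any] = field(default_factory=dict)   # user-defined 'provenance'
    data: List[Any] = field(default_factory=list)              # data elements


# Hypothetical example object; the PID and contents are made up.
example = DigitalObject(
    pid="example-pid-0001",
    metadata={"format": "rdf", "size_bytes": 2048},
    provenance={"created_by": "example-lab", "derived_from": "example-pid-0000"},
    data=["gene_A associated_with disease_X"],
)

print(example.pid, sorted(example.metadata), len(example.data))
```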
33. Digital Object Architecture
PID
Metadata (intrinsic)
'provenance' (user defined)
Data (elements)
s are Digital Objects
Some Research Objects Nanopublications are Research Objects
are
34. Data as increasingly FAIR Digital Objects
[Figure: a series of Digital Objects, each drawn as PID / Metadata (intrinsic) / 'provenance' (user defined) / Data (elements), labelled:]
Totally UNFAIR
Usable for Humans
Findable
FAIR metadata
FAIR data - restricted access
FAIR data - Open Access
FAIR data - Open Access/Functionally Linked
37. FAIRport proof of concept
Data Owners
(supp) data
Databases
Repositories
ELIXIR FAIR Data Search Index
End-users
ELIXIR Data FAIR Port
ELIXIR federated data
ELIXIR semantic data repository
FAIR L1, FAIR L2, FAIR L3, FAIR L4
Search for datasets
Download data (sub)sets in many formats (xml, rdf, json etc)
ASPs, Inhouse IT, Bioinformatics etc.
Tools & Applications
Elixir nodes: Fin., Esp., Nor., UK, SWE, NL..
www.nanopubmed.org
38. Parties needed Typical Candidates NL-example
Trusted Party
Usually Public Sector
With 'data stewardship' mandate
1
Executive Party/
Coordinator
Usually Public or Private Sector
With Expert Knowledge on Project
and relation management
2
Technology
Providers
3 4 PID/ARTA stewards
DTL/ELIXIR-nl
others
5 DOA architecture/IMS CNRI + EURETOS
6 Publishing pipeline EURETOS
7 Repository Software
8 eInfrastructure
39. Malpractices…….
Journal Impact Factor
Ignore Altmetrics
No data stewardship plan
Obstruct Tenure
Data Experts
‘supplementary data’
Knowledge Sharing Impaired
40. NITRD, FORCE11, ORCID, VIVO, EUDAT, DATAVERSE, BD2K, DANS, ELIXIR, NIHCommons, H2020, DRYAD, RDA, FigShare, Nanopub, Biosharing, Elsevier, Science, Nature, SageBio, HVP, DataCite, EGA, Research Objects, Nebulus, Embassy, SADI, EURETOS, YARCdata, IMI, interoperability, ISA, Open PHACTS, Data Fabric
41. Good practices (apart from collaborating)
‘professional data publishing’
RO Impact Factor
Award Altmetrics
5% for data stewardship plan
Train & Tenure Data Experts
FAIR play
43. Endorsed by 82 organisations and [y] individuals
1. FAIR guiding principles with public discussion forum:
https://www.force11.org/group/fairgroup/fairprinciples
2. Notes and Annexes: https://www.force11.org/node/6062/
3. Group home page https://www.force11.org/group/fairgroup
COMMENT: (till October 1st)
ENDORSE: (after October 1st)