SlideShare uma empresa Scribd logo
1 de 28
Baixar para ler offline
11/13/20
1
FAIR Data Require Better Metadata:
The Case for CEDAR
Mark A. Musen, M.D., Ph.D.
and the CEDAR Team
Stanford University
musen@Stanford.EDU
1
A Story
2
11/13/20
2
Purvesh Khatri, Ph.D. A self-professed “data parasite”
3
Khatri has reused public data sets to
identify genomic signatures …
• For incipient sepsis
• For active tuberculosis
• For distinguishing viral from bacterial
respiratory infection
• For rejection of organ transplants
Now he’s working on diagnosis and prognosis of COVID-19
… and he has never touched a pipette!
4
11/13/20
3
But you can be a data parasite only if
scientific data are …
• Available in an accessible repository
• Findable through some sort of search
facility
• Self-describing so that third parties can
make sense of the data
• Intended to be reused, and to outlive the
experiment for which they were collected
5
6
11/13/20
4
The FAIR Principles
F1: (Meta) data are assigned globally
unique and persistent identifiers
F2: Data are described with rich
metadata
F3: Metadata clearly and explicitly
include the identifier of the data they
describe
F4: (Meta)data are registered or indexed
in a searchable resource
A1: (Meta)data are retrievable by their
identifier using a standardised
communication protocol
A1.1: The protocol is open, free and
universally implementable
A1.2: The protocol allows for an
authentication and authorisation where
necessary
A2: Metadata should be accessible even
when the data is no longer available
I1: (Meta)data use a formal, accessible,
shared, and broadly applicable language
for knowledge representation
I2: (Meta)data use vocabularies that follow
the FAIR principles
I3: (Meta)data include qualified references
to other (meta)data
R1: (Meta)data are richly described with a
plurality of accurate and relevant
attributes
R1.1: (Meta)data are released with a clear
and accessible data usage license
R1.2: (Meta)data are associated with
detailed provenance
R1.3: (Meta)data meet domain-relevant
community standards
7
Our current approaches to
FAIRness are failing
• Reviewing the FAIR principles as written
• Questionnaires and checklists
– Require detailed knowledge of the resource
– Can’t really call out dishonesty or wishful thinking
– Are tedious and time-consuming
• Automated FAIRness checkers
– Provide a report after it has become too late to do
anything about the problems
– Don’t identify which factors the user can actually
control
8
11/13/20
5
What investigators can control:
• It’s not the arcane stuff, such as
– Use of globally unique and persistent identifiers
– Metadata registration in a searchable index
– Data are retrievable by their identifier using a
standardized communication protocol
• It’s the concrete stuff they understand
– The metadata that describe the experiment
– The metadata that describe the experiment
– The metadata that describe the experiment
9
Metadata in most repositories are a mess!
• Investigators view their work as
publishing papers, not leaving a legacy of
reusable data
• Sponsors or management may require
data sharing, but they do not encourage
the use of research funds to pay for it
• Creating the metadata to describe data
sets is unbearably hard
10
11/13/20
6
11
12
11/13/20
7
13
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
Failure to use standard terms makes
datasets often impossible to search
14
11/13/20
8
NCBI BioSample Metadata are Dreadful!
• 73% of “Boolean” metadata values are not
actually true or false
– nonsmoker, former-smoker
• 26% of “integer” metadata values cannot be
parsed into integers
– JM52, UVPgt59.4, pig
• 68% of metadata entries that are supposed to
represent terms from biomedical ontologies do
not actually do so.
– presumed normal, wild_type
15
16
11/13/20
9
We will never have FAIR data until …
• We have consensus regarding what is
good metadata
• We have standards for implementing
good metadata
– For the overall structure of the metadata
– For the values that we use
• We have tools that make it easy to create
good metadata
17
Requirement #1: Standard
ontologies to describe what
exists in a dataset completely
and consistently
18
11/13/20
10
19
20
11/13/20
11
http://bioportal.bioontology.org
21
22
11/13/20
12
23
BioPortal User Activity—Interactive
24
11/13/20
13
BioPortal User Activity—Programmatic
25
Requirement #2: Describe
properties of experiments
completely and consistently
26
11/13/20
14
The microarray community took the lead in
standardizing biological metadata
DNA Microarray
— What was the
substrate of the
experiment?
— What array
platform was
used?
— What were the
experimental
conditions?
27
28
11/13/20
15
But it didn’t stop with MIAME!
• Minimal Information About T Cell Assays
(MIATA)
• Minimal Information Required in the
Annotation of biochemical Models (MIRIAM)
• MINImal MEtagemome Sequence analysis
Standard (MINIMESS)
• Minimal Information Specification For In Situ
Hybridization and Immunohistochemistry
Experiments (MISFISHIE)
29
Minimal Information Guidelines
are not Models
• MIAME and its kin specify only the
“kinds of things” that investigators should
include in their metadata
• They do not provide a detailed list of standard
metadata elements
• They do not provide datatypes for valid
metadata entries
• It takes work to convert a prose checklist into
a computable model
30
11/13/20
16
To make sense of metadata …
• We need computer-based standards
– For what we want to say about experimental data
– For the language that we use to say those things
– For how we structure the metadata
• We need tools that make it easy to apply
these standards
• We need a scientific culture that values
making experimental data accessible to others
31
Requirement #3: Make it
palatable to describe
experiments completely and
consistently—which means we
need good tools!
32
11/13/20
17
http://metadatacenter.org
33
The CEDAR Approach to Metadata
34
11/13/20
18
35
36
11/13/20
19
37
The CEDAR Workbench
38
11/13/20
20
39
40
11/13/20
21
41
The CEDAR Workbench
42
11/13/20
22
43
44
11/13/20
23
brain
Parkinson’s disease (DOID) (39%)
central nervous system lymphoma (DOID) (27%)
autistic disorder (DOID) (22%)
melanoma (DOID) (5%)
schizophrenia (DOID) (1%)
Edwards syndrome (DOID) (2%)
45
Some key features of CEDAR
• All semantic components—template elements,
templates, ontologies, and value sets—are managed as
first-class entities
• User interfaces and drop-down menus are not
hardcoded, but are generated on the fly from CEDAR’s
semantic content
• All software components have well defined APIs,
facilitating reuse of software by a variety of clients
• CEDAR generates all metadata in JSON-LD, a widely
adopted Web standard that can be translated into
other representations
46
11/13/20
24
47
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
48
11/13/20
25
Who’s Using CEDAR?
49
Online Data Will Never Be FAIR
• Until we standardize metadata structure using
common templates
• Until we can fill in those templates with
controlled terms whenever possible
• Until we create technology that will make it
easy for investigators to annotate their
datasets in standardized, searchable ways
• Until the scientific enterprise adopts
this kind of technology
50
11/13/20
26
The “research
parasites” of the
world who reuse
data for a living will
be able to learn
from datasets that
are “cleaner,” better
organized, and
more searchable
Purvesh Khatri, Ph.D.
51
Scientists will learn
when new results
become available
that are relevant to
their work, and how
those results relate
to the data that
they are already
collecting
52
11/13/20
27
Clinicians will be
able to find the
evidence that best
helps them to select
treatments for their
patients
53
Once comprehensive, experimental
metadata are disseminated in
machine-interpretable form …
• Technology such as CEDAR will assist in the
automated “publication” of scientific results
online
• Intelligent agents will
– Search and “read” the “literature”
– Integrate information
– Track scientific advances
– Re-explore existing scientific datasets
– Suggest the next set of experiments to perform
– And maybe even do them!
54
11/13/20
28
http://metadatacenter.org
55

Mais conteúdo relacionado

Mais procurados

EPGP Informatics Publication - nihms-369795
EPGP Informatics Publication - nihms-369795EPGP Informatics Publication - nihms-369795
EPGP Informatics Publication - nihms-369795
Michael Williams
 

Mais procurados (20)

Building a Global Search Engine for Genetic Data
Building a Global Search Engine for Genetic DataBuilding a Global Search Engine for Genetic Data
Building a Global Search Engine for Genetic Data
 
Irida immemxi hsiao
Irida immemxi hsiaoIrida immemxi hsiao
Irida immemxi hsiao
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
 
Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
Gnc march 2012
Gnc march 2012Gnc march 2012
Gnc march 2012
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
EPGP Informatics Publication - nihms-369795
EPGP Informatics Publication - nihms-369795EPGP Informatics Publication - nihms-369795
EPGP Informatics Publication - nihms-369795
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
Machine learning in biology
Machine learning in biologyMachine learning in biology
Machine learning in biology
 
Application of blockchain technology in healthcare and biomedicine
Application of blockchain technology in healthcare and biomedicineApplication of blockchain technology in healthcare and biomedicine
Application of blockchain technology in healthcare and biomedicine
 
Data Science for the Win
Data Science for the WinData Science for the Win
Data Science for the Win
 
PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences
 
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus PosterNIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
NIH NCI Childhood Cancer Data Initiative (CCDI) Symposium Globus Poster
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
MPS webinar master deck
MPS webinar master deckMPS webinar master deck
MPS webinar master deck
 
Linking Data to Publications through Citation and Virtual Archives
Linking Data to Publications through Citation and Virtual ArchivesLinking Data to Publications through Citation and Virtual Archives
Linking Data to Publications through Citation and Virtual Archives
 
"Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective""Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective"
 

Semelhante a dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020

eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005
AsseroLtd
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Carole Goble
 

Semelhante a dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020 (20)

Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
 
eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005eSource, DIA EuroMeeting, Lisbon, March 2005
eSource, DIA EuroMeeting, Lisbon, March 2005
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
dkNET Introduction for Librarians
dkNET Introduction for LibrariansdkNET Introduction for Librarians
dkNET Introduction for Librarians
 
Data cycle health
Data cycle healthData cycle health
Data cycle health
 
Burton - Security, Privacy and Trust
Burton - Security, Privacy and TrustBurton - Security, Privacy and Trust
Burton - Security, Privacy and Trust
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
 
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
 
FAIR play?
FAIR play? FAIR play?
FAIR play?
 
FAIR data: what it means, how we achieve it, and the role of RDA
FAIR data: what it means, how we achieve it, and the role of RDAFAIR data: what it means, how we achieve it, and the role of RDA
FAIR data: what it means, how we achieve it, and the role of RDA
 
AMIA 2019: Unleashing the value of CDEs through CEDAR
AMIA 2019: Unleashing the value of CDEs through CEDARAMIA 2019: Unleashing the value of CDEs through CEDAR
AMIA 2019: Unleashing the value of CDEs through CEDAR
 
Shifting the goal post – from high impact journals to high impact data
 Shifting the goal post – from high impact journals to high impact data Shifting the goal post – from high impact journals to high impact data
Shifting the goal post – from high impact journals to high impact data
 
Irida bccdc dec10_2015
Irida bccdc dec10_2015Irida bccdc dec10_2015
Irida bccdc dec10_2015
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
 
The future of FAIR
The future of FAIRThe future of FAIR
The future of FAIR
 
FAIR: standards and services
FAIR: standards and servicesFAIR: standards and services
FAIR: standards and services
 
PEDSnet DQA CHOP Symposium
PEDSnet DQA CHOP SymposiumPEDSnet DQA CHOP Symposium
PEDSnet DQA CHOP Symposium
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 

Mais de dkNET

dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET
 
dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET
 
dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...
dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...
dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...
dkNET
 
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...
dkNET
 
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
dkNET
 
dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...
dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...
dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...
dkNET
 
dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...
dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...
dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...
dkNET
 
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET
 
dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...
dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...
dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...
dkNET
 
dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...
dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...
dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...
dkNET
 
dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...
dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...
dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...
dkNET
 
dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...
dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...
dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...
dkNET
 
dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...
dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...
dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...
dkNET
 
dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...
dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...
dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...
dkNET
 
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET
 

Mais de dkNET (20)

dkNET Office Hours: NIH Data Management and Sharing Mandate 05/03/2024
dkNET Office Hours: NIH Data Management and Sharing Mandate  05/03/2024dkNET Office Hours: NIH Data Management and Sharing Mandate  05/03/2024
dkNET Office Hours: NIH Data Management and Sharing Mandate 05/03/2024
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
dkNET Webinar: Unlocking the Power of FAIR Data Sharing with ImmPort 04/12/2024
dkNET Webinar: Unlocking the Power of FAIR Data Sharing with ImmPort 04/12/2024dkNET Webinar: Unlocking the Power of FAIR Data Sharing with ImmPort 04/12/2024
dkNET Webinar: Unlocking the Power of FAIR Data Sharing with ImmPort 04/12/2024
 
dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024dkNET Webinar: Tabula Sapiens 03/22/2024
dkNET Webinar: Tabula Sapiens 03/22/2024
 
dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...
dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...
dkNET Webinar "The Multi-Omic Response to Exercise Training Across Rat Tissue...
 
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...
dkNET Webinar: The Collaborative Microbial Metabolite Center – Democratizing ...
 
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
 
dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...
dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...
dkNET Webinar: A Single Cell Atlas of Human and Mouse White Adipose Tissue 11...
 
dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...
dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...
dkNET Webinar "The National Sleep Research Resource (NSRR) - Opportunities fo...
 
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...
 
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
 
dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...
dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...
dkNET Webinar: Leveraging Computational Strategies to Identify Type 1 Diabete...
 
dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...
dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...
dkNET Webinar: Estimating Relative Beta-Cell Function During Continuous Gluco...
 
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha...
 
dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...
dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...
dkNET Webinar: Postpartum Glucose Screening Among Homeless Women with Gestati...
 
dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...
dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...
dkNET Webinar: Choosing Sample Sizes for Multilevel and Longitudinal Studies ...
 
dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...
dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...
dkNET Webinar: : FAIR Data Curation of Antibody/B-cell and T-cell Receptor Se...
 
dkNET Office Hours - "Are You Ready for 2023? New NIH Data Management and Sha...
dkNET Office Hours - "Are You Ready for 2023? New NIH Data Management and Sha...dkNET Office Hours - "Are You Ready for 2023? New NIH Data Management and Sha...
dkNET Office Hours - "Are You Ready for 2023? New NIH Data Management and Sha...
 
dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...
dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...
dkNET Webinar "The Mission and Progress of the(sugar)science: Helping Scienti...
 
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
 

Último

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 

Último (20)

GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 

dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020

  • 1. 11/13/20 1 FAIR Data Require Better Metadata: The Case for CEDAR Mark A. Musen, M.D., Ph.D. and the CEDAR Team Stanford University musen@Stanford.EDU 1 A Story 2
  • 2. 11/13/20 2 Purvesh Khatri, Ph.D. A self-professed “data parasite” 3 Khatri has reused public data sets to identify genomic signatures … • For incipient sepsis • For active tuberculosis • For distinguishing viral from bacterial respiratory infection • For rejection of organ transplants Now he’s working on diagnosis and prognosis of COVID-19 … and he has never touched a pipette! 4
  • 3. 11/13/20 3 But you can be a data parasite only if scientific data are … • Available in an accessible repository • Findable through some sort of search facility • Self-describing so that third parties can make sense of the data • Intended to be reused, and to outlive the experiment for which they were collected 5 6
  • 4. 11/13/20 4 The FAIR Principles F1: (Meta) data are assigned globally unique and persistent identifiers F2: Data are described with rich metadata F3: Metadata clearly and explicitly include the identifier of the data they describe F4: (Meta)data are registered or indexed in a searchable resource A1: (Meta)data are retrievable by their identifier using a standardised communication protocol A1.1: The protocol is open, free and universally implementable A1.2: The protocol allows for an authentication and authorisation where necessary A2: Metadata should be accessible even when the data is no longer available I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation I2: (Meta)data use vocabularies that follow the FAIR principles I3: (Meta)data include qualified references to other (meta)data R1: (Meta)data are richly described with a plurality of accurate and relevant attributes R1.1: (Meta)data are released with a clear and accessible data usage license R1.2: (Meta)data are associated with detailed provenance R1.3: (Meta)data meet domain-relevant community standards 7 Our current approaches to FAIRness are failing • Reviewing the FAIR principles as written • Questionnaires and checklists – Require detailed knowledge of the resource – Can’t really call out dishonesty or wishful thinking – Are tedious and time-consuming • Automated FAIRness checkers – Provide a report after it has become too late to do anything about the problems – Don’t identify which factors the user can actually control 8
  • 5. 11/13/20 5 What investigators can control: • It’s not the arcane stuff, such as – Use of globally unique and persistent identifiers – Metadata registration in a searchable index – Data are retrievable by their identifier using a standardized communication protocol • It’s the concrete stuff they understand – The metadata that describe the experiment – The metadata that describe the experiment – The metadata that describe the experiment 9 Metadata in most repositories are a mess! • Investigators view their work as publishing papers, not leaving a legacy of reusable data • Sponsors or management may require data sharing, but they do not encourage the use of research funds to pay for it • Creating the metadata to describe data sets is unbearably hard 10
  • 7. 11/13/20 7 13 age Age AGE `Age age (after birth) age (in years) age (y) age (year) age (years) Age (years) Age (Years) age (yr) age (yr-old) age (yrs) Age (yrs) age [y] age [year] age [years] age in years age of patient Age of patient age of subjects age(years) Age(years) Age(yrs.) Age, year age, years age, yrs age.year age_years Failure to use standard terms makes datasets often impossible to search 14
  • 8. 11/13/20 8 NCBI BioSample Metadata are Dreadful! • 73% of “Boolean” metadata values are not actually true or false – nonsmoker, former-smoker • 26% of “integer” metadata values cannot be parsed into integers – JM52, UVPgt59.4, pig • 68% of metadata entries that are supposed to represent terms from biomedical ontologies do not actually do so. – presumed normal, wild_type 15 16
  • 9. 11/13/20 9 We will never have FAIR data until … • We have consensus regarding what is good metadata • We have standards for implementing good metadata – For the overall structure of the metadata – For the values that we use • We have tools that make it easy to create good metadata 17 Requirement #1: Standard ontologies to describe what exists in a dataset completely and consistently 18
  • 13. 11/13/20 13 BioPortal User Activity—Programmatic 25 Requirement #2: Describe properties of experiments completely and consistently 26
  • 14. 11/13/20 14 The microarray community took the lead in standardizing biological metadata DNA Microarray — What was the substrate of the experiment? — What array platform was used? — What were the experimental conditions? 27 28
  • 15. 11/13/20 15 But it didn’t stop with MIAME! • Minimal Information About T Cell Assays (MIATA) • Minimal Information Required in the Annotation of biochemical Models (MIRIAM) • MINImal MEtagemome Sequence analysis Standard (MINIMESS) • Minimal Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE) 29 Minimal Information Guidelines are not Models • MIAME and its kin specify only the “kinds of things” that investigators should include in their metadata • They do not provide a detailed list of standard metadata elements • They do not provide datatypes for valid metadata entries • It takes work to convert a prose checklist into a computable model 30
  • 16. 11/13/20 16 To make sense of metadata … • We need computer-based standards – For what we want to say about experimental data – For the language that we use to say those things – For how we structure the metadata • We need tools that make it easy to apply these standards • We need a scientific culture that values making experimental data accessible to others 31 Requirement #3: Make it palatable to describe experiments completely and consistently—which means we need good tools! 32
  • 23. 11/13/20 23 brain Parkinson’s disease (DOID) (39%) central nervous system lymphoma (DOID) (27%) autistic disorder (DOID) (22%) melanoma (DOID) (5%) schizophrenia (DOID) (1%) Edwards syndrome (DOID) (2%) 45 Some key features of CEDAR • All semantic components—template elements, templates, ontologies, and value sets—are managed as first-class entities • User interfaces and drop-down menus are not hardcoded, but are generated on the fly from CEDAR’s semantic content • All software components have well defined APIs, facilitating reuse of software by a variety of clients • CEDAR generates all metadata in JSON-LD, a widely adopted Web standard that can be translated into other representations 46
  • 24. 11/13/20 24 47 age Age AGE `Age age (after birth) age (in years) age (y) age (year) age (years) Age (years) Age (Years) age (yr) age (yr-old) age (yrs) Age (yrs) age [y] age [year] age [years] age in years age of patient Age of patient age of subjects age(years) Age(years) Age(yrs.) Age, year age, years age, yrs age.year age_years 48
  • 25. 11/13/20 25 Who’s Using CEDAR? 49 Online Data Will Never Be FAIR • Until we standardize metadata structure using common templates • Until we can fill in those templates with controlled terms whenever possible • Until we create technology that will make it easy for investigators to annotate their datasets in standardized, searchable ways • Until the scientific enterprise adopts this kind of technology 50
  • 26. 11/13/20 26 The “research parasites” of the world who reuse data for a living will be able to learn from datasets that are “cleaner,” better organized, and more searchable Purvesh Khatri, Ph.D. 51 Scientists will learn when new results become available that are relevant to their work, and how those results relate to the data that they are already collecting 52
  • 27. 11/13/20 27 Clinicians will be able to find the evidence that best helps them to select treatments for their patients 53 Once comprehensive, experimental metadata are disseminated in machine-interpretable form … • Technology such as CEDAR will assist in the automated “publication” of scientific results online • Intelligent agents will – Search and “read” the “literature” – Integrate information – Track scientific advances – Re-explore existing scientific datasets – Suggest the next set of experiments to perform – And maybe even do them! 54