The Linked Data Modeling Language:
A framework for describing and integrating rich biomedical data
Chris Mungall
Lawrence Berkeley National Laboratory
@chrismungall
June 2022
Outline
● Structuring our data: we can do better
● Ontologies and vocabularies: necessary but not sufficient
● The LinkML framework
● Applications
Proliferation of entities, standards, and ontologies
>1800 Databases
>1500 Standards
>900 Ontologies
~13.5m terms
220m proteins
65bn genes
227m substances
>?? Data Commonses
Data Integration is a constant challenge
[Figure: omics data and phenotype / clinical data are integrated to yield insights]
Common vocabularies are key
Open Biological Ontologies (OBO)
http://obofoundry.org
@obofoundry
1. Well-integrated modular ontologies (a subset of BioPortal), e.g. GO, CHEBI, …
2. Provide a technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to describe all of the things
Ontologies: Example uses
Discovery and
machine reasoning
Text Mining
Data
Standardization
Text Mining: Bada et al. 2017, Gold-standard ontology-based anatomical annotation in the CRAFT Corpus
Data Standardization: Malladi et al. 2015, Ontology application and use at the ENCODE DCC
Example:
Uberon
Mungall et al. (2012). Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
http://obofoundry.org/ontology/uberon
Uberon usage in standards
https://fairsharing.org/graph/1197
Note: this is missing links to HuBMAP, LINCS, MIxS, ENCODE, …
Uberon usage in standards
Common Fund Data Ecosystem (CFDE) FAIR rubric
https://fairshake.cloud/metric/140
Uberon (mis)usage in standards
https://fairshake.cloud/metric/140
Many standards are not Machine Actionable
Many standards are specified in PDF or Excel
● Not machine-actionable
● No validators
● Unclear semantics
Lack of automatic validation or data submission assistance leads to noise
[Figure: the resulting noise, shown with actual data from INSDC]
Challenge: ontologies still underused
Challenge: Terms are not enough
Incompatible Schemas!
The common situation
Semantic Web building blocks
● URIs for identity: http://purl.uniprot.org/P12345, http://schema.org/name
● Properties
● Triples: for connecting nodes into graphs
● Classes
● RDFS: Schemas
● OWL: Ontology
● Rule Languages
● Shape Languages
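The building blocks above can be sketched in plain Python: URIs give things a global identity, and triples connect them into a graph. This is only an illustration using tuples, not an RDF library; the class URI and the name value are made up for the example.

```python
# A triple is (subject, predicate, object); URIs identify nodes and properties.
PROTEIN = "http://purl.uniprot.org/P12345"       # entity URI from the slide
SCHEMA_NAME = "http://schema.org/name"           # property URI from the slide
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX_PROTEIN_CLASS = "https://example.org/Protein"  # hypothetical class URI

triples = {
    (PROTEIN, RDF_TYPE, EX_PROTEIN_CLASS),
    (PROTEIN, SCHEMA_NAME, "example protein"),    # literal value is made up
}


def objects(graph, subject, predicate):
    """Return every object attached to a subject through a given predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


names = objects(triples, PROTEIN, SCHEMA_NAME)
```

Because every node and property is a URI, two graphs that use the same URIs can be merged by taking the union of their triple sets.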
ISO-11179: Metadata Standards
Semantic tooling has still not permeated
● Semantic tooling: RDF, OWL, SPARQL, SHACL, ShEx, Rules
● Users: Semantic web developer; Developer; Data Scientist; Scientists, Clinicians, ..
● Other tooling: Python, SQL, Mongo, JSON, Pandas, BigTable, SPARK, Scikit-learn, Excel, Web Portals
● ???: ISO-11179, CDEs
Can we have a universal framework?
LinkML: The basics
THE STANDARD
A meta-datamodel for structuring your data
[Diagram: metamodel elements schema, definition, element, Class and Slot, related by imports (0..*), has (0..*), is_a (0..1), mixin (0..n) and range (0..1)]
TOOLS
Pragmatic developer- and curator-friendly tools for working with data
● Validators
● Data Converters
● Compatibility tools
● Data entry
● Schema inference
LinkML Landscape
https://linkml.io
https://github.com/linkml/linkml
Create datamodels in simple YAML files, optionally annotated using ontologies
Compile to other frameworks
● Semantic Web Applications and Infrastructure: ShEx, SHACL, JSON-LD Contexts, OWL
● “Traditional” Applications and Infrastructure: JSON-Schema, Python Dataclasses, SQL DDL, TSVs
Choose the right tools for the job, no lock in
[Diagram labels: Biocurator, Data Scientist, dct:creator]
Use Case: Making FAIR standards
As a…: DCC wrangler
I want to…: Design a data submission standard
So that…: Experimentalists can easily submit to the DCC
And…: The DCC can integrate it in the context of other DCC data
And…: It is maximally “FAIR” for community reuse
First Step: Create your datamodel
Option A: Author YAML directly
YAML conformant to LinkML standard:
# Metadata
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
# Namespaces
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
# Dependencies
imports:
  - linkml:types
# Actual Datamodel
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
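To make the Person model concrete, here is a hand-written Python dataclass that mirrors its attributes. It is a sketch of the kind of class a Python binding could expose, not the actual output of any LinkML generator; the `__post_init__` checks and the example values are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """Minimal information about a person (class_uri: sdo:Person)."""

    id: str                   # identifier: true
    first_name: List[str]     # required: true, multivalued: true
    last_name: str            # required: true
    knows: List[str] = field(default_factory=list)  # multivalued, ids of other Person records

    def __post_init__(self):
        # Type hints alone do not enforce `required`, so check it explicitly.
        if not self.id:
            raise ValueError("id is required")
        if not self.first_name:
            raise ValueError("first_name is required")
        if not self.last_name:
            raise ValueError("last_name is required")


# Example values are made up.
ada = Person(id="P1", first_name=["Ada"], last_name="Lovelace", knows=["P2"])
```

Here `knows` holds identifiers rather than nested objects, which is one of several reasonable ways to represent a reference to another Person.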
Option B: Author using
schemasheets
Option C: Get intelligent assistance from autoschema tools
[Diagram: semi-structured datasources pass through an autoschema / model enrichment framework, and the result is then refined]
Tooling for submitters
Option A: Generate spreadsheet templates
[Diagram: the schema yields an empty sheet; the populated sheet is checked by a Validator]
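The template-and-validate loop can be sketched with Python's standard `csv` module. This is a simplified stand-in, not the LinkML spreadsheet tooling: the column list and required set are copied by hand from the Person class, and the sample rows are made up.

```python
import csv
import io

COLUMNS = ["id", "first_name", "last_name", "knows"]  # attribute names of Person
REQUIRED = {"id", "first_name", "last_name"}          # identifier plus required slots


def empty_sheet():
    """Return a TSV template that contains only the header row."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(COLUMNS)
    return buf.getvalue()


def validate_sheet(tsv_text):
    """Yield (row_number, column) for every required cell that is blank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in COLUMNS:
            if column in REQUIRED and not (row.get(column) or "").strip():
                yield row_number, column


# A submitter fills in the template; the second record lacks a last name.
populated = empty_sheet() + "P1\tAda\tLovelace\t\nP2\tGrace\t\t\n"
errors = list(validate_sheet(populated))
```

Reporting the row and column of each problem is what lets a submitter fix a sheet before it reaches the DCC.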
Tooling for submitters
Option B: Use DataHarmonizer (Hsiao Lab)
https://github.com/cidgoh/DataHarmonizer
Tooling for submitters
Option A: Generate JSON-Schema
[Diagram: the schema is compiled to JSON-Schema; populated JSON is checked by a JSON-Schema validator]
Searchable documentation for your standard
https://cancerdhc.github.io/ccdhmodel/v1.1
Incorporating ontologies into standards
Standardizing descriptors
a.k.a. column headers, data dictionary, metadata elements, CDEs
● Tissue sampling site
● Person name
● Symptoms
● Vital status
● Heart rate
● Age
● Datafile sha256
● Sources
● Assay
● …
Standardizing value sets
I.e. column headers, data dictionary,
metadata elements, CDEs
● Organ slim (uberon)
● Phenotypic abnormality (HPO)
● Vital status (PATO)
● Assay Type (OBI)
● …
Annotating schemas with vocabularies
● Export data to RDF and JSON-LD
● Make the meaning of your schema more explicit
● Data integration hooks
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
license: https://creativecommons.org/publicdomain/zero/1.0/
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
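The `class_uri` and `slot_uri` annotations are what make an RDF export possible. The sketch below shows the idea in plain Python: CURIEs are expanded through the prefix map and each attribute value becomes a triple. It is not the LinkML RDF dumper. The `foaf` namespace is assumed to be the one supplied by `semweb_context`, minting subject URIs by appending the id to the `ex:` prefix is an assumption, and the record is made up.

```python
PREFIXES = {
    "sdo": "https://schema.org/",
    "ex": "https://example.org/linkml/hello-world/",
    "foaf": "http://xmlns.com/foaf/0.1/",  # assumed to come from semweb_context
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
SLOT_URIS = {  # slot_uri annotations copied from the Person class
    "first_name": "sdo:givenName",
    "last_name": "sdo:familyName",
    "knows": "foaf:knows",
}


def expand(curie):
    """Expand a CURIE such as sdo:Person into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_triples(person):
    """Turn one Person record into (subject, predicate, object) triples."""
    subject = expand("ex:" + person["id"])  # assumption: ids are minted under ex:
    triples = [(subject, expand("rdf:type"), expand("sdo:Person"))]
    for slot, curie in SLOT_URIS.items():
        value = person.get(slot, [])
        values = value if isinstance(value, list) else [value]
        for v in values:
            # `knows` has range Person, so its values are node URIs, not literals.
            obj = expand("ex:" + v) if slot == "knows" else v
            triples.append((subject, expand(curie), obj))
    return triples


triples = to_triples(
    {"id": "P1", "first_name": ["Ada"], "last_name": "Lovelace", "knows": ["P2"]}
)
```

Because the predicates are shared vocabulary URIs rather than local column names, data exported this way can be joined with other datasets that use schema.org or FOAF.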
Easy ontology support via value sets
slots:
  ...
  gender:
    description: Person gender
    slot_uri: SDO:gender
    range: gender_enum
classes:
  Thing:
    description: The most generic type of item.
    class_uri: SDO:Thing
    slots:
      - identifier
      - url
      - name
  Person:
    is_a: Thing
    class_uri: SDO:Person
    description: A person (alive, dead, undead, or fictional).
    slots:
      - givenName
      - additionalName
      - gender
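Since Person declares `is_a: Thing`, it carries Thing's slots in addition to its own. A minimal sketch of that resolution, using a hand-written dictionary that mirrors the two classes (the function name `all_slots` is hypothetical and this is not the LinkML API):

```python
CLASSES = {  # mirrors the Thing and Person classes in the schema fragment
    "Thing": {"slots": ["identifier", "url", "name"]},
    "Person": {"is_a": "Thing", "slots": ["givenName", "additionalName", "gender"]},
}


def all_slots(class_name):
    """Collect slots from a class and its is_a ancestors, ancestors first."""
    definition = CLASSES[class_name]
    parent = definition.get("is_a")
    inherited = all_slots(parent) if parent else []
    # Skip slots that an ancestor already contributed.
    return inherited + [s for s in definition["slots"] if s not in inherited]


person_slots = all_slots("Person")
```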
LinkML incorporates ISO/IEC 11179-3 meaning/data model
ISO/IEC 11179-3 divides enums into representation / meaning
● Representation: a value that can appear in the data
● Meaning: what a particular value means
Source: ISO/IEC 11179-3:2013(E) p. 101
Enumeration flavors
LinkML supports simple enums
enums:
  gender_enum:
    description: |-
      Gender of something, ...
    permissible_values:
      0: Male Gender
      1: Female Gender
      8: Mixed Gender
LinkML supports meaning link
  gender_enum_2:
    code_set: sdo:GenderType
    permissible_values:
      0:
        description: Male Gender
        meaning: sdo:Male
      1:
        description: Female Gender
        meaning: sdo:Female
      8:
        description: Mixed Gender
LinkML supports meanings drawn from conceptual domain
  gender_enum_3:
    code_set: sdo:GenderType
    pv_formula: CODE
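The split between a code that appears in the data and what that code means can be sketched as a lookup table. The dictionary below is a hand-copied mirror of `gender_enum_2`; the function name `meaning_of` is hypothetical and this is not how LinkML itself stores enums.

```python
GENDER_ENUM_2 = {  # permissible value -> description and optional meaning
    "0": {"description": "Male Gender", "meaning": "sdo:Male"},
    "1": {"description": "Female Gender", "meaning": "sdo:Female"},
    "8": {"description": "Mixed Gender"},  # no meaning is given on the slide
}


def meaning_of(code):
    """Return the meaning CURIE for a code, or None when no meaning is mapped."""
    if code not in GENDER_ENUM_2:
        raise ValueError(f"{code!r} is not a permissible value")
    return GENDER_ENUM_2[code].get("meaning")


female = meaning_of("1")
```

Validation only needs the keys of the table, while integration with other datasets relies on the `meaning` values, which is why the two are kept separate.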
Other schema features
● Rich type system
● Inheritance / polymorphism
● Complex boolean and conditional constraints
● Developer support: bindings for Python, TypeScript
Use in cancer data harmonization
Cancer Research Data Commons (CRDC) Harmonized Data Model
https://cancerdhc.github.io/ccdhmodel
● Modeling team
● Terminology team
● Unified framework
Core concepts: Specimen, Subject, Observation
[Diagram labels: Clinical Terminologies; OBO Ontologies (Uberon, CL, GO, …)]
Environmental microbiome data
https://microbiomedata.github.io/nmdc-schema/
Metadata standards to enable microbiome analysis
● Environmental sample data
● Omics data
● Community development model
Core concepts: Study, Environmental Sample, Workflow Analysis (genomic, metabolomic, ..), Data Object
Biological Knowledge Graphs
Biolink: Goals
The charge from NCATS:
● Create a Knowledge Graph Schema
● Encompass all biology from molecules through to clinical entities
● Get 20 different sites using the same data model
○ (oh: Only a handful of which use RDF/OWL)
● Do it quickly and break new ground in Translational Science
Biolink-Model: A schema for biological KGs
● Expressed in LinkML
● Integrates multiple Knowledge Graphs and Knowledge Providers
Biolink Model
https://biolink.github.io/biolink-model
Other adopters
Future Plans
Hardening and adoption
● Governance around metamodel standard
● Documentation and tutorials
● Coordinate with major data providers and communities
● Completion of roundtrip conversion to multiple frameworks
● Highly efficient data readers/writers
Tool ecosystem
● Web based tooling
● Integrate automated assistant features
● Change management
● Rule systems
Currently driven by
community contributions
LinkML Summary
Challenges
● Authoring standards and data models is hard
● Adding semantics is harder
● Developing tools (UI, validators) is expensive
LinkML
● Designed to be easy to use
● Layer in semantics as you need them
● Leverage multiple tool stacks
● Increasing adoption
Acknowledgements
Person GitHub Institution
Harold Solbrig @hsolbrig JHU
Sujay Patil @sujaypatil96 LBNL
Sierra Moxon @sierra-moxon LBNL
Gaurav Vaidya @gaurav RENCI
Bill Duncan @wdduncan LBNL, UFL
Kevin Schaper @kevinschaper CU Anschutz
Joe Flack @joeflack4 JHU
Deepak Unni @deepakunni3 EMBL
Vincent Emonet @vemonet U Maastricht
Mark Miller @turbomam LBNL
Harshad Hegde @hrshdhgd LBNL
Dazhi Jiao @jiaola JHU
Matt Brush @mbrush CU Anschutz
Brian Furner @bfurner U Chicago
Tim Putman @putmantime CU Anschutz
Nico Matentzoglu @matentzn Semanticly
Ramona Walls @ramonawalls Critical Path Institute
Victoria Soesanto @victoriasoesanto CU Anschutz
Melissa Haendel @mellybelly CU Anschutz
U01HG009453 Intelligent Concept Assistant
HG010860-01 Phenomics First CEGS

Mais conteúdo relacionado

Mais procurados

Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxChris Mungall
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Evolution of the Graph Schema
Evolution of the Graph SchemaEvolution of the Graph Schema
Evolution of the Graph SchemaJoshua Shinavier
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRavi Teja
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
Layout lm paper review
Layout lm paper review Layout lm paper review
Layout lm paper review taeseon ryu
 
An Ambitious Wikidata Tutorial
An Ambitious Wikidata TutorialAn Ambitious Wikidata Tutorial
An Ambitious Wikidata Tutorial_Emw
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublinm_ackermann
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge GraphsPeter Haase
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Simplilearn
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBLee Theobald
 

Mais procurados (20)

SHACL by example
SHACL by exampleSHACL by example
SHACL by example
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Evolution of the Graph Schema
Evolution of the Graph SchemaEvolution of the Graph Schema
Evolution of the Graph Schema
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
Layout lm paper review
Layout lm paper review Layout lm paper review
Layout lm paper review
 
ontop: A tutorial
ontop: A tutorialontop: A tutorial
ontop: A tutorial
 
SHACL Overview
SHACL OverviewSHACL Overview
SHACL Overview
 
An Ambitious Wikidata Tutorial
An Ambitious Wikidata TutorialAn Ambitious Wikidata Tutorial
An Ambitious Wikidata Tutorial
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublin
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
Python and MongoDB
Python and MongoDBPython and MongoDB
Python and MongoDB
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 

Semelhante a LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO

LinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupLinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupChris Mungall
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data ModelingVital.AI
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 
Doing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebDoing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebMathieu d'Aquin
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling TechniqueCarmen Sanborn
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Toni Hermoso Pulido
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital.AI
 
Data Portability with SIOC and FOAF
Data Portability with SIOC and FOAFData Portability with SIOC and FOAF
Data Portability with SIOC and FOAFUldis Bojars
 
RDFa Semantic Web
RDFa Semantic WebRDFa Semantic Web
RDFa Semantic WebRob Paok
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic WebIvan Herman
 
Approaches to machine actionable links
Approaches to machine actionable linksApproaches to machine actionable links
Approaches to machine actionable linksStephen Richard
 
X api chinese cop monthly meeting feb.2016
X api chinese cop monthly meeting   feb.2016X api chinese cop monthly meeting   feb.2016
X api chinese cop monthly meeting feb.2016Jessie Chuang
 
Cedar Overview
Cedar OverviewCedar Overview
Cedar Overviewjbgraybeal
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudDhaval Thakker
 
Linking Media and Data using Apache Marmotta (LIME workshop keynote)
Linking Media and Data using Apache Marmotta  (LIME workshop keynote)Linking Media and Data using Apache Marmotta  (LIME workshop keynote)
Linking Media and Data using Apache Marmotta (LIME workshop keynote)LinkedTV
 

Semelhante a LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO (20)

LinkML presentation to Yosemite Group
LinkML presentation to Yosemite GroupLinkML presentation to Yosemite Group
LinkML presentation to Yosemite Group
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
Linked Data
Linked DataLinked Data
Linked Data
 
Doing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebDoing Clever Things with the Semantic Web
Doing Clever Things with the Semantic Web
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...Semantic web technologies applied to bioinformatics and laboratory data manag...
Semantic web technologies applied to bioinformatics and laboratory data manag...
 
Jones "Working with Scholarly APIs: A NISO Training Series, Session One: Foun...
Jones "Working with Scholarly APIs: A NISO Training Series, Session One: Foun...Jones "Working with Scholarly APIs: A NISO Training Series, Session One: Foun...
Jones "Working with Scholarly APIs: A NISO Training Series, Session One: Foun...
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
 
Data Portability with SIOC and FOAF
Data Portability with SIOC and FOAFData Portability with SIOC and FOAF
Data Portability with SIOC and FOAF
 
RDFa Semantic Web
RDFa Semantic WebRDFa Semantic Web
RDFa Semantic Web
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
 
Approaches to machine actionable links
Approaches to machine actionable linksApproaches to machine actionable links
Approaches to machine actionable links
 
X api chinese cop monthly meeting feb.2016
X api chinese cop monthly meeting   feb.2016X api chinese cop monthly meeting   feb.2016
X api chinese cop monthly meeting feb.2016
 
Walter api
Walter apiWalter api
Walter api
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
Cedar Overview
Cedar OverviewCedar Overview
Cedar Overview
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data Cloud
 
Linking Media and Data using Apache Marmotta (LIME workshop keynote)
Linking Media and Data using Apache Marmotta  (LIME workshop keynote)Linking Media and Data using Apache Marmotta  (LIME workshop keynote)
Linking Media and Data using Apache Marmotta (LIME workshop keynote)
 

Mais de Chris Mungall

MADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxMADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxChris Mungall
 
Scaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciencesScaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciencesChris Mungall
 
LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)Chris Mungall
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Chris Mungall
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeChris Mungall
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeChris Mungall
 
Representation of kidney structures in Uberon
Representation of kidney structures in UberonRepresentation of kidney structures in Uberon
Representation of kidney structures in UberonChris Mungall
 
SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)SparqlProg (BioHackathon 2019)
SparqlProg (BioHackathon 2019)Chris Mungall
 
Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Ontology Development Kit: Bio-Ontologies 2019
Ontology Development Kit: Bio-Ontologies 2019Chris Mungall
 
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...US2TS: Reasoning over multiple open bio-ontologies to make machines and human...
US2TS: Reasoning over multiple open bio-ontologies to make machines and human...Chris Mungall
 
Uberon: opening up to community contributions
Uberon: opening up to community contributionsUberon: opening up to community contributions
Uberon: opening up to community contributionsChris Mungall
 
Modeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesModeling exposure events and adverse outcome pathways using ontologies
Modeling exposure events and adverse outcome pathways using ontologiesChris Mungall
 
Causal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyCausal reasoning using the Relation Ontology
Causal reasoning using the Relation OntologyChris Mungall
 
US2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyUS2TS presentation on Gene Ontology
US2TS presentation on Gene OntologyChris Mungall
 
Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Chris Mungall
 
Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Mungall keynote-biocurator-2017
Mungall keynote-biocurator-2017Chris Mungall
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataChris Mungall
 

Mais de Chris Mungall (20)

MADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxMADICES Mungall 2022.pptx
MADICES Mungall 2022.pptx
 
Scaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciencesScaling up semantics; lessons learned across the life sciences
Scaling up semantics; lessons learned across the life sciences
 
LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)LinkML Intro (for Monarch devs)
LinkML Intro (for Monarch devs)
 
Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...Experiences in the biosciences with the open biological ontologies foundry an...
Experiences in the biosciences with the open biological ontologies foundry an...
 
All together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of lifeAll together now: piecing together the knowledge graph of life
All together now: piecing together the knowledge graph of life
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO

  • 1. The Linked Data Modeling Language: A framework for describing and integrating rich biomedical data Chris Mungall Lawrence Berkeley National Laboratory @chrismungall June 2022
  • 2. Outline Structuring our data: we can do better Ontologies and vocabularies: necessary but not sufficient The LinkML framework Applications
  • 3. Proliferation of entities, standards, and ontologies >1800 Databases >1500 Standards >900 Ontologies ~13.5m terms 220m proteins 65bn genes 227m substances >?? Data Commonses
  • 4. Data Integration is a constant challenge Omics Data Phenotype / clinical Data insights
  • 6. Open Biological Ontologies (OBO) http://obofoundry.org 1. Well-integrated modular ontologies (SUBSET of bioportal), e.g. GO, CHEBI, … 2. Provide technical and sociotechnological framework for cooperation 3. Provide tools, best practices and infrastructure for forging new ontologies 4. Allow us to describe all of the things @obofoundry
  • 7. Ontologies: Example uses Discovery and machine reasoning Text Mining Data Standardization
  • 8. Ontologies: Example uses Discovery and machine reasoning Text Mining Data Standardization Bada et al 2017 Gold-standard ontology-based anatomical annotation in the CRAFT Corpus
  • 9. Ontologies: Example uses Discovery and machine reasoning Text Mining Data Standardization Malladi et al 2015 Ontology application and use at the ENCODE DCC
  • 10. Example: Uberon Mungall et al. (2012). Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 http://obofoundry.org/ontology/uberon
  • 11. Uberon usage in standards https://fairsharing.org/graph/1197 Note: this is missing links to hubmap, LINCS, MIxS, ENCODE, ….
  • 12. Uberon usage in standards https://fairshake.cloud/metric/140 Common Fund Data Ecosystem (CFDE) FAIR rubric
  • 13. Uberon (mis) usage in standards https://fairshake.cloud/metric/140
  • 14. Many standards are not Machine Actionable Many standards are specified in PDF or Excel ● Not machine-actionable ● No validators ● Unclear semantics Lack of automatic validation or data submission assistance leads to noise Results in (actual data from INSDC)
  • 16. Challenge: Terms are not enough. Incompatible schemas!
  • 19. Semantic Web building blocks URIs for identity http://purl.uniprot.org/P12345 http://schema.org/name Properties Triples For connecting nodes into graphs Classes RDFS: Schemas OWL: Ontology Rule Languages Shape Languages
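
    The building blocks on slide 19 can be illustrated with a minimal sketch in plain Python. The prefix map, the ex: namespace, the sample data, and the helper names expand_curie and objects_of are assumptions made for illustration; they are not from the talk. Compact identifiers expand to URIs, and triples connect those identifiers into a graph.

    ```python
    # Prefix map (illustrative). "sdo" is the schema.org namespace used later
    # in the talk; "ex" is a hypothetical namespace for the sample data.
    PREFIXES = {
        "sdo": "https://schema.org/",
        "ex": "https://example.org/people/",
    }


    def expand_curie(curie: str) -> str:
        """Expand a compact URI such as 'sdo:name' into a full URI."""
        prefix, _, local = curie.partition(":")
        if prefix not in PREFIXES:
            raise KeyError(f"unknown prefix: {prefix}")
        return PREFIXES[prefix] + local


    # A triple is (subject, property, object); a collection of triples is a graph.
    TRIPLES = [
        ("ex:P1", "sdo:name", "Ada"),
        ("ex:P1", "sdo:knows", "ex:P2"),
    ]


    def objects_of(subject: str, prop: str) -> list:
        """Return every object linked to `subject` through `prop`."""
        return [o for s, p, o in TRIPLES if s == subject and p == prop]
    ```

    Real applications would normally use an RDF library rather than tuples; the point here is only that identity (URIs) and connection (triples) are separate, simple ideas.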
  • 22. Semantic tooling has still not permeated. Semantic web developer: RDF, OWL, SPARQL, SHACL, ShEx, Rules. Developer / Data Scientist: Python, SQL, Mongo, JSON, Pandas, BigTable, SPARK, Scikit-learn. Scientists, Clinicians, ..: Excel, Web Portals, ???, ISO-11179 CDEs
  • 23. Can we have a universal framework?
  • 24. LinkML: The basics. THE STANDARD: A meta-datamodel for structuring your data (diagram labels: schema, element, definition, Class, Slot; has 0..*, is_a 0..1, mixin 0..n, range 0..1, imports 0..*). TOOLS: Pragmatic developer and curator friendly tools for working with data: Validators, Data Converters, Compatibility tools, Data entry, Schema inference
  • 25. LinkML Landscape JSON-Schema ShEx, SHACL JSON-LD Contexts Python Dataclasses OWL https://linkml.io https://github.com/linkml/linkml Semantic Web Applications And Infrastructure “Traditional” Applications and Infrastructure SQL DDL TSVs Create datamodels in simple YAML files, optionally annotated using ontologies Compile to other frameworks Choose the right tools for the job, no lock in Biocurator Data Scientist dct:creator
  • 26. Use Case: Making FAIR standards. As a DCC wrangler, I want to design a data submission standard, so that experimentalists can easily submit to the DCC, and the DCC can integrate it in the context of other DCC data, and it is maximally “FAIR” for community reuse.
  • 27. First Step: Create your datamodel. Option A: Author YAML directly. The YAML conforms to the LinkML standard and has four labeled parts (metadata, namespaces, dependencies, and the actual datamodel):

      id: https://example.org/linkml/hello-world
      title: Really basic LinkML model
      name: hello-world
      version: 0.0.1

      prefixes:
        linkml: https://w3id.org/linkml/
        sdo: https://schema.org/
        ex: https://example.org/linkml/hello-world/
      default_prefix: ex
      default_curi_maps:
        - semweb_context

      imports:
        - linkml:types

      classes:
        Person:
          description: Minimal information about a person
          class_uri: sdo:Person
          attributes:
            id:
              identifier: true
              slot_uri: sdo:taxID
            first_name:
              required: true
              slot_uri: sdo:givenName
              multivalued: true
            last_name:
              required: true
              slot_uri: sdo:familyName
            knows:
              range: Person
              multivalued: true
              slot_uri: foaf:knows
  • 28. First Step: Create your datamodel. Option B: Author using schemasheets. (The slide repeats the hello-world schema from slide 27, with the same labels: YAML conformant to LinkML standard; Metadata; Dependencies; Namespaces; Actual Datamodel.)
  • 29. First Step: Create your datamodel. Option C: Get intelligent assistance from autoschema tools. (The slide repeats the hello-world schema from slide 27. Diagram labels: Semi-structured datasources; Autoschema / model enrichment framework; refine.)
  • 30. Tooling for submitters. Option A: Generate spreadsheet templates. (The slide repeats the hello-world schema from slide 27. Diagram labels: empty sheet; populated sheet; Validator.)
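
    The template-then-validate loop on slide 30 can be sketched with the standard library alone. This is a hand-rolled illustration, not the LinkML tooling: the attribute names and flags are copied from the Person class on slide 27, while PERSON_ATTRIBUTES, make_template, and check_row are hypothetical names.

    ```python
    import csv
    import io

    # Attribute names and flags taken from the Person class on slide 27
    # (only the flags needed for this sketch are kept).
    PERSON_ATTRIBUTES = {
        "id": {"identifier": True},
        "first_name": {"required": True},
        "last_name": {"required": True},
        "knows": {"multivalued": True},
    }


    def make_template(attributes: dict) -> str:
        """Return an empty TSV sheet: a header row with one column per attribute."""
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        writer.writerow(attributes.keys())
        return buf.getvalue()


    def check_row(row: dict, attributes: dict) -> list:
        """List the required or identifier columns left empty in a populated row."""
        return [
            name
            for name, spec in attributes.items()
            if (spec.get("required") or spec.get("identifier")) and not row.get(name)
        ]
    ```

    A submitter would fill in the sheet, and each populated row would be passed to check_row before submission; an empty list means no mandatory column is missing.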
  • 31. Tooling for submitters Option B: Use DataHarmonizer (Hsiao Lab) https://github.com/cidgoh/DataHarmonizer
  • 32. Tooling for submitters. Option A: Generate JSON-Schema. (The slide repeats the hello-world schema from slide 27. Diagram labels: JSON-Schema; populated JSON; Validator (JSON-Schema).)
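
    To make the idea on slide 32 concrete, here is a minimal hand-written sketch of compiling the Person class into a JSON-Schema-shaped dictionary and checking an instance against it. It does not reproduce the output of LinkML's own generator; to_json_schema and missing_required are hypothetical helpers, and treating every range as a string is a simplifying assumption.

    ```python
    # Attributes of the Person class from slide 27.
    PERSON_ATTRIBUTES = {
        "id": {"identifier": True},
        "first_name": {"required": True, "multivalued": True},
        "last_name": {"required": True},
        "knows": {"range": "Person", "multivalued": True},
    }


    def to_json_schema(name: str, attributes: dict) -> dict:
        """Build a JSON-Schema-style object definition from attribute flags."""
        properties = {}
        required = []
        for slot, spec in attributes.items():
            # Assumption: every value is rendered as a string, including
            # references to other Person records (by identifier).
            item = {"type": "string"}
            if spec.get("multivalued"):
                properties[slot] = {"type": "array", "items": item}
            else:
                properties[slot] = item
            if spec.get("required") or spec.get("identifier"):
                required.append(slot)
        return {
            "title": name,
            "type": "object",
            "properties": properties,
            "required": required,
        }


    def missing_required(instance: dict, schema: dict) -> list:
        """Return the required properties that are absent from the instance."""
        return [key for key in schema["required"] if key not in instance]
    ```

    In practice the generated schema would be handed to an off-the-shelf JSON-Schema validator; missing_required only stands in for that step so the sketch stays dependency-free.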
  • 33. Searchable documentation for your standard. (The slide repeats the hello-world schema from slide 27.)
  • 36. Incorporating ontologies into standards Standardizing descriptors aka. column headers, data dictionary, metadata elements, CDEs ● Tissue sampling site ● Person name ● Symptoms ● Vital status ● Heart rate ● age ● Datafile sha256 ● Sources ● Assay ● … Standardizing value sets I.e. column headers, data dictionary, metadata elements, CDEs ● Organ slim (uberon) ● Phenotypic abnormality (HPO) ● Vital status (PATO) ● Assay Type (OBI) ● …
  • 37. Annotating schemas with vocabularies. (The slide repeats the hello-world schema from slide 27, with one added line, license: https://creativecommons.org/publicdomain/zero/1.0/.) Callouts: Export data to RDF and JSON-LD; Make the meaning of your schema more explicit; Data integration hooks
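
    One way to see how slot_uri and class_uri annotations enable export to JSON-LD is the following sketch, which wraps a plain record in a JSON-LD context built from the schema's mappings. The helper to_jsonld and the sample record are hypothetical, and the foaf prefix is spelled out here as an assumption because the slide's own prefixes block does not list it.

    ```python
    import json

    # Prefixes: sdo is declared on the slide; foaf is assumed here.
    PREFIXES = {
        "sdo": "https://schema.org/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    }

    # slot_uri mappings copied from the Person class on slide 27.
    SLOT_URIS = {
        "first_name": "sdo:givenName",
        "last_name": "sdo:familyName",
        "knows": "foaf:knows",
    }


    def to_jsonld(record: dict, class_uri: str) -> dict:
        """Attach a JSON-LD context and type to a plain dictionary record."""
        context = dict(PREFIXES)
        context.update(SLOT_URIS)
        return {"@context": context, "@type": class_uri, **record}


    doc = to_jsonld({"first_name": "Ada", "last_name": "Lovelace"}, "sdo:Person")
    print(json.dumps(doc, indent=2))
    ```

    The record's keys stay as the developer-friendly names from the schema; the context is what lets a JSON-LD processor interpret them as schema.org and FOAF properties.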
  • 38. Easy ontology support via value sets
  • 39. LinkML incorporates the ISO/IEC 11179-3 meaning/data model (ISO/IEC 11179-3:2013(E)):

      slots:
        ...
        gender:
          description: Person gender
          slot_uri: SDO:gender
          range: gender_enum

      classes:
        Thing:
          description: The most generic type of item.
          class_uri: SDO:Thing
          slots:
            - identifier
            - url
            - name
        Person:
          is_a: Thing
          class_uri: SDO:Person
          description: A person (alive, dead, undead, or fictional).
          slots:
            - givenName
            - additionalName
            - gender
  • 40. ISO/IEC 11179-3 divides enums into representation / meaning (ISO/IEC 11179-3:2013(E), p. 101). Representation: a value that can appear in the data. Meaning: what a particular value means.
  • 41. Enumeration flavors: LinkML supports simple enums

      enums:
        gender_enum:
          description: |-
            Gender of something, ...
          permissible_values:
            0: Male Gender
            1: Female Gender
            8: Mixed Gender
  • 42. Enumeration flavors: LinkML supports meaning link

      gender_enum_2:
        code_set: sdo:GenderType
        permissible_values:
          0:
            description: Male Gender
            meaning: sdo:Male
          1:
            description: Female Gender
            meaning: sdo:Female
          8:
            description: Mixed Gender
  • 43. Enumeration flavors: LinkML supports meanings drawn from conceptual domain

      gender_enum_3:
        code_set: sdo:GenderType
        pv_formula: CODE
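
    The representation/meaning split from slides 40 to 42 can be mimicked in a few lines of Python. This is only a sketch: the dictionary mirrors gender_enum_2 from slide 42, codes are kept as strings for simplicity, and the helper meaning_of is a hypothetical name.

    ```python
    from typing import Optional

    # Mirrors gender_enum_2: each permissible value (the representation that
    # appears in the data) may carry a link to what it means.
    GENDER_ENUM_2 = {
        "0": {"description": "Male Gender", "meaning": "sdo:Male"},
        "1": {"description": "Female Gender", "meaning": "sdo:Female"},
        "8": {"description": "Mixed Gender"},  # no meaning given on the slide
    }


    def meaning_of(code: str) -> Optional[str]:
        """Return the meaning linked to a permissible value, or None if unmapped.

        Raises ValueError when the code is not a permissible value at all.
        """
        if code not in GENDER_ENUM_2:
            raise ValueError(f"{code!r} is not a permissible value")
        return GENDER_ENUM_2[code].get("meaning")
    ```

    Keeping the two layers apart means a dataset can keep its compact local codes while integration tools work with the shared identifiers.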
  • 44. Other schema features Rich type system Inheritance /polymorphism Complex boolean and conditional constraints Developer support: Bindings for python, typescript
  • 45. Use in cancer data harmonization Clinical Terminologies OBO Ontologies (Uberon, CL, GO, …) https://cancerdhc.github.io/ccdhmodel Cancer Research Data Commons (CRDC) Harmonized Data Model ● Modeling team ● Terminology team ● Unified framework Core concepts: Specimen Subject Observation
  • 46. Environmental microbiome data https://microbiomedata.github.io/nmdc-schema/ Metadata standards to enable microbiome analysis ● Environmental sample data ● Omics data ● Community development model Core concepts: Study Environmental Sample Workflow Analysis (genomic, metabolomic, ..) Data Object
  • 48. Biological Knowledge Graphs Biolink: Goals The charge from NCATS: ● Create a Knowledge Graph Schema ● Encompass all biology from molecules through to clinical entities ● Get 20 different sites using the same data model ○ (oh: Only a handful of which use RDF/OWL) ● Do it quickly and break new ground in Translational Science
  • 49. Biolink-Model: A schema for biological KGs ● Expressed in LinkML ● Integrates multiple Knowledge Graphs and Knowledge Providers Biolink Model https://biolink.github.io/biolink-model
  • 51. Future Plans Hardening and adoption ● Governance around metamodel standard ● Documentation and tutorials ● Coordinate with major data providers and communities ● Completion of roundtrip conversion to multiple frameworks ● Highly efficient data readers/writers Tool ecosystem ● Web based tooling ● Integrate automated assistant features ● Change management ● Rule systems Currently driven by community contributions
  • 52. LinkML Summary Challenges ● Authoring standards and data models is hard ● Adding semantics is harder ● Developing tools (UI, validators) is expensive LinkML ● Designed to be easy to use ● Layer in semantics as you need them ● Leverage multiple tool stacks ● Increasing adoption
  • 53. Acknowledgements (Person, GitHub, Institution):
      Harold Solbrig, @hsolbrig, JHU
      Sujay Patil, @sujaypatil96, LBNL
      Sierra Moxon, @sierra-moxon, LBNL
      Gaurav Vaidya, @gaurav, RENCI
      Bill Duncan, @wdduncan, LBNL, UFL
      Kevin Schaper, @kevinschaper, CU Anschutz
      Joe Flack, @joeflack4, JHU
      Deepak Unni, @deepakunni3, EMBL
      Vincent Emonet, @vemonet, U Maastricht
      Mark Miller, @turbomam, LBNL
      Harshad Hegde, @hrshdhgd, LBNL
      Dazhi Jiao, @jiaola, JHU
      Matt Brush, @mbrush, CU Anschutz
      Brian Furner, @bfurner, U Chicago
      Tim Putman, @putmantime, CU Anschutz
      Nico Matentzoglu, @matentzn, Semanticly
      Ramona Walls, @ramonawalls, Critical Path Institute
      Victoria Soesanto, @victoriasoesanto, CU Anschutz
      Melissa Haendel, @mellybelly, CU Anschutz
    U01HG009453 Intelligent Concept Assistant; HG010860-01 Phenomics First CEGS