NOTE THAT I HAVE MOVED AWAY FROM SLIDESHARE TO ZENODO
The identical presentation is now here:
https://doi.org/10.5281/zenodo.7778641
General introduction to LinkML, The Linked Data Modeling Language.
Adapted from a presentation given to NIH, May 2022
https://linkml.io/linkml
LinkML Intro July 2022.pptx PLEASE VIEW THIS ON ZENODO
1. The Linked Data Modeling Language:
A framework for describing and integrating
rich biomedical data
Chris Mungall
Lawrence Berkeley National Laboratory
@chrismungall
June 2022
6. Open Biological Ontologies (OBO)
http://obofoundry.org
1. Well-integrated modular ontologies (a subset of BioPortal), e.g. GO, ChEBI, …
2. Provide a technical and sociotechnological framework for cooperation
3. Provide tools, best practices and infrastructure for forging new ontologies
4. Allow us to describe all of the things
@obofoundry
8. Ontologies: Example uses
Discovery and machine reasoning
Text Mining
Data Standardization
Bada et al. 2017, Gold-standard ontology-based anatomical annotation in the CRAFT Corpus
9. Ontologies: Example uses
Discovery and machine reasoning
Text Mining
Data Standardization
Malladi et al. 2015, Ontology application and use at the ENCODE DCC
14. Many standards are not Machine Actionable
Many standards are specified in PDF or Excel
● Not machine-actionable
● No validators
● Unclear semantics
Lack of automatic validation or data submission assistance leads to noise (figure: actual data from INSDC)
19. Semantic Web building blocks
URIs for identity: http://purl.uniprot.org/P12345, http://schema.org/name
Properties
Triples: for connecting nodes into graphs
Classes
RDFS: Schemas
OWL: Ontology
Rule Languages
Shape Languages
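The first two building blocks can be sketched in plain Python without any RDF library: a URI is just a string used as an identifier, and a graph is a set of (subject, property, value) triples. Only the two URIs come from the slide; the literal value and the helper name `objects` are made up for illustration.

```python
# Minimal sketch: URIs as identifiers, triples as 3-tuples, a graph as a set.
# Only the two URIs below come from the slide; everything else is illustrative.

PROTEIN = "http://purl.uniprot.org/P12345"   # URI identifying a node
NAME = "http://schema.org/name"              # URI identifying a property

graph = {
    (PROTEIN, NAME, "Example protein"),      # hypothetical literal value
}


def objects(graph, subject, prop):
    """Return every value attached to `subject` through `prop`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


print(objects(graph, PROTEIN, NAME))
```

Because both the node and the property are globally unique URIs, triples from different sources can be merged by taking the union of their sets.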
24. LinkML: The basics
THE STANDARD
A meta-datamodel for structuring your data
(Diagram: a schema imports 0..* other schemas; its element definitions are Class and Slot; a Class has 0..* Slots, is_a 0..1, mixin 0..n; a Slot has range 0..1)
TOOLS
Pragmatic developer- and curator-friendly tools for working with data
Validators
Data Converters
Compatibility tools
Data entry
Schema inference
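One way to read the metamodel diagram is as a handful of record types. The sketch below is a deliberately simplified rendering of it in Python dataclasses, not the actual LinkML metamodel; the names `ClassDef`, `Schema` and `all_slots`, and the example classes, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str
    range: Optional[str] = None                        # range 0..1


@dataclass
class ClassDef:
    name: str
    is_a: Optional[str] = None                         # is_a 0..1
    mixins: List[str] = field(default_factory=list)    # mixin 0..n
    slots: List[str] = field(default_factory=list)     # has 0..* slots


@dataclass
class Schema:
    name: str
    imports: List[str] = field(default_factory=list)   # imports 0..*
    classes: Dict[str, ClassDef] = field(default_factory=dict)
    slots: Dict[str, Slot] = field(default_factory=dict)


def all_slots(schema: Schema, class_name: str) -> List[str]:
    """Slot names of a class, including those inherited via is_a and mixins."""
    cls = schema.classes[class_name]
    names: List[str] = []
    if cls.is_a:
        names.extend(all_slots(schema, cls.is_a))
    for mixin in cls.mixins:
        names.extend(all_slots(schema, mixin))
    names.extend(cls.slots)
    return list(dict.fromkeys(names))                  # de-duplicate, keep order


# Illustrative schema: Person inherits from Thing.
schema = Schema(
    name="example",
    classes={
        "Thing": ClassDef("Thing", slots=["identifier", "name"]),
        "Person": ClassDef("Person", is_a="Thing", slots=["givenName"]),
    },
)
print(all_slots(schema, "Person"))
```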
26. Use Case: Making FAIR standards
As a… DCC wrangler
I want to… design a data submission standard
So that… experimentalists can easily submit to the DCC
And… the DCC can integrate it in the context of other DCC data
And… it is maximally “FAIR” for community reuse
27. First Step: Create your datamodel
# Metadata
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
version: 0.0.1
# Namespaces
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
# Dependencies
imports:
  - linkml:types
# Actual datamodel
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
YAML conformant to LinkML standard
Option A: Author YAML directly
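To make the constraints in the Person class concrete, here is a hand-rolled check of what `identifier`, `required` and `multivalued` ask of a data record. It is a sketch only, not the LinkML validator: the attribute table is copied by hand from the slide rather than loaded from YAML, the function name `check_person` is hypothetical, and treating the identifier as mandatory is an assumption of this sketch.

```python
# Constraints copied by hand from the Person class on the slide.
PERSON_ATTRIBUTES = {
    "id": {"identifier": True},
    "first_name": {"required": True, "multivalued": True},
    "last_name": {"required": True},
    "knows": {"range": "Person", "multivalued": True},
}


def check_person(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, rules in PERSON_ATTRIBUTES.items():
        value = record.get(name)
        if value is None:
            # Assumption of this sketch: an identifier is always mandatory.
            if rules.get("required") or rules.get("identifier"):
                problems.append(f"missing required slot: {name}")
            continue
        multivalued = rules.get("multivalued", False)
        if multivalued and not isinstance(value, list):
            problems.append(f"slot {name} must be a list")
        if not multivalued and isinstance(value, list):
            problems.append(f"slot {name} must be a single value")
    for name in record:
        if name not in PERSON_ATTRIBUTES:
            problems.append(f"unknown slot: {name}")
    return problems


good = {"id": "P1", "first_name": ["Ada"], "last_name": "Lovelace"}
bad = {"id": "P2", "first_name": "Ada"}
print(check_person(good))
print(check_person(bad))
```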
28. First Step: Create your datamodel
Option B: Author using schemasheets
29. First Step: Create your datamodel
Option C: Get intelligent assistance from autoschema tools
(Diagram: semi-structured datasources; autoschema / model enrichment framework; refine)
30. Tooling for submitters
Option A: Generate spreadsheet templates
(Diagram: empty sheet, populated sheet, validator)
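The template idea can be sketched with Python's standard `csv` module: the header row is derived from the slot names, and a populated sheet is read back into records. This is an illustration, not LinkML's own spreadsheet tooling; the slot table is copied by hand from the Person class, and the `|` separator for multivalued cells and the function names are assumptions.

```python
import csv
import io

# Slot name -> multivalued flag, copied by hand from the Person class.
PERSON_SLOTS = {"id": False, "first_name": True, "last_name": False, "knows": True}
DELIMITER = "|"   # assumed separator inside multivalued cells


def empty_template():
    """Return CSV text that contains only the header row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(PERSON_SLOTS)
    return buffer.getvalue()


def read_sheet(text):
    """Parse a populated sheet into records, splitting multivalued cells."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for slot, multivalued in PERSON_SLOTS.items():
            cell = row.get(slot) or ""
            if cell == "":
                continue                     # empty cell: slot not provided
            record[slot] = cell.split(DELIMITER) if multivalued else cell
        records.append(record)
    return records


template = empty_template()
populated = template + "P1,Ada,Lovelace,P2|P3\n"
print(read_sheet(populated))
```

The records returned by `read_sheet` are plain dictionaries, so they can be handed to whatever validator is in use before submission.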
36. Incorporating ontologies into standards
Standardizing descriptors
A.k.a. column headers, data dictionary, metadata elements, CDEs
● Tissue sampling site
● Person name
● Symptoms
● Vital status
● Heart rate
● Age
● Datafile sha256
● Sources
● Assay
● …
Standardizing value sets
I.e. column headers, data dictionary, metadata elements, CDEs
● Organ slim (Uberon)
● Phenotypic abnormality (HPO)
● Vital status (PATO)
● Assay Type (OBI)
● …
37. Annotating schemas with vocabularies
id: https://example.org/linkml/hello-world
title: Really basic LinkML model
name: hello-world
license: https://creativecommons.org/publicdomain/zero/1.0/
version: 0.0.1
prefixes:
  linkml: https://w3id.org/linkml/
  sdo: https://schema.org/
  ex: https://example.org/linkml/hello-world/
default_prefix: ex
default_curi_maps:
  - semweb_context
imports:
  - linkml:types
classes:
  Person:
    description: Minimal information about a person
    class_uri: sdo:Person
    attributes:
      id:
        identifier: true
        slot_uri: sdo:taxID
      first_name:
        required: true
        slot_uri: sdo:givenName
        multivalued: true
      last_name:
        required: true
        slot_uri: sdo:familyName
      knows:
        range: Person
        multivalued: true
        slot_uri: foaf:knows
Export data to RDF and JSON-LD
Make the meaning of your schema more explicit
Data integration hooks
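The `class_uri` and `slot_uri` annotations are what make a JSON-LD export possible. The sketch below builds a JSON-LD `@context` by hand from the slot-to-URI mapping on the slide. It is not how the LinkML generators work internally; the function names are hypothetical, and the `foaf` namespace (the standard FOAF one) is supplied here as an assumption because the slide obtains it through `default_curi_maps` rather than listing it under `prefixes`.

```python
import json

PREFIXES = {
    "sdo": "https://schema.org/",
    # Assumption: standard FOAF namespace, not spelled out on the slide.
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Slot -> CURIE, copied by hand from the slot_uri values on the slide.
SLOT_URIS = {
    "id": "sdo:taxID",
    "first_name": "sdo:givenName",
    "last_name": "sdo:familyName",
    "knows": "foaf:knows",
}


def expand(curie):
    """Expand a prefix:local CURIE into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_jsonld(record, class_curie="sdo:Person"):
    """Wrap a plain record with a context mapping each slot to its URI."""
    context = {slot: expand(curie) for slot, curie in SLOT_URIS.items()}
    document = {"@context": context, "@type": expand(class_curie)}
    document.update({k: v for k, v in record.items() if k in SLOT_URIS})
    return document


doc = to_jsonld({"id": "P1", "first_name": ["Ada"], "last_name": "Lovelace"})
print(json.dumps(doc, indent=2))
```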
39. LinkML incorporates ISO/IEC 11179-3 meaning/data model
slots:
  ...
  gender:
    description: Person gender
    slot_uri: SDO:gender
    range: gender_enum
classes:
  Thing:
    description: The most generic type of item.
    class_uri: SDO:Thing
    slots:
      - identifier
      - url
      - name
  Person:
    is_a: Thing
    class_uri: SDO:Person
    description: A person (alive, dead, undead, or fictional).
    slots:
      - givenName
      - additionalName
      - gender
ISO/IEC 11179-3:2013(E)
40. ISO/IEC 11179-3 divides enums into representation / meaning
Representation: a value that can appear in the data
Meaning: what a particular value means
ISO/IEC 11179-3:2013(E) p. 101
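The representation / meaning split can be illustrated with a small lookup table: the key is the string that appears in the data, and the value is the URI of the concept it stands for. The permissible values below are hypothetical contents for the `gender_enum` named on the previous slide (the slides do not list them); the two schema.org terms are real, while the names `GENDER_ENUM` and `meaning_of` are made up for this sketch.

```python
SDO = "https://schema.org/"

# Representation (string in the data) -> meaning (URI of the concept).
# Hypothetical permissible values for the slide's gender_enum.
GENDER_ENUM = {
    "male": SDO + "Male",
    "female": SDO + "Female",
}


def meaning_of(value):
    """Return the concept URI for a permissible value, or raise ValueError."""
    if value not in GENDER_ENUM:
        raise ValueError(f"{value!r} is not a permissible value")
    return GENDER_ENUM[value]


print(meaning_of("female"))
```

Keeping the two apart means the strings in a dataset can stay short and local while still resolving to shared vocabulary terms.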
44. Other schema features
Rich type system
Inheritance / polymorphism
Complex boolean and conditional constraints
Developer support: bindings for Python, TypeScript
45. Use in cancer data harmonization
Clinical Terminologies
OBO Ontologies (Uberon, CL, GO, …)
https://cancerdhc.github.io/ccdhmodel
Cancer Research Data Commons (CRDC) Harmonized Data Model
● Modeling team
● Terminology team
● Unified framework
Core concepts:
Specimen
Subject
Observation
48. Biological Knowledge Graphs
Biolink: Goals
The charge from NCATS:
● Create a Knowledge Graph Schema
● Encompass all biology from molecules through to clinical entities
● Get 20 different sites using the same data model
○ (oh: Only a handful of which use RDF/OWL)
● Do it quickly and break new ground in Translational Science
49. Biolink-Model: A schema for biological KGs
● Expressed in LinkML
● Integrates multiple Knowledge Graphs and Knowledge Providers
Biolink Model
https://biolink.github.io/biolink-model
51. Future Plans
Hardening and adoption
● Governance around metamodel standard
● Documentation and tutorials
● Coordinate with major data providers and communities
● Completion of roundtrip conversion to multiple frameworks
● Highly efficient data readers/writers
Tool ecosystem
● Web based tooling
● Integrate automated assistant features
● Change management
● Rule systems
Currently driven by community contributions
52. LinkML Summary
Challenges
● Authoring standards and data models is hard
● Adding semantics is harder
● Developing tools (UI, validators) is expensive
LinkML
● Designed to be easy to use
● Layer in semantics as you need them
● Leverage multiple tool stacks
● Increasing adoption
53. Acknowledgements
Person             GitHub             Institution
Harold Solbrig     @hsolbrig          JHU
Sujay Patil        @sujaypatil96      LBNL
Sierra Moxon       @sierra-moxon      LBNL
Gaurav Vaidya      @gaurav            RENCI
Bill Duncan        @wdduncan          LBNL, UFL
Kevin Schaper      @kevinschaper      CU Anschutz
Joe Flack          @joeflack4         JHU
Deepak Unni        @deepakunni3       EMBL
Vincent Emonet     @vemonet           U Maastricht
Mark Miller        @turbomam          LBNL
Harshad Hegde      @hrshdhgd          LBNL
Dazhi Jiao         @jiaola            JHU
Matt Brush         @mbrush            CU Anschutz
Brian Furner       @bfurner           U Chicago
Tim Putman         @putmantime        CU Anschutz
Nico Matentzoglu   @matentzn          Semanticly
Ramona Walls       @ramonawalls       Critical Path Institute
Victoria Soesanto  @victoriasoesanto  CU Anschutz
Melissa Haendel    @mellybelly        CU Anschutz
U01HG009453 Intelligent Concept Assistant
HG010860-01 Phenomics First CEGS