Improving discoverability for Life Sciences resources
Alasdair J.G. Gray
Bioschemas Leadership Team Chair
Heriot-Watt University/Elixir-UK
Bioschemas
ELIXIR All Hands Tutorial
Lisbon, Portugal – 19 June 2019
Google Search
http://bioschemas.org
Google Dataset Search (Sept 2018)
https://toolbox.google.com/datasetsearch
https://www.blog.google/products/search/making-it-easier-discover-datasets/
Picture: Carole Goble, Turing Lecture 2018
Schema.org: Semantic Markup for the Web
Structured data → descriptors
● Types (614): what we are talking about
● Properties (905): what we can say about those things
Bioschemas
• Community initiative built on top of schema.org
• Aim
○ Improve data discoverability and interoperability in Life Sciences
• Approach
○ Add Life Science types to schema.org
○ Provide usage guidelines and examples
○ 6 Minimal properties
○ Link to domain ontologies
○ Support software
Profile over schema.org
Layer of constraints + documentation + extensions
Specification
Data model
Minimum information
Controlled vocabularies
Cardinality
Documentation
Examples
New (properties | types)
Findable Accessible Interoperable Reusable
★ Globally unique identifiers
★ Community defined enriched metadata
★ Indexable by search engines
★ JSON-LD/RDFa
★ Link to controlled vocabularies
★ Links to other resources
★ License
★ Provenance
★ Retrievable
★ HTTP
Schema.org for Datasets
Schema definition:
● Dataset: A body of structured information describing some topic(s) of interest
http://schema.org/Dataset
● 91 properties including:
○ name
○ description
○ isFamilyFriendly
Google Dataset Profile
• 2 required properties
• Used for Google Dataset Search
• 10 recommended properties
• Link to DataCatalog
• Link to DataDownload
Other profiles: Events, Jobs, ...
https://developers.google.com/search/docs/data-types/dataset
Bioschemas Dataset Profile
Compliant with Google Dataset Profile
• 5 minimal properties
• 8 recommended properties
• Link to DataCatalog
• Link to DataDownload
http://bioschemas.org/specifications/Dataset/
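To make "minimal properties" concrete, the sketch below checks a JSON-LD Dataset description against a list of property names. The two Google names (name, description) are the required ones in Google's dataset guidance. The five-item Bioschemas list is an assumption for illustration only; take the authoritative list from the profile page above. The example dataset is hypothetical.

```python
import json

# Google's dataset guidance requires these two properties.
GOOGLE_REQUIRED = ["name", "description"]
# Assumption: illustrative list only; consult the Bioschemas Dataset profile for the real one.
BIOSCHEMAS_MINIMAL_ASSUMED = ["description", "identifier", "keywords", "name", "url"]


def missing_properties(markup, required):
    """Return the required property names that are absent or empty in the markup."""
    return [prop for prop in required if not markup.get(prop)]


# Hypothetical markup, not taken from any live deployment.
example = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Example dataset",
  "description": "A hypothetical dataset used only to illustrate the check."
}
""")

print(missing_properties(example, GOOGLE_REQUIRED))
print(missing_properties(example, BIOSCHEMAS_MINIMAL_ASSUMED))
```

A check like this only tests presence; it says nothing about cardinality or expected value types, which the profiles also constrain.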
Extending Schema.org for the Life Sciences
7 release candidates
Submission in progress!
More types in development
Current Bioschemas Profiles
Profile           Version           Group         Live Deploys  Status notes
DataCatalog       0.2 (Jun 2019)    Data Repos    20            0.2 fixes minor issues
Dataset           0.3 (Jun 2019)    Datasets      23            0.3 fixes minor issues
Event             0.1 (July 2018)   Events        7             Used by TeSS: undergoing revision due to addition of CourseInstance
Sample            0.2 (Nov 2018)    Samples       1
Taxon             0.3 (Nov 2018)    Biodiversity  0
Tool              0.1 (Mar 2018)    Tools         5             0.3-DRAFT based on bio.tools profile, needs review
TrainingMaterial  0.2 (July 2018)   Training      0             Used by TeSS: 0.5-DRAFT incorporating changes from Course
Draft Bioschemas Profiles
● Beacon: 0.2-DRAFT 2018-04-23
● BioSample: 0.1-DRAFT
● ChemicalSubstance: 0.2-DRAFT 2019-06-11
● Course: 0.6-DRAFT 2019-06-06
● CourseInstance: 0.6-DRAFT 2019-06-06
● DNA: 0.1-DRAFT 2018-11-13
● DataRecord: 0.2-DRAFT 2019-06-14
● Gene: 0.5-DRAFT 2019-06-14
● Journal: 0.1-DRAFT 2019-02-08
● LabProtocol: 0.3-DRAFT 2019-06-14
● MolecularEntity: 0.2-DRAFT 2019-11-15
● Organization: 0.1-DRAFT 2018-03-13
● Person: 0.1-DRAFT 2018-03-14
● Phenotype: 0.1-DRAFT 2018-11-15
● Protein: 0.8-DRAFT 2019-05-08
● ProteinAnnotation: 0.4-DRAFT 2018-02-25
● ProteinStructure: 0.5-DRAFT 2018-08-15
● PublicationIssue: 0.1-DRAFT 2019-02-08
● PublicationVolume: 0.1-DRAFT 2019-02-08
● ScholarlyArticle: 0.1-DRAFT 2019-02-08
● SemanticAnnotation: 0.1-DRAFT 2019-02-08
● Standard: 0.1-DRAFT 2018-01-01
● Study: 0.1-DRAFT 2018-11-15
● Tool: 0.3-DRAFT 2018-11-21
● TrainingMaterial: 0.6-DRAFT 2019-06-06
● Workflow: 0.1-DRAFT 2019-02-08
Profile Creation Process
Diagram labels: Use cases, Mapping, Profile, Mockup, Testing, Application, Adoption
Bioschemas Software
Bioschemas Generator
● Supports all profiles
○ Current and draft
● Validates input
● Form generated from YAML description
● Examples extracted from profile
Exploiting Bioschemas Markup
TeSS: Specialised Search
Bioschemas Event:
• contact
• description
• endDate
• eventType
• hostInstitution
• location
• name
• startDate
• …
Automated Data Curation
Bioschemas DataCatalog:
• description
• keywords
• name
• provider
• url
• alternateName
• citation
• dateCreated
• license
• …
Data Exchange: Without an API
MarRef → BioSamples
https://github.com/EBIBioSamples/bioschemas_marref_demo/blob/master/Summary.md
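Exchanging data without an API rests on a consumer being able to read the markup straight out of the page. The following is a minimal sketch of that step using only the Python standard library; it is not the code used in the MarRef demo, and the page content and names (JsonLdExtractor, extract_jsonld, PAGE) are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list with one parsed JSON value per JSON-LD script block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical page standing in for a crawled resource.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Sample", "name": "Hypothetical sample"}
</script>
</head><body><p>Sample record</p></body></html>"""

for node in extract_jsonld(PAGE):
    print(node["@type"], node.get("name"))
```

This only works for markup present in the served HTML; markup injected by scripts at load time would not be seen by such a parser.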
BKG Explorer
Built over Bioschemas markup crawled from 30 live deployments
20,000 pages
Bioschemas
What?
• Exploiting schema.org to make Life Sciences resources more discoverable
• Search engines will index and understand markup
How?
• Extending schema.org vocabulary for life sciences
• 7 release candidate types
• Provide guidelines on how to mark up resources
Bioschemas Community
200+ People
7 Tutorials (2018)
17 Types
6 Publications (2018)
30 Profiles
62 Sites
11M+ Pages
http://bioschemas.org/liveDeploys
Acknowledgements http://bioschemas.org/people
http://bioschemas.org/ @bioschemas https://github.com/bioschemas/
Join Bioschemas: http://bioschemas.org/howtojoin/
Creating and Deploying Bioschemas Markup
Material from: Justin Clark-Casey
License: Attribution 4.0 International (CC BY 4.0)
Kenneth McLeod
Creating Bioschemas markup
● Markup is in a format called JSON-LD
● Embedded directly into webpages
● Let’s look at an example of the DataCatalog schema as used by Bioschemas
○ This comes from schema.org but Bioschemas adds
■ Mandatory/recommended/optional properties
■ Cardinality constraints
Markup can be placed in either the head or the body.
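As a sketch of what embedding looks like, the snippet below serialises a JSON-LD description and wraps it in a script element placed in the page head. The helper name jsonld_script_tag and the catalogue name are hypothetical; the @id reuses the BioSamples URL from these slides.

```python
import json


def jsonld_script_tag(markup):
    """Serialise a JSON-LD dictionary into an embeddable <script> element."""
    body = json.dumps(markup, indent=2)
    # Guard against a literal "</" ending the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


catalog = {
    "@context": "https://schema.org",
    "@type": "DataCatalog",
    "@id": "https://www.ebi.ac.uk/biosamples",
    "name": "Example catalogue name",  # placeholder value
}

# The same element could equally be placed inside <body>.
page = "<html>\n<head>\n" + jsonld_script_tag(catalog) + "\n</head>\n<body></body>\n</html>"
print(page)
```

Generating the element on the server, as here, keeps the markup in the static HTML that crawlers receive.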
Let’s look at this in Google’s Structured Data Testing Tool
@context is overwritten by Google
Technically any prefixes can be defined here, e.g.,
"@context":["https://schema.org", {"OBI":"http://purl.obolibrary.org/obo/OBI_" ...}],
"@type":["Sample","OBI:0000747"] …
BUT, Google will overwrite this with the basic "@context": "http://schema.org"
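To show what a prefix in @context buys you, here is a deliberately simplified sketch of how a compact value such as "OBI:0000747" expands to a full IRI. It is not a JSON-LD processor: the function name is hypothetical, and the default vocabulary URL is an assumption chosen to match the triples shown later in these slides.

```python
def expand_compact_iri(value, prefixes, vocab="https://schema.org/"):
    """Expand 'prefix:suffix' using the prefix map; bare terms use the vocabulary."""
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + suffix
        return value  # already an absolute IRI, or a prefix we do not know
    return vocab + value


# Prefix definition taken from the @context example above.
prefixes = {"OBI": "http://purl.obolibrary.org/obo/OBI_"}

for type_name in ["Sample", "OBI:0000747"]:
    print(expand_compact_iri(type_name, prefixes))
```

If a consumer replaces the context with the basic schema.org one, the "OBI" prefix is no longer defined and the second type cannot be resolved this way.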
@id - gives a node a URL
Without @id, nodes get auto-generated blank node identifiers, e.g.,
<script type="application/ld+json">{
"@context" : "https://schema.org",
"@type" : "DataCatalog", ...
becomes:
_:genid2d4335ed7c72694275bea5b6a86ad9f82b2db0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
Bad for Linked Data: a blank node identifier is local to the document, so no one else can reference this node.
@id - gives a node a URL
With an @id you choose the URL for nodes, e.g.,
<script type="application/ld+json">{
"@context" : "https://schema.org",
"@type" : "DataCatalog",
"@id" : "https://www.ebi.ac.uk/biosamples" …
becomes:
<https://www.ebi.ac.uk/biosamples> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
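The difference between the two cases can be sketched in a few lines. This is an illustration of the idea, not how any particular RDF library or Google's tool generates identifiers; the function name and the blank node label format are assumptions.

```python
import uuid

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_triple(node, vocab="https://schema.org/"):
    """Return the rdf:type statement for a node as one N-Triples line."""
    if "@id" in node:
        subject = "<" + node["@id"] + ">"
    else:
        # No @id: invent a blank node label, which is only meaningful locally.
        subject = "_:genid" + uuid.uuid4().hex
    return subject + " <" + RDF_TYPE + "> <" + vocab + node["@type"] + "> ."


print(type_triple({"@type": "DataCatalog"}))
print(type_triple({"@type": "DataCatalog", "@id": "https://www.ebi.ac.uk/biosamples"}))
```

Running the first call twice gives two different subjects, which is exactly why other sites cannot link to a node that has no @id.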
Warning! Don’t use the same @id for everything
DataCatalog & Dataset defined separately, but combined into a single entity:
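A rough sketch of why this happens: in a graph, every statement about the same identifier describes the same node, so two descriptions that share an @id collapse into one. The merge function and the example.org URL below are hypothetical and only mimic that behaviour.

```python
def merge_by_id(nodes):
    """Group property values by @id, the way statements accumulate on one graph node."""
    merged = {}
    for node in nodes:
        target = merged.setdefault(node["@id"], {})
        for key, value in node.items():
            if key != "@id":
                target.setdefault(key, []).append(value)
    return merged


# Two descriptions that were meant to be separate but reuse one identifier.
same_id = [
    {"@id": "https://example.org/data", "@type": "DataCatalog", "name": "Example catalogue"},
    {"@id": "https://example.org/data", "@type": "Dataset", "name": "Example dataset"},
]

result = merge_by_id(same_id)
print(len(result))
print(result["https://example.org/data"])
```

The single resulting node carries both types and both names, and a consumer can no longer tell which name belonged to which description.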
GSDTT (Google Structured Data Testing Tool) common errors
If you don’t meet Google’s desired property specification for a given type, you see errors like those annotated in the screenshot:
● If the Bioschemas spec says this is OK, you can ignore the error (FYI it is a real error)
● Not min properties in Bioschemas; do what you want
● This error is caused by the incorrect target type of location.
● Description is a min property for Bioschemas (i.e. mandatory)
Bioschemas types not yet accepted by schema.org: ignore these errors
Markup Generator
Example:
https://bio.tools/blast
https://blast.ncbi.nlm.nih.gov/Blast.cgi
https://bioschemas.org/devSpecs/Tool/
Evolving Best Practices
● At the moment we largely create markup by hand with validation through Google’s testing tool
○ More validators and tools on the way, see bioschemas.org/tools
● Make pages with markup reachable from your sitemap.xml
○ This will make it easier for some applications to find it.
● Avoid adding Bioschemas markup to the page dynamically (e.g. through JavaScript)
○ Applications trying to find your data may not have the resources to render pages.
● Specify an @id
● Evolving guidance at https://github.com/BioSchemas/specifications/wiki/Technical
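The sitemap advice can be illustrated from the consumer's side: a harvester typically starts from sitemap.xml and visits each listed page. The sketch below reads the page URLs from a sitemap using the standard sitemap namespace; the example.org URLs and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the page URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


# Hypothetical sitemap listing two pages that carry markup.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/datasets/1</loc></url>
  <url><loc>https://example.org/datasets/2</loc></url>
</urlset>"""

for url in sitemap_urls(SITEMAP):
    print(url)
```

Pages that are missing from the sitemap may still be found through links, but a harvester working this way would never reach them.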
Questions?
● bioschemas.org
● bioschemas.org/groups/Technical
● https://bioschemas.org/software/
● Google Structured Data Testing Tool
● kcm1@hw.ac.uk

Make your Web resources more discoverable with Bioschemas markup – Bioschemas Tutorial June 2019
