This tutorial introduces Bioschemas and shows how to include schema.org markup to make your resources more discoverable on the Web. By the end of the session, attendees will be able to:
1) understand what Bioschemas is and how to use it
2) draw on examples of deployments that use Bioschemas
Make your Web resources more discoverable with Bioschemas markup – Bioschemas Tutorial, June 2019
1. Improving discoverability for Life Sciences resources
Alasdair J.G. Gray
Bioschemas Leadership Team Chair
Heriot-Watt University / ELIXIR-UK
Bioschemas
ELIXIR All Hands Tutorial
Lisbon, Portugal – 19 June 2019
6. Structured data → descriptors
● Types (614): what we are talking about
● Properties (905): what we can say about those things
7. Bioschemas
• Community initiative built on top of schema.org
• Aim
  • Improve data discoverability and interoperability in Life Sciences
• Approach
  • Add Life Science types to schema.org
  • Provide usage guidelines and examples
  • 6 Minimal properties
  • Link to domain ontologies
  • Support software
Profile over schema.org: a layer of constraints + documentation + extensions.
A specification covers: data model, minimum information, controlled vocabularies, cardinality, documentation, examples, and new (properties | types).
8. Findable Accessible Interoperable Reusable
Findable
★ Globally unique identifiers
★ Community-defined enriched metadata
★ Indexable by search engines
Accessible
★ Retrievable
★ HTTP
Interoperable
★ JSON-LD/RDFa
★ Link to controlled vocabularies
★ Links to other resources
Reusable
★ License
★ Provenance
9. Schema.org for Datasets
Schema definition:
● Dataset: a body of structured information describing some topic(s) of interest (http://schema.org/Dataset)
● 91 properties, including:
  ○ name
  ○ description
  ○ isFamilyFriendly
10. Google Dataset Profile
• Used for Google Dataset Search
• 2 required properties
• 10 recommended properties
• Link to DataCatalog
• Link to DataDownload
Other profiles: Events, Jobs, ...
https://developers.google.com/search/docs/data-types/dataset
11. Bioschemas Dataset Profile
Compliant with Google Dataset Profile
• 5 minimal properties
• 8 recommended properties
• Link to DataCatalog
• Link to DataDownload
http://bioschemas.org/specifications/Dataset/
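To make the Dataset profile concrete, here is a minimal Python sketch that builds a Dataset description as JSON-LD and reports which properties from a minimal set are missing. The set used below (description, identifier, keywords, name, url) is our assumed reading of the five Bioschemas Dataset minimal properties and should be checked against the specification page above; the function name missing_minimal and all example values are hypothetical.

```python
import json

# ASSUMPTION: Bioschemas Dataset minimal properties; verify against
# http://bioschemas.org/specifications/Dataset/ before relying on this list.
DATASET_MINIMAL = ["description", "identifier", "keywords", "name", "url"]


def missing_minimal(node: dict, minimal: list[str]) -> list[str]:
    """Return the minimal properties that are absent or empty in a JSON-LD node."""
    return [prop for prop in minimal if not node.get(prop)]


# Hypothetical example values, for illustration only.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "@id": "https://example.org/datasets/42",
    "name": "Example gene expression dataset",
    "description": "Expression levels measured across ten tissues.",
    "url": "https://example.org/datasets/42",
    "keywords": ["gene expression", "tissue"],
}

print(missing_minimal(dataset, DATASET_MINIMAL))  # ['identifier']
print(json.dumps(dataset, indent=2))
```

Treating an empty value as missing is a design choice: an empty description passes a presence check but tells a search engine nothing.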
14. Current Bioschemas Profiles
Profile | Version | Group | Live deploys | Status notes
DataCatalog | 0.2 (Jun 2019) | Data Repos | 20 | 0.2 fixes minor issues
Dataset | 0.3 (Jun 2019) | Datasets | 23 | 0.3 fixes minor issues
Event | 0.1 (Jul 2018) | Events | 7 | Used by TeSS; undergoing revision due to addition of CourseInstance
Sample | 0.2 (Nov 2018) | Samples | 1 | –
Taxon | 0.3 (Nov 2018) | Biodiversity | 0 | –
Tool | 0.1 (Mar 2018) | Tools | 5 | 0.3-DRAFT based on bio.tools profile, needs review
TrainingMaterial | 0.2 (Jul 2018) | Training | 0 | Used by TeSS; 0.5-DRAFT incorporating changes from Course
18. Bioschemas Software
29 November 2018, http://bioschemas.org
Bioschemas Generator
● Supports all profiles
  ○ Current and draft
● Validates input
● Form generated from YAML description
● Examples extracted from profile
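The generator's idea of deriving a form from a profile description can be sketched in a few lines of Python. This is a hypothetical illustration, not the Bioschemas Generator's code: the PROFILE dictionary stands in for the parsed YAML, and its keys (marginality, cardinality), the property levels shown, and the function form_fields are all assumptions.

```python
# HYPOTHETICAL profile description, standing in for the YAML the generator
# reads; property names, levels and cardinalities here are illustrative only.
PROFILE = {
    "name": "DataCatalog",
    "properties": [
        {"name": "name", "marginality": "minimum", "cardinality": "ONE"},
        {"name": "url", "marginality": "minimum", "cardinality": "ONE"},
        {"name": "keywords", "marginality": "recommended", "cardinality": "MANY"},
    ],
}


def form_fields(profile: dict) -> list[dict]:
    """Turn a profile description into simple form-field specifications."""
    fields = []
    for prop in profile["properties"]:
        fields.append({
            "label": prop["name"],
            # Minimum (mandatory) properties become required fields.
            "required": prop["marginality"] == "minimum",
            # MANY-valued properties get a widget that accepts repeated entries.
            "multiple": prop["cardinality"] == "MANY",
        })
    return fields


for field in form_fields(PROFILE):
    print(field)
```

Driving the form from the profile description means a new or revised profile needs no hand-written form code.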
24. Bioschemas
What?
• Exploiting schema.org to make Life Sciences resources more discoverable
• Search engines will index and understand the markup
How?
• Extending the schema.org vocabulary for the life sciences
• 7 release candidate types
• Providing guidelines on how to mark up resources
28. Creating and Deploying Bioschemas Markup
Material from: Justin Clark-Casey
License: Attribution 4.0 International (CC BY 4.0)
Kenneth McLeod
29. Creating Bioschemas markup
● Markup is written in JSON-LD (JSON for Linked Data)
● It is embedded directly in web pages
● Let’s look at an example of the DataCatalog type as used by Bioschemas
  ○ The type comes from schema.org, but Bioschemas adds
    ■ Mandatory/recommended/optional properties
    ■ Cardinality constraints
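The two things Bioschemas adds on top of schema.org (property levels and cardinality) can be checked mechanically. The following is a minimal Python sketch, not an official validator: the PROFILE subset below, its levels and cardinalities, and the function name check are illustrative assumptions rather than values copied from the DataCatalog specification.

```python
# HYPOTHETICAL subset of a DataCatalog profile: property -> (level, cardinality).
# Levels and cardinalities are illustrative, not copied from the specification.
PROFILE = {
    "name": ("minimum", "ONE"),
    "url": ("minimum", "ONE"),
    "description": ("minimum", "ONE"),
    "keywords": ("recommended", "MANY"),
}


def check(node: dict, profile: dict) -> list[str]:
    """Return human-readable problems found in a JSON-LD node."""
    problems = []
    for prop, (level, cardinality) in profile.items():
        value = node.get(prop)
        if value is None:
            # Optional properties are simply skipped when absent.
            if level == "minimum":
                problems.append(f"missing minimum property: {prop}")
            elif level == "recommended":
                problems.append(f"missing recommended property: {prop}")
        elif cardinality == "ONE" and isinstance(value, list) and len(value) > 1:
            problems.append(f"{prop} allows one value, found {len(value)}")
    return problems


catalog = {
    "@context": "https://schema.org",
    "@type": "DataCatalog",
    "name": ["BioSamples", "EBI BioSamples"],  # two values for a ONE property
    "url": "https://www.ebi.ac.uk/biosamples",
}

for problem in check(catalog, PROFILE):
    print(problem)
```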
31. Markup can be placed in either the head or the body.
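A consumer reads the markup from script elements of type application/ld+json wherever they appear, so for a simple extractor placement in the head or body makes no difference. The sketch below uses only Python's standard-library html.parser to show how a basic crawler might collect both blocks; the class name JsonLdExtractor and the sample page are hypothetical, and this is not how Google's tools are implemented.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical page with one block in the head and one in the body.
page = """<html>
<head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "DataCatalog", "name": "In the head"}</script>
</head>
<body>
<p>Catalog page</p>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Dataset", "name": "In the body"}</script>
</body>
</html>"""

parser = JsonLdExtractor()
parser.feed(page)
parser.close()
print([block["@type"] for block in parser.blocks])  # ['DataCatalog', 'Dataset']
```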
32. Let’s look at this in Google’s Structured Data Testing Tool
35. @context is overwritten by Google
Technically any prefixes can be defined here, e.g.,
"@context":["https://schema.org", {"OBI":"http://purl.obolibrary.org/obo/OBI_" ...}],
"@type":["Sample","OBI:0000747"] …
BUT, Google will overwrite this with the basic "@context": "http://schema.org"
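Why the overwrite matters: a compact IRI such as OBI:0000747 only means something while its prefix is defined in @context. The Python sketch below is a deliberately simplified, hypothetical prefix expansion (a real JSON-LD processor also applies @vocab, so plain terms like Sample would become schema.org IRIs); the function name expand is an assumption.

```python
# HYPOTHETICAL, simplified expansion of compact IRIs ("prefix:suffix") using
# the prefix map from a JSON-LD @context. Plain terms are left untouched here;
# a full JSON-LD processor would also resolve them against @vocab.
def expand(term: str, prefixes: dict[str, str]) -> str:
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + suffix
    return term


context = ["https://schema.org", {"OBI": "http://purl.obolibrary.org/obo/OBI_"}]
prefixes = {k: v for entry in context if isinstance(entry, dict) for k, v in entry.items()}

print(expand("Sample", prefixes))       # Sample
print(expand("OBI:0000747", prefixes))  # http://purl.obolibrary.org/obo/OBI_0000747

# With the context reduced to plain schema.org, the OBI prefix is gone.
print(expand("OBI:0000747", {}))        # OBI:0000747
```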
36. @id - gives a node a URL
Without @id, nodes get auto-generated blank-node identifiers, e.g.,
<script type="application/ld+json">{
"@context" : "https://schema.org",
"@type" : "DataCatalog", ...
becomes:
_:genid2d4335ed7c72694275bea5b6a86ad9f82b2db0
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<https://schema.org/DataCatalog> .
This is bad for Linked Data, as no one outside the page can reference this node.
37. @id - gives a node a URL
With an @id, you choose the URL for nodes, e.g.,
<script type="application/ld+json">{
"@context" : "https://schema.org",
"@type" : "DataCatalog",
"@id" : "https://www.ebi.ac.uk/biosamples" …
becomes:
<https://www.ebi.ac.uk/biosamples> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
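The difference between the two @id slides comes down to which subject the triples get. This Python sketch is a simplified, hypothetical illustration (the function type_triple and the _:bN label scheme are assumptions; real JSON-LD processors generate their own blank-node labels, such as the _:genid… label shown earlier): it emits the rdf:type triple in N-Triples form and assumes @type is a single schema.org term.

```python
import itertools

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_blank_ids = itertools.count()


def type_triple(node: dict) -> str:
    """Emit the rdf:type triple for a node in N-Triples form (simplified sketch)."""
    if "@id" in node:
        subject = f"<{node['@id']}>"
    else:
        # No @id: a page-local blank-node label that nobody else can cite.
        subject = f"_:b{next(_blank_ids)}"
    return f"{subject} <{RDF_TYPE}> <https://schema.org/{node['@type']}> ."


print(type_triple({"@type": "DataCatalog"}))
print(type_triple({"@type": "DataCatalog", "@id": "https://www.ebi.ac.uk/biosamples"}))
```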
38. Warning! Don’t use the same @id for everything
If a DataCatalog and a Dataset are defined separately but given the same @id, they are combined into a single entity.
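A reused @id is easy to catch before publishing. The sketch below is a hypothetical Python check, not part of any Bioschemas tool: shared_ids reports every @id carried by nodes of more than one @type. It assumes each node's @type is a single string, and the example URLs are placeholders.

```python
from collections import defaultdict


def shared_ids(nodes: list[dict]) -> dict[str, set[str]]:
    """Map each @id used by nodes of more than one @type to those types."""
    types_by_id = defaultdict(set)
    for node in nodes:
        if "@id" in node:
            types_by_id[node["@id"]].add(node["@type"])
    return {node_id: types for node_id, types in types_by_id.items() if len(types) > 1}


# Hypothetical nodes from one page; the second reuses the catalog's @id.
nodes = [
    {"@type": "DataCatalog", "@id": "https://example.org/catalog"},
    {"@type": "Dataset", "@id": "https://example.org/catalog"},
    {"@type": "Dataset", "@id": "https://example.org/catalog/dataset/1"},
]

print(shared_ids(nodes))
```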
39. Common errors in Google’s Structured Data Testing Tool (GSDTT)
If you don’t meet Google’s desired property specification for a given type, the tool reports errors. Notes on typical cases:
● If the Bioschemas specification says this is OK, you can ignore the error (note that it is still a real error).
● These are not minimal properties in Bioschemas; include them or not as you choose.
● This error is caused by an incorrect target type for location.
● description is a minimal property for Bioschemas (i.e. mandatory).
45. Evolving Best Practices
● At the moment we largely create markup by hand and validate it with Google’s testing tool
  ○ More validators and tools are on the way; see bioschemas.org/tools
● Make pages with markup reachable from your sitemap.xml
  ○ This makes it easier for some applications to find them.
● Avoid adding Bioschemas markup to the page dynamically (e.g. through JavaScript)
  ○ Applications trying to find your data may not have the resources to render pages.
● Specify an @id
● Evolving guidance at https://github.com/BioSchemas/specifications/wiki/Technical
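Listing marked-up pages in sitemap.xml can be automated. This is a minimal Python sketch using the standard-library ElementTree and the sitemaps.org 0.9 namespace; the function name build_sitemap and the page URLs are hypothetical, and it emits only the required loc element for each page.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap.xml document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # Write the declaration by hand so it always states UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


# Hypothetical pages that carry Bioschemas markup.
sitemap = build_sitemap([
    "https://example.org/datasets/1",
    "https://example.org/datasets/2",
])
print(sitemap)
```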