Presentation by Niall Beard on the Training Materials and Events specifications, given at the Bioschemas Adoption Meeting on 2nd October 2017. It shows the aggregation use case of Bioschemas, demonstrating adoption of schema.org, and introduces a new tool for importing schema.org annotations.
Bioschemas Adoption Meeting: Training materials and Events
1. Bioschemas Implementation Study Adoption Meeting | Hinxton, Cambridge
Bioschemas Adoption Meeting
Events and Training Materials Schema
(Aggregation use case + New importer tools for TeSS)
Niall Beard
ELIXIR-TeSS Project Manager
University of Manchester
Bioschemas and TeSS
• TeSS aggregates links and metadata about life science events and training materials and displays them in a portal.
• To aggregate, the TeSS team writes scrapers that parse HTML, RSS feeds, ICS calendars and APIs (anything structured) and add the results to TeSS.
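As an illustration of one such scraper, here is a minimal sketch that turns an RSS 2.0 feed into simple records using only Python's standard library. It is not TeSS's actual scraper code; the function name parse_rss_items and the record keys (title, url, description) are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml: str) -> list[dict]:
    """Turn an RSS 2.0 document into a list of simple resource records.

    The record keys are illustrative only, not TeSS field names.
    """
    root = ET.fromstring(rss_xml)
    records = []
    # RSS 2.0 places entries at <rss><channel><item>.
    for item in root.iter("item"):
        records.append({
            # findtext returns None for a missing child, so fall back to "".
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return records


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel>
      <title>Example training feed</title>
      <item>
        <title>Intro to Python</title>
        <link>https://example.org/courses/python</link>
        <description>A two-day course.</description>
      </item>
    </channel></rss>"""
    for record in parse_rss_items(sample):
        print(record)
```

A real scraper would also download the feed and map these records onto the registry's own fields; both steps are left out here.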
• We created one of the very first Bioschemas specifications, so that content providers can be included in the registry more quickly and reliably by adding simple structured data to their sites.
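A minimal sketch of the kind of structured data a provider might add: a schema.org Event serialized as JSON-LD inside a script element. The helper names event_jsonld and as_script_tag and the chosen property set are hypothetical; the Bioschemas Event profile defines which properties are actually expected.

```python
import json


def event_jsonld(name, start_date, end_date, url, city, country):
    """Build a schema.org Event as a JSON-LD dict.

    Minimal, assumed property set for illustration only; the Bioschemas
    Event profile lists the properties it actually expects.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601, e.g. "2017-10-02"
        "endDate": end_date,
        "url": url,
        "location": {
            "@type": "Place",
            "address": {
                "@type": "PostalAddress",
                "addressLocality": city,
                "addressCountry": country,
            },
        },
    }


def as_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the <script> element a provider embeds in the page."""
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "<\/" is still valid JSON and decodes back to "</".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

Calling as_script_tag(event_jsonld(...)) with a provider's own course details yields a block that can be pasted into the page's HTML, where any schema.org-aware parser can pick it up.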
Bioschemas and TeSS
• We’ve had lots of Bioschemas adoption by training content providers: it is now the most common format among the feeds TeSS ingests (14 of 30).
• Formats of programmatically ingested training resource feeds in TeSS:
– ICS file format: 5
– RSS feed: 2
– HTML scraping: 3
– Bioschemas: 14
– GitHub YAML: 2
– Read API: 4
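The counts above can be tallied to show Bioschemas' share of the ingested feeds. This quick check uses only the numbers on this slide; the function name shares is just for illustration.

```python
# Feed counts per ingestion format, as listed on the slide.
feeds = {
    "ICS file format": 5,
    "RSS feed": 2,
    "HTML scraping": 3,
    "Bioschemas": 14,
    "GitHub YAML": 2,
    "Read API": 4,
}


def shares(counts: dict) -> dict:
    """Return each format's fraction of all feeds."""
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.items()}


if __name__ == "__main__":
    total = sum(feeds.values())
    # Print formats from most to least common.
    for fmt, frac in sorted(shares(feeds).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{fmt:<16}{feeds[fmt]:>3} of {total}  ({frac:.1%})")
```

With these figures Bioschemas accounts for 14 of 30 feeds (about 46.7%), more than any other single format and more than ICS and the Read API combined.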
Semi-Automated Bioschemas Importer
• A new tool in TeSS, not yet in production
• Automatically parses any schema.org/Event or schema.org/CreativeWork resources, ready for inclusion in TeSS
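Parsing "any schema.org/Event or schema.org/CreativeWork resources" means walking a parsed JSON-LD document, which may wrap nodes in @graph or in a list, and keeping the nodes of the wanted types. A sketch under stated assumptions: the names WANTED_TYPES and find_resources are hypothetical, only the bare names Event and CreativeWork (or their full IRIs) are matched, so subtypes such as Course would need to be listed explicitly, and a matched node is not searched for further nested resources.

```python
# Bare schema.org type names the sketch accepts (an assumption: subtypes such
# as Course or EducationEvent are not matched unless added here).
WANTED_TYPES = {"Event", "CreativeWork"}


def _type_names(node: dict) -> set:
    """Normalize @type (a string or a list) to bare names like "Event"."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # "http://schema.org/Event" and "Event" both become "Event".
    return {t.rsplit("/", 1)[-1] for t in types if isinstance(t, str)}


def find_resources(doc) -> list:
    """Collect Event/CreativeWork nodes from parsed JSON-LD, in document order."""
    found = []
    stack = [doc]
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(reversed(node))
        elif isinstance(node, dict):
            if _type_names(node) & WANTED_TYPES:
                found.append(node)  # keep the whole node; don't look inside it
            else:
                stack.extend(reversed(list(node.values())))  # e.g. into "@graph"
    return found
```

Using an explicit stack rather than recursion keeps deeply nested markup from hitting Python's recursion limit, and reversing before pushing preserves document order.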
Create a new content provider or select an existing one
Enter the URL of a web page containing training events or materials that are:
a) annotated with schema.org in JSON-LD or RDFa format, or
b) in .ics file format
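For simple cases, .ics input can be read without a full calendar library. This is a hypothetical minimal parser, not the importer's code: it unfolds continuation lines as RFC 5545 describes, keeps each VEVENT's properties keyed by name, strips parameters such as TZID, and skips nested components such as VALARM. It does not unescape text values, expand recurrence rules, or cope with quoted parameter values that contain a colon.

```python
def parse_ics_events(ics_text: str) -> list:
    """Return one dict of properties per VEVENT in an iCalendar string.

    Minimal sketch: unfolds continuation lines (RFC 5545, section 3.1), drops
    parameters such as ";TZID=...", and skips nested components like VALARM.
    It does not unescape text, expand RRULEs, or handle quoted parameter
    values that contain ":".
    """
    # Unfold: a line beginning with a space or tab continues the previous one.
    lines = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    events, current, depth = [], None, 0
    for line in lines:
        if line == "BEGIN:VEVENT":
            current, depth = {}, 0
        elif current is None:
            continue  # outside any VEVENT (e.g. VCALENDAR or VTIMEZONE lines)
        elif line.startswith("BEGIN:"):
            depth += 1  # entering a nested component
        elif line.startswith("END:"):
            if depth:
                depth -= 1
            else:  # END:VEVENT closes the current event
                events.append(current)
                current = None
        elif depth == 0 and ":" in line:
            name_and_params, value = line.split(":", 1)
            current[name_and_params.split(";", 1)[0].upper()] = value
    return events
```

Each returned dict maps property names such as SUMMARY, DTSTART or URL to raw string values, which would still need date parsing before being stored.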
The importer downloads the page, extracts the metadata of any events or training materials, and displays it in a ‘staging area’
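For the JSON-LD case, the extraction step can be sketched with Python's built-in html.parser: collect the text of every script element whose type is application/ld+json and parse it as JSON. This is an illustrative assumption rather than the importer's implementation; the names JsonLdExtractor and extract_jsonld are hypothetical, malformed blocks are skipped, RDFa is not handled, and the download itself (which could use urllib.request.urlopen) is left out.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes.
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed markup: skip it rather than abort the import


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD block found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

The parsed blocks can then be filtered for Event and CreativeWork nodes before being shown for review.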
Use the ‘details’ button to see all the extracted metadata about each resource
The ‘staging area’ alerts the user to resources that already exist in TeSS
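One way to flag resources that already exist is to compare normalized URLs. The matching rule here (lower-cased scheme and host, no fragment, no trailing slash) and the function names normalise_url and flag_existing are assumptions for illustration; the slide does not say how TeSS detects duplicates.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    """Reduce trivial URL differences so the same page compares equal.

    Assumed rule: lower-case scheme and host, drop the fragment and any
    trailing slash on the path; the query string is kept as-is.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def flag_existing(staged: list[dict], existing_urls: set[str]) -> list[tuple[dict, bool]]:
    """Pair each staged resource with True if its URL is already registered."""
    known = {normalise_url(u) for u in existing_urls}
    return [(r, normalise_url(r.get("url", "")) in known) for r in staged]
```

Matching on URL alone is cheap but misses the same course published at two addresses; comparing titles and dates as well would catch more duplicates at the cost of false positives.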
Issues
• The formatting and data on the source page must be perfect for this to be effective
– We usually write custom scrapers that download a page, extract the metadata and save it.
– This gives us the flexibility to apply custom modifications to the data:
  • Normalization: GB -> United Kingdom
  • Supplement: look up the latitude/longitude of an event from its street address
  • Correction: fix formatting issues
• The importer gives a clear visualization of what data can be extracted, so it could be a useful tool for helping providers annotate with Bioschemas; we can then create a custom scraper with modifications
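The three kinds of modification a custom scraper allows can be sketched as a single clean-up function. The field names (title, description, address, country, latitude, longitude), the partial COUNTRY_NAMES table and the geocode callback are all hypothetical; in practice the callback would wrap a real geocoding service.

```python
import re
from typing import Callable, Optional

# Hypothetical, partial table; a real pipeline would use a full ISO 3166 list.
COUNTRY_NAMES = {"GB": "United Kingdom", "UK": "United Kingdom"}


def clean_event(event: dict, geocode: Callable[[str], Optional[tuple]]) -> dict:
    """Apply correction, normalization and supplement steps to one scraped event.

    `geocode` maps an address string to (lat, lon) or None; it stands in for
    a real geocoding service and is an assumption of this sketch.
    """
    out = dict(event)  # work on a copy so the scraped record is untouched
    # Correction: collapse stray whitespace and line breaks in text fields.
    for key in ("title", "description", "address"):
        if isinstance(out.get(key), str):
            out[key] = re.sub(r"\s+", " ", out[key]).strip()
    # Normalization: expand country codes such as GB -> United Kingdom.
    country = out.get("country")
    if isinstance(country, str):
        out["country"] = COUNTRY_NAMES.get(country.strip().upper(), country.strip())
    # Supplement: look up latitude/longitude from the street address if missing.
    if "latitude" not in out and out.get("address"):
        coords = geocode(out["address"])
        if coords is not None:
            out["latitude"], out["longitude"] = coords
    return out
```

Passing the geocoder in as a parameter keeps the function testable offline and lets the lookup service be swapped without touching the clean-up rules.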
Issues
• The importer can’t discover new resources
– TeSS scrapers run each night, and any newly discovered resources are added to TeSS
– The importer tool forces users to choose explicitly which resources should be added
– An ‘add anything you find’ creation policy can lead to issues: many CMSs generate schema.org markup that may be unintended
  • We could send manual prompts to content provider owners asking them to confirm whether newly discovered resources should be added
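The confirmation idea amounts to a small state machine: each nightly scan queues unseen URLs as pending, and only the owner's answer moves them to published or rejected, so rejected items are not prompted for again. A sketch with a hypothetical ProviderRegistry class, assuming resources are identified by URL:

```python
from dataclasses import dataclass, field


@dataclass
class ProviderRegistry:
    """Hypothetical store for one content provider's resources.

    Newly discovered URLs are not published automatically; they wait in
    `pending` until the provider owner confirms or rejects them.
    """
    published: set = field(default_factory=set)
    pending: set = field(default_factory=set)
    rejected: set = field(default_factory=set)

    def nightly_scan(self, discovered_urls) -> set:
        """Queue URLs not seen before; return the ones that need a prompt."""
        new = set(discovered_urls) - self.published - self.pending - self.rejected
        self.pending |= new
        return new

    def confirm(self, url: str, accept: bool) -> None:
        """Record the owner's answer to a prompt."""
        self.pending.discard(url)
        (self.published if accept else self.rejected).add(url)
```

Remembering rejections matters here: without the rejected set, markup a CMS generates unintentionally would trigger a fresh prompt every night.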
Training Resource Schemas Review
• GOBLET AGM, Portugal
– Thursday 23rd November 2017 – Full day
– Review the Events and Training Materials specifications:
  • Refine mandatory terms
  • Improve coverage
  • Adapt to current best practices with input from the training community
• In collaboration with GOBLET’s Standards Committee
• We will publish a report of the activity.
http://mygoblet.org/about-us/goblet-events/goblet-agm-2017-oeiras
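To make "mandatory terms" concrete, a provider-side check could list which required properties a resource is missing. The MANDATORY table and the function name missing_terms are hypothetical placeholders, not the contents of either Bioschemas profile, which this review set out to refine.

```python
# Hypothetical mandatory-term lists for illustration only; the real Bioschemas
# Event and Training Materials profiles define their own.
MANDATORY = {
    "Event": ["name", "startDate", "location", "url"],
    "CreativeWork": ["name", "description", "url"],
}


def missing_terms(resource: dict) -> list:
    """List mandatory properties that are absent or empty for a resource's types."""
    types = resource.get("@type", [])
    if isinstance(types, str):
        types = [types]
    required = []
    for t in types:
        if not isinstance(t, str):
            continue
        for prop in MANDATORY.get(t, []):
            if prop not in required:  # a property shared by two types counts once
                required.append(prop)
    # Treat empty strings, lists and objects the same as a missing property.
    return [p for p in required if resource.get(p) in (None, "", [], {})]
```

A report built from this list is one way a provider could see, before publishing, which terms their markup still lacks.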
Thanks!
• Any Questions?
• Slides are available on SlideShare:
• https://slideshare.net/NiallBeard
• Visit TeSS at:
• https://tess.elixir-europe.org