Bioschemas Workshop
Niall Beard
Bioinformatics Education Summit
13th May 2019
Preliminary Agenda
Expected Learning Outcomes
• Understand what schema.org is and how it can be applied to a
project
• Understand what Bioschemas is, how it differs from
schema.org, and what vocabularies are available
• Know the benefits and limitations to using schema.org
• Gain an understanding of how to apply (bio/)schema.org to
your site.
Workshop style
• Please do interrupt me if:
– You have any questions
– You have difficulty reading the slides
– I’m not speaking clearly enough
– I am going too fast/slow
What is…
Search Engines
Connect users with information, using signals such as:
Query text
Demographic
Location
Device Type
Document content
Web traffic
Link count
Freshness
----
21 ‘signals’
Algorithms to guess matches:
Text Matching
Named Entity Recognition
TF-IDF
NLP
Take out some of the guesswork…
• Search engines need to predict what a page is
about…
• What if, instead, search engines allowed the
information providers to explicitly define their
pages’ contents?
• Rather than relying on algorithmic guesswork!
Slide courtesy of Alasdair Gray
Schema.org
• A lightweight way of structuring data online
• Created by a consortium of search engines to improve
experience and search efficacy
• Thousands of different vocabularies to describe information
online
Metadata model
e.g. the Recipe type
<div itemscope itemtype="http://schema.org/Recipe">
<div itemprop="nutrition" itemscope
itemtype="http://schema.org/NutritionInformation">
Nutrition facts:
<span itemprop="calories">144 kcal</span>
</div>
Ingredients:
- <span itemprop="recipeIngredient">800g small new potato</span>
- <span itemprop="recipeIngredient">3 shallot</span>
. . .
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Recipe",
"name": "Potato Salad",
"nutrition": {
"@type": "NutritionInformation",
"calories": "144 kcal"
},
"recipeIngredient": [
"800g small new potato",
"3 shallot"
. . .
Readable by search engines
Content Content Content
Schema.org Schema.org Schema.org
A training event – marked up in schema.org
– as shown by Google
https://search.google.com/structured-data/testing-tool
https://toolbox.google
m/datasetsearch
Search engines favour websites
containing schema.org in their search
results
Readable by Registries
Resource Resource Resource
Schema.org Schema.org Schema.org
Schema.org is community made
• Schema.org is made up of decentralized
extensions from different industries
Schema.org is community made
• Extensions that see good usage get ‘folded-in’
to the core schema.org vocabularies
Schema.org is community made
• To take advantage of schema.org for
Bioinformatics, we need to make our own
community
Bioinformatics
/ Life science
Community
Part 2
Bioschemas
See: “The FAIR Guiding Principles for scientific data management and stewardship”,
Mark D. Wilkinson et al., 2016
Schema.org is community made
• … Bioschemas is a community to propose Life
science specifications to schema.org
Bioinformatics
/ Life science
Community
Bioschemas
• Bioschemas is a community project which:
– Creates Types for Life science resources
• Proteins, Samples, Beacons, Tools, Training, etc.
– Creates Profiles to Refine & Enhance Types
• Marginality
• Cardinality
• Controlled Vocabularies
– Creates tools to make Bioschemas markup easier to
create, validate, and extract
Types
• Types = New vocabularies to propose to schema.org
– Some are Biological Types
– Some are Generic Types that are
useful to Life scientists
– These new types will be hosted at
bio.schema.org
– Currently at:
http://bio.sdo-bioschemas-227516.appspot.com
Biological Types
http://bio.sdo-bioschemas-227516.appspot.com/BioChemEntity
Profiles
• Profiles = Refinement & Interoperability Layer
- Because every industry and domain shares
in these specifications…
- Every domain includes its own properties
- So we inherit lots of properties we don’t
care about
Schema.org is messy!
Profiles - Tidying up Schema.org
• For example:
– Dataset inherits from schema.org/CreativeWork
– CreativeWork (and therefore Dataset) contains
properties for:
• Character
• IsFamilyFriendly
• Material (e.g. leather, wool, cotton, paper)
• Genre
• Bioschemas offers an indication of how relevant /
recommended each property is, by grouping into
• Minimum | Recommended | Optional
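As a rough illustration of the Minimum | Recommended | Optional grouping, the sketch below checks a metadata record against a profile and reports which properties are missing at each level. The profile contents and the `check_marginality` helper are hypothetical, made up for this example; they are not taken from a published Bioschemas profile.

```python
# Hypothetical profile: property names grouped by marginality level.
# The groupings are illustrative only, not a real Bioschemas profile.
EXAMPLE_PROFILE = {
    "minimum": ["name", "description", "url"],
    "recommended": ["keywords", "license"],
    "optional": ["genre"],
}


def check_marginality(record, profile):
    """Return, per marginality level, the profile properties missing from record."""
    missing = {}
    for level, properties in profile.items():
        missing[level] = [p for p in properties if p not in record]
    return missing


record = {
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "A made-up record for illustration.",
    "keywords": "proteins",
}
report = check_marginality(record, EXAMPLE_PROFILE)
print(report)
```

A site owner could treat anything missing at the minimum level as an error and anything missing at the recommended level as a warning.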
Profiles
• Profiles = Refinement & Interoperability Layer
- schema.org’s generality means it does not
recommend which ontologies to annotate
with
- The lack of restrictions on cardinality makes it
difficult to parse the data (if you’re not a
huge search engine)
Schema.org is not great for interoperability!
Profiles - Improving interoperability
• Bioschemas profiles include cardinality
restrictions and controlled vocabularies
tailored to our use-cases
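To show why these restrictions help a small aggregator, here is a minimal sketch of a validator for cardinality and controlled vocabularies. The constraint table, the property choices, and the `validate` helper are all assumptions made for illustration; real Bioschemas profiles define their own constraints.

```python
# Hypothetical constraints: "many" says whether a property may hold several
# values, "vocab" optionally restricts values to a controlled vocabulary.
EXAMPLE_CONSTRAINTS = {
    "name": {"many": False, "vocab": None},
    "keywords": {"many": True, "vocab": None},
    "audience": {"many": True, "vocab": {"beginner", "intermediate", "advanced"}},
}


def validate(record, constraints):
    """Return a list of human-readable constraint violations."""
    problems = []
    for prop, rule in constraints.items():
        if prop not in record:
            continue
        value = record[prop]
        # Normalise single values to a list so both shapes are handled alike.
        values = value if isinstance(value, list) else [value]
        if not rule["many"] and len(values) > 1:
            problems.append(f"{prop}: expected one value, got {len(values)}")
        if rule["vocab"] is not None:
            for v in values:
                if v not in rule["vocab"]:
                    problems.append(f"{prop}: '{v}' is not in the controlled vocabulary")
    return problems


record = {"name": ["Course A", "Course B"], "audience": ["beginner", "expert"]}
problems = validate(record, EXAMPLE_CONSTRAINTS)
for problem in problems:
    print(problem)
```

With constraints like these agreed in advance, a parser knows whether to expect one value or a list, and which values are legal.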
Profiles and their adoption
Profile Development process
• Determining the schema is a process of
empirical surveying and expert opinion.
• We do a Cross-walk to find what fields are
missing and use this to gauge marginality
Profile Development process
Go through each attribute (row) of the schema and agree on answers for each of these questions (columns F to I of the cross-walk spreadsheet):
• If we already have it: do we want to keep it?
• If we don’t have it: do we want to include it?
• Is the description provided okay? Do we want to rewrite it?
• Should it be Minimum / Recommended / Optional?
• Should there be one or many of them?
• Should values be restricted to a controlled vocab?
• Discussions through our public mailing list
Profile Development process
Profile Development process
We use GitHub to request new properties,
identify and manage bug fixes, and publicly
present our decision making
Case Study
TeSS:
Training materials,
Events, and Courses
Part 4
ELIXIR All Hands 2018, June 2018, Berlin, Germany
The ELIXIR Training Portal - TeSS
https://tess.elixir-europe.org
TeSS
• A training portal that indexes metadata from across the
web.
• Presents a wide selection of openly available training
resources across the bioinformatics discipline.
• Displays these in a navigable, easy-to-find manner, in a
feature-rich environment.
View upcoming events of interest
https://tess.elixir-europe.org/events
Find training materials from around the Web
https://tess.elixir-europe.org/materials
TeSS Features
Search and Filter
• 270+ upcoming events
• 800+ training materials
• Filter with 10+ different facets
Institutional Login
Login with ELIXIR AAI using your institutional or Google credentials with 1-click sign-on, to:
• Favourite resources
• Add new events & materials
• Create new training workflows
Events
Stay informed about upcoming events of interest
• E-mail subscription
• Import into calendar applications
TeSS Features
Link with other registries
• Training events and materials can be linked with resources from other registries:
• Tools & Data services from bio.tools
• Databases, standards, & policies from fairsharing.org
Ontological Classification
The BioPortal Annotator web service predicts topics of resources added to TeSS. These can be approved/rejected easily by our curation group
Events map
View filtered events plotted on a map to find the most accessible & relevant events
Content sourcing
• Rely on community to register resources?
• Community needs to be moderated (to avoid spammers)
• Hard to get critical mass of community involvement
• Rely on curators to enter content?
• Curators need to be paid / incentivized
• Data entry is boring
• A drop in curation/moderation attention can lead to inaccurate,
malicious, or insufficient content
• Instead, develop a solution that
• Takes metadata directly from sources
• Adds any resources to TeSS as they appear
• Updates any resources that have changed
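The add-and-update behaviour described above can be sketched as a simple sync step keyed on each resource's URL. This is only an illustration under stated assumptions: the `sync` function, the in-memory `catalogue` dict, and the example.org records are hypothetical and do not describe how TeSS is actually implemented.

```python
def sync(catalogue, scraped):
    """Add new resources and update changed ones, keyed by URL.

    catalogue: dict mapping URL -> metadata dict (modified in place).
    scraped: list of metadata dicts, each with a "url" key.
    Returns (added, updated) lists of URLs.
    """
    added, updated = [], []
    for resource in scraped:
        url = resource["url"]
        if url not in catalogue:
            catalogue[url] = resource
            added.append(url)
        elif catalogue[url] != resource:
            catalogue[url] = resource
            updated.append(url)
    return added, updated


# Made-up records for illustration.
catalogue = {
    "https://example.org/a": {"url": "https://example.org/a", "title": "Old title"},
}
scraped = [
    {"url": "https://example.org/a", "title": "New title"},
    {"url": "https://example.org/b", "title": "Brand new"},
]
added, updated = sync(catalogue, scraped)
print(added, updated)
```

Running the same sync on a schedule keeps the catalogue current without anyone re-entering data by hand.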
How TeSS works
[Architecture diagram]
Online Training Resources
Automated Aggregator: Custom Scrapers extract metadata from training material and events pages
Back End: Metadata Catalogue (Events, Materials, Workflows)
Front End: Search Interface (finds relevant resources), Workflow Viewer (Training Workflows)
User enters form data
• There are several techniques we can use to extract metadata
from content provider websites. Which one we use depends on what’s on
the site.
• Interface with an API
• Handy but rare, difficult for websites to implement
• Content aggregators must write a bespoke API client for each
• Structured data already embedded in page (RSS, ICS)
• Limited amount of data
• HTML Scraping
• Fragile technique that can break when there are changes to the
website.
Automatic extraction techniques
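In contrast to bespoke HTML scraping, schema.org JSON-LD can be pulled out of any page with one generic routine. The sketch below uses only the Python standard library; the `JsonLdExtractor` class, the `extract_jsonld` helper, and the sample page are hypothetical names written for this illustration, not part of any Bioschemas tool.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page


def extract_jsonld(html):
    """Return a list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


# Made-up page for illustration.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Event", "name": "Example workshop"}
</script>
</head><body>Example</body></html>
"""
items = extract_jsonld(page)
print(items[0]["@type"], items[0]["name"])
```

Because the markup is independent of page layout, the same extractor keeps working when a provider redesigns their site.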
[Figure: trade-off between ease of adopting and usefulness to aggregators. Axes: ease to implement on a website vs. usefulness to aggregator.]
Content Provider extraction technique statistics
Technique                Events   Materials   Total
Schema.org / Bioschemas  9        6           15
HTML                     3        5           8
XML/JSON/YAML/CSV        4        3           7
iCal                     5        --          5
JSON API                 --       2           2
RSS                      1        --          1
Total                                         38
Content aggregation via Bioschemas
[Architecture diagram]
Online Training Resources
Automated Aggregator: a Schema.org Scraper and Custom Scrapers extract metadata from training material and events pages
Back End: Metadata Catalogue (Events, Materials, Workflows)
Front End: Search Interface (finds relevant resources), Workflow Viewer (Training Workflows)
Tools and
Techniques for
Implementation
Part 4
Technique for adding Bioschemas to a
website
• 1. Identify an
appropriate schema(s)
for your content type
• 1.a If it doesn’t exist,
e-mail the mailing list
(W3C), or add to the
GitHub issue tracker
Issue tracker
https://github.com/BioSchemas/specifications
Mailing List
https://www.w3.org/community/bioschemas/
Technique for adding Bioschemas to a
website
• 2. Draw a table and
write down your
metadata fields on the
left hand side and the
schema.org properties
on the right.
• Map the ones that
correlate
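The two-column table from step 2 translates directly into a lookup table in code. In this sketch the site field names on the left are hypothetical, and `apply_mapping` is a helper invented for illustration; the right-hand names (`name`, `description`, `url`, `startDate`) are existing schema.org properties.

```python
# Left-hand side: a site's own (hypothetical) field names.
# Right-hand side: the schema.org property each one maps to, or None if no match.
FIELD_MAPPING = {
    "title": "name",
    "summary": "description",
    "link": "url",
    "starts": "startDate",
    "room_code": None,
}


def apply_mapping(site_record, mapping):
    """Rename mapped fields to schema.org properties; return (mapped, unmapped)."""
    mapped, unmapped = {}, []
    for field, value in site_record.items():
        prop = mapping.get(field)
        if prop is None:
            unmapped.append(field)
        else:
            mapped[prop] = value
    return mapped, unmapped


# Made-up record for illustration.
site_record = {
    "title": "Example workshop",
    "link": "https://example.org/workshop",
    "starts": "2019-05-13",
    "room_code": "B12",
}
mapped, unmapped = apply_mapping(site_record, FIELD_MAPPING)
print(mapped)
print(unmapped)
```

The unmapped list is useful in its own right: fields that recur there across several sites are candidates to raise on the mailing list or issue tracker.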
Technique for adding Bioschemas to a
website
• 3a. Use the Bioschemas
generator to create a
JSON-LD snippet that
you can (hopefully)
copy and paste into
your site. (This would
mean creating one for
every new schema.org
record you want to add)
http://www.macs.hw.ac.uk/SWeL/BioschemasGenerator/
Technique for adding Bioschemas to a
website
• 3b. If you can modify
your site, paste in the
JSON-LD template of
the schema (from 3a),
and render the
metadata variables as
values to the keys
Mapping
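Step 3b might look something like the sketch below, where a page's metadata variables are rendered as values into a JSON-LD script element. The `render_jsonld` helper and the sample values are assumptions made for illustration; a real site would do the equivalent in its own templating language.

```python
import json


def render_jsonld(schema_type, properties):
    """Return a <script> element containing the record as schema.org JSON-LD."""
    document = {"@context": "http://schema.org", "@type": schema_type}
    document.update(properties)
    body = json.dumps(document, indent=2)
    # Escape "</" so that text values cannot close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Made-up values for illustration.
snippet = render_jsonld(
    "Event",
    {"name": "Example workshop", "startDate": "2019-05-13"},
)
print(snippet)
```

Serialising with a JSON library, rather than pasting values into a hand-written string, avoids broken quoting when a title or description contains quotation marks.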
Technique for adding Bioschemas to a
website
• 3c. If your site is using
a CMS such as
WordPress or Drupal,
explore whether there
is an appropriate
schema.org plugin
you can use (or ask on
the Bioschemas
mailing list)
Tutorials
• Bioschemas Training Portal
– There is a step-by-step tutorial on there
for adding schema.org to Jekyll pages /
GitHub Pages sites.
– Hopefully there will be more to come
https://bioschemas.gitbook.io/training-portal
Tools
• Bioschemas Generator
– Form-based tool to generate valid Bioschemas
JSON-LD
– http://www.macs.hw.ac.uk/SWeL/BioschemasGenerator/
• Validata [under construction]
– Web application for validating Bioschemas markup
https://bioschemas.org/software/
Tools
• GoCrawlt
– JSON-LD schema.org extractor
• Buzzbang [on hold]
– Search engine that crawls the web for Bioschemas
JSON-LD
https://bioschemas.org/software/
Freebies from Schema.org
• Google Search Console
– Shows you what schema.org data Google is picking
up from your site, any errors, and advice on how to
fix them
– https://search.google.com/search-console
Freebies from Schema.org
• Google Structured Data Testing Tool
– Extracts the schema.org from a given web-page or
from a code-snippet, validates it, and shows you
what errors there are
– https://search.google.com/structured-data/testing-tool
Freebies from Schema.org ecosystem
• 3rd party plug-ins
– Lots available to help
add schema.org to your
framework
Slide courtesy of Alasdair Gray

Mais conteĂşdo relacionado

Mais procurados

3 - Discovery-systems
3  - Discovery-systems3  - Discovery-systems
3 - Discovery-systemsWilliam Helling
 
Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Bradley Allen
 
Best Practices in Managing e-resources
Best Practices in Managing e-resourcesBest Practices in Managing e-resources
Best Practices in Managing e-resourcesslimkm
 
Researh data management
Researh data managementResearh data management
Researh data managementNikesh Narayanan
 
4 things about discovery
4 things about discovery4 things about discovery
4 things about discoverylisld
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...butest
 
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudBuilding Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudPeter Haase
 
Improving Visibility in Search Engines: How collections and organizations ben...
Improving Visibility in Search Engines: How collections and organizations ben...Improving Visibility in Search Engines: How collections and organizations ben...
Improving Visibility in Search Engines: How collections and organizations ben...Kenning Arlitsch
 
Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Martin Voigt
 
DLF ILS Discovery Interface Task Force API recommendation
DLF ILS Discovery Interface Task Force API recommendationDLF ILS Discovery Interface Task Force API recommendation
DLF ILS Discovery Interface Task Force API recommendationeby
 
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...semanticsconference
 
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENTMETADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENTVikas Bhushan
 
IKHarvester - Informal Knowledge Harvester
IKHarvester - Informal Knowledge HarvesterIKHarvester - Informal Knowledge Harvester
IKHarvester - Informal Knowledge HarvesterJaroslaw Dobrzanski
 
Automated metadata creation - Possibilities and pitfalls
Automated metadata creation - Possibilities and pitfallsAutomated metadata creation - Possibilities and pitfalls
Automated metadata creation - Possibilities and pitfallsNASIG
 

Mais procurados (15)

3 - Discovery-systems
3  - Discovery-systems3  - Discovery-systems
3 - Discovery-systems
 
Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)
 
Best Practices in Managing e-resources
Best Practices in Managing e-resourcesBest Practices in Managing e-resources
Best Practices in Managing e-resources
 
Researh data management
Researh data managementResearh data management
Researh data management
 
4 things about discovery
4 things about discovery4 things about discovery
4 things about discovery
 
A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...A machine learning approach to web page filtering using ...
A machine learning approach to web page filtering using ...
 
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudBuilding Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
 
Improving Visibility in Search Engines: How collections and organizations ben...
Improving Visibility in Search Engines: How collections and organizations ben...Improving Visibility in Search Engines: How collections and organizations ben...
Improving Visibility in Search Engines: How collections and organizations ben...
 
I Don’t Have Time for Metadata!
I Don’t Have Time for Metadata!I Don’t Have Time for Metadata!
I Don’t Have Time for Metadata!
 
Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016
 
DLF ILS Discovery Interface Task Force API recommendation
DLF ILS Discovery Interface Task Force API recommendationDLF ILS Discovery Interface Task Force API recommendation
DLF ILS Discovery Interface Task Force API recommendation
 
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
 
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENTMETADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
 
IKHarvester - Informal Knowledge Harvester
IKHarvester - Informal Knowledge HarvesterIKHarvester - Informal Knowledge Harvester
IKHarvester - Informal Knowledge Harvester
 
Automated metadata creation - Possibilities and pitfalls
Automated metadata creation - Possibilities and pitfallsAutomated metadata creation - Possibilities and pitfalls
Automated metadata creation - Possibilities and pitfalls
 

Semelhante a Bioschemas Workshop

Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataAndy Stretton
 
How to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content AutomaticallyHow to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content AutomaticallyAccess Innovations, Inc.
 
Metadata-powered dissemination of content
Metadata-powered dissemination of contentMetadata-powered dissemination of content
Metadata-powered dissemination of contentNikos Manouselis
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialBarbara Starr
 
SharePoint 2013 governance model
SharePoint 2013 governance modelSharePoint 2013 governance model
SharePoint 2013 governance modelYash Goley
 
Introduction to Microdata & Google Rich Snippets
Introduction to Microdata  & Google Rich SnippetsIntroduction to Microdata  & Google Rich Snippets
Introduction to Microdata & Google Rich SnippetsPlus91 Technologies Pvt. Ltd.
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
Benchmarking Your Search Function l.ppt
Benchmarking Your Search Function l.pptBenchmarking Your Search Function l.ppt
Benchmarking Your Search Function l.pptDeepak Nagar
 
Climbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarClimbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarConcept Searching, Inc
 
SharePoint Fest Chicago Presentation
SharePoint Fest Chicago PresentationSharePoint Fest Chicago Presentation
SharePoint Fest Chicago PresentationConcept Searching, Inc
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Sonya Liberman
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
How to Optimize Your Drupal Site with Structured Content
How to Optimize Your Drupal Site with Structured ContentHow to Optimize Your Drupal Site with Structured Content
How to Optimize Your Drupal Site with Structured ContentAcquia
 
DU Series - Day 4.pptx
DU Series - Day 4.pptxDU Series - Day 4.pptx
DU Series - Day 4.pptxUiPathCommunity
 
Getting started with with SharePoint Syntex
Getting started with with SharePoint SyntexGetting started with with SharePoint Syntex
Getting started with with SharePoint SyntexDrew Madelung
 
Apache mahout and R-mining complex dataobject
Apache mahout and R-mining complex dataobjectApache mahout and R-mining complex dataobject
Apache mahout and R-mining complex dataobjectsakthibalabalamuruga
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...inventionjournals
 
Structuring Serendipitous Collaboration
Structuring Serendipitous CollaborationStructuring Serendipitous Collaboration
Structuring Serendipitous CollaborationNick Inglis
 

Semelhante a Bioschemas Workshop (20)

Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect data
 
How to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content AutomaticallyHow to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content Automatically
 
Metadata-powered dissemination of content
Metadata-powered dissemination of contentMetadata-powered dissemination of content
Metadata-powered dissemination of content
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorial
 
SharePoint 2013 governance model
SharePoint 2013 governance modelSharePoint 2013 governance model
SharePoint 2013 governance model
 
Pratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnectPratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnect
 
Introduction to Microdata & Google Rich Snippets
Introduction to Microdata  & Google Rich SnippetsIntroduction to Microdata  & Google Rich Snippets
Introduction to Microdata & Google Rich Snippets
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Benchmarking Your Search Function l.ppt
Benchmarking Your Search Function l.pptBenchmarking Your Search Function l.ppt
Benchmarking Your Search Function l.ppt
 
Climbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarClimbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations Webinar
 
SharePoint Fest Chicago Presentation
SharePoint Fest Chicago PresentationSharePoint Fest Chicago Presentation
SharePoint Fest Chicago Presentation
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
How to Optimize Your Drupal Site with Structured Content
How to Optimize Your Drupal Site with Structured ContentHow to Optimize Your Drupal Site with Structured Content
How to Optimize Your Drupal Site with Structured Content
 
DU Series - Day 4.pptx
DU Series - Day 4.pptxDU Series - Day 4.pptx
DU Series - Day 4.pptx
 
Semantic SharePoint
Semantic SharePointSemantic SharePoint
Semantic SharePoint
 
Getting started with with SharePoint Syntex
Getting started with with SharePoint SyntexGetting started with with SharePoint Syntex
Getting started with with SharePoint Syntex
 
Apache mahout and R-mining complex dataobject
Apache mahout and R-mining complex dataobjectApache mahout and R-mining complex dataobject
Apache mahout and R-mining complex dataobject
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
 
Structuring Serendipitous Collaboration
Structuring Serendipitous CollaborationStructuring Serendipitous Collaboration
Structuring Serendipitous Collaboration
 

Mais de Niall Beard

Concept Maps in TeSS
Concept Maps in TeSSConcept Maps in TeSS
Concept Maps in TeSSNiall Beard
 
Bioschemas Adoption Meeting: Training materials and Events
Bioschemas Adoption Meeting: Training materials and EventsBioschemas Adoption Meeting: Training materials and Events
Bioschemas Adoption Meeting: Training materials and EventsNiall Beard
 
TeSS @ ISMB/ECCB 2017, Prague
TeSS @ ISMB/ECCB 2017, PragueTeSS @ ISMB/ECCB 2017, Prague
TeSS @ ISMB/ECCB 2017, PragueNiall Beard
 
schema.org - Simple Structured Data for the Web
schema.org - Simple Structured Data for the Webschema.org - Simple Structured Data for the Web
schema.org - Simple Structured Data for the WebNiall Beard
 
TeSS ELIXIR All Hands Rome 2017
TeSS ELIXIR All Hands Rome 2017TeSS ELIXIR All Hands Rome 2017
TeSS ELIXIR All Hands Rome 2017Niall Beard
 
ELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSNiall Beard
 
ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool
ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool
ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool Niall Beard
 
TeSS: ELIXIR Training Portal (Eubic Winter School 2017)
TeSS: ELIXIR Training Portal (Eubic Winter School 2017)TeSS: ELIXIR Training Portal (Eubic Winter School 2017)
TeSS: ELIXIR Training Portal (Eubic Winter School 2017)Niall Beard
 
Bioschemas for Aggregating ELIXIR Events - Comms Webinar
Bioschemas for Aggregating ELIXIR Events - Comms WebinarBioschemas for Aggregating ELIXIR Events - Comms Webinar
Bioschemas for Aggregating ELIXIR Events - Comms WebinarNiall Beard
 
TeSS trcg meeting nov16
TeSS trcg meeting nov16TeSS trcg meeting nov16
TeSS trcg meeting nov16Niall Beard
 
Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016
Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016
Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016Niall Beard
 
Bioschemas presentation at ECCB 2016, The Hague
Bioschemas presentation at ECCB 2016, The HagueBioschemas presentation at ECCB 2016, The Hague
Bioschemas presentation at ECCB 2016, The HagueNiall Beard
 
ISMB BioSchemas Presentation
ISMB BioSchemas PresentationISMB BioSchemas Presentation
ISMB BioSchemas PresentationNiall Beard
 
RDA Web service discoverability workshop
RDA Web service discoverability workshopRDA Web service discoverability workshop
RDA Web service discoverability workshopNiall Beard
 
Lightningtalk BioSchemas
Lightningtalk BioSchemasLightningtalk BioSchemas
Lightningtalk BioSchemasNiall Beard
 
TeSS Lightning Talk - cw16
TeSS Lightning Talk - cw16TeSS Lightning Talk - cw16
TeSS Lightning Talk - cw16Niall Beard
 
TeSS training eSupport System
TeSS training eSupport SystemTeSS training eSupport System
TeSS training eSupport SystemNiall Beard
 
The Biodiversity Catalogue and support for Web Map Services - TDWG 2015
The Biodiversity Catalogue and support for Web Map Services - TDWG 2015The Biodiversity Catalogue and support for Web Map Services - TDWG 2015
The Biodiversity Catalogue and support for Web Map Services - TDWG 2015Niall Beard
 

Mais de Niall Beard (18)

Concept Maps in TeSS
Concept Maps in TeSSConcept Maps in TeSS
Concept Maps in TeSS
 
Bioschemas Adoption Meeting: Training materials and Events
Bioschemas Adoption Meeting: Training materials and EventsBioschemas Adoption Meeting: Training materials and Events
Bioschemas Adoption Meeting: Training materials and Events
 
TeSS @ ISMB/ECCB 2017, Prague
TeSS @ ISMB/ECCB 2017, PragueTeSS @ ISMB/ECCB 2017, Prague
TeSS @ ISMB/ECCB 2017, Prague
 
schema.org - Simple Structured Data for the Web
schema.org - Simple Structured Data for the Webschema.org - Simple Structured Data for the Web
schema.org - Simple Structured Data for the Web
 
TeSS ELIXIR All Hands Rome 2017
TeSS ELIXIR All Hands Rome 2017TeSS ELIXIR All Hands Rome 2017
TeSS ELIXIR All Hands Rome 2017
 
ELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSS
 
ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool
ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool
ELIXIR TeSS And Bioschemas: An aggregated portal and an aggregation tool
 
TeSS: ELIXIR Training Portal (Eubic Winter School 2017)
TeSS: ELIXIR Training Portal (Eubic Winter School 2017)TeSS: ELIXIR Training Portal (Eubic Winter School 2017)
TeSS: ELIXIR Training Portal (Eubic Winter School 2017)
 
Bioschemas for Aggregating ELIXIR Events - Comms Webinar
Bioschemas for Aggregating ELIXIR Events - Comms WebinarBioschemas for Aggregating ELIXIR Events - Comms Webinar
Bioschemas for Aggregating ELIXIR Events - Comms Webinar
 
TeSS trcg meeting nov16
TeSS trcg meeting nov16TeSS trcg meeting nov16
TeSS trcg meeting nov16
 
Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016
Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016
Bioschemas - TeSS Integration @ Rothamsted Hackathon 2016
 
Bioschemas presentation at ECCB 2016, The Hague
Bioschemas presentation at ECCB 2016, The HagueBioschemas presentation at ECCB 2016, The Hague
Bioschemas presentation at ECCB 2016, The Hague
 
ISMB BioSchemas Presentation
ISMB BioSchemas PresentationISMB BioSchemas Presentation
ISMB BioSchemas Presentation
 
RDA Web service discoverability workshop
RDA Web service discoverability workshopRDA Web service discoverability workshop
RDA Web service discoverability workshop
 
Lightningtalk BioSchemas
Lightningtalk BioSchemasLightningtalk BioSchemas
Lightningtalk BioSchemas
 
TeSS Lightning Talk - cw16
TeSS Lightning Talk - cw16TeSS Lightning Talk - cw16
TeSS Lightning Talk - cw16
 
TeSS training eSupport System
TeSS training eSupport SystemTeSS training eSupport System
TeSS training eSupport System
 
The Biodiversity Catalogue and support for Web Map Services - TDWG 2015
The Biodiversity Catalogue and support for Web Map Services - TDWG 2015The Biodiversity Catalogue and support for Web Map Services - TDWG 2015
The Biodiversity Catalogue and support for Web Map Services - TDWG 2015
 

Bioschemas Workshop

  • 1. Bioschemas Workshop Niall Beard Bioinformatics Education Summit 13th May 2019
  • 3. Expected Learning Outcomes • Understand what schema.org is and how it can be applied to a project • Understand what Bioschemas is, how it differs from schema.org, and what vocabularies are available • Know the benefits and limitations to using schema.org • Gain an understanding of how to apply (bio/)schema.org to your site.
  • 4. Workshop style • Please do interrupt me if: – You have any questions – You have difficulty reading the slides – I’m not speaking clearly enough – I am going too fast/slow
  • 7. Search Engines connect User and Information • Query text • Demographic • Location • Device Type • Document content • Web traffic • Link count • Freshness ---- 21 ‘signals’
  • 8. Search Engines connect User and Information • Query text • Demographic • Location • Device Type • Document content • Web traffic • Link count • Freshness ---- 21 ‘signals’ • Algorithms to guess matches (????????): Text Matching, Named Entity Recognition, TF-IDF, NLP
  • 9. Take out some of the guesswork… • Search engines need to predict what a page is about… • What if, instead, search engines allowed information providers to explicitly define their pages’ contents? • Rather than relying on algorithmic guesswork!
  • 10. Slide courtesy of Alasdair Gray
  • 11. Schema.org • A lightweight way of structuring data online • Created by a consortium of search engines to improve experience and search efficacy • Thousands of different vocabularies to describe information online
  • 14. <div itemscope itemtype="http://schema.org/Recipe"> <div itemprop="nutrition" itemscope itemtype="http://schema.org/NutritionInformation"> Nutrition facts: <span itemprop="calories">144 kcal</span>, </div> Ingredients: - <span itemprop="recipeIngredient">800g small new potato</span> - <span itemprop="recipeIngredient">3 shallot</span> . . .
  • 15. <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "Recipe", "name": "Potato Salad", "nutrition": { "@type": "NutritionInformation", "calories": "144 kcal" }, "recipeIngredient": [ "800g small new potato", "3 shallot" ] . . .
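The JSON-LD form of the recipe markup can also be generated from data rather than written by hand. The following is a minimal sketch, not something taken from the slides: the function names `recipe_jsonld` and `as_script_tag` are made up for illustration, and only the Python standard library is assumed.

```python
import json


def recipe_jsonld(name, calories, ingredients):
    """Build a schema.org Recipe description as a plain dict."""
    return {
        "@context": "http://schema.org",
        "@type": "Recipe",
        "name": name,
        # Nutrition facts are a nested NutritionInformation item.
        "nutrition": {
            "@type": "NutritionInformation",
            "calories": calories,
        },
        # Repeated values go in a list, not in repeated keys.
        "recipeIngredient": list(ingredients),
    }


def as_script_tag(data):
    """Serialise the dict and wrap it in a JSON-LD script element."""
    # Escape "</" so a string value cannot close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


recipe = recipe_jsonld(
    "Potato Salad", "144 kcal", ["800g small new potato", "3 shallot"]
)
print(as_script_tag(recipe))
```

The printed element can be pasted into a page's HTML; the visible content of the page does not need to change.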
  • 16. Readable by search engines Content Content Content Schema.org Schema.org Schema.org
  • 19. A training event – marked up in schema.org – as shown by Google https://search.google.com/structured-data/testing-tool
  • 21. Search engines favour websites containing schema.org in their search results
  • 22. Readable by Registries Resource Resource Resource Schema.org Schema.org Schema.org
  • 23. Schema.org is community made • Schema.org is made up of decentralized extensions from different industries
  • 24. Schema.org is community made • Extensions that see good usage get ‘folded-in’ to the core schema.org vocabularies
  • 25. Schema.org is community made • To take advantage of schema.org for Bioinformatics, we need to make our own community: the Bioinformatics / Life science Community
  • 27. Bioschemas • See: “The FAIR Guiding Principles for scientific data management and stewardship”, Mark D. Wilkinson et al., 2016
  • 28. Schema.org is community made • … Bioschemas is a community that proposes Life science specifications to schema.org (Bioinformatics / Life science Community)
  • 29. Bioschemas • Bioschemas is a community project which: – Creates Types for Life science resources • Proteins, Samples, Beacons, Tools, Training, etc. – Creates Profiles to refine & enhance Types • Marginality • Cardinality • Controlled Vocabularies – Creates tools to make Bioschemas markup easier to create, validate, and extract
  • 30. Types • Types = New vocabularies to propose to schema.org – Some are Biological Types – Some are Generic Types that are useful to Life scientists – These new types will be hosted at bio.schema.org – Currently at: http://bio.sdo-bioschemas-227516.appspot.com
  • 34. Profiles • Profiles = Refinement & Interoperability Layer - Because every industry and domain shares in these specifications… - Every domain includes its own properties - So we inherit lots of properties we don’t care about Schema.org is messy!
  • 35. Profiles - Tidying up Schema.org • For example: – Dataset inherits from schema.org/CreativeWork – CreativeWork (and therefore Dataset) contains properties for: • character • isFamilyFriendly • material (e.g. leather, wool, cotton, paper) • genre • Bioschemas indicates how relevant / recommended each property is by grouping them into Minimum | Recommended | Optional
  • 36. Profiles • Profiles = Refinement & Interoperability Layer - schema.org’s generality means it does not recommend which ontologies to annotate with - The lack of restrictions on cardinality makes it difficult to parse the data (if you’re not a huge search engine) Schema.org is not great for interoperability!
  • 37. Profiles - Improving interoperability • Bioschemas profiles include cardinality restrictions and controlled vocabularies tailored to our use-cases
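To make the Minimum | Recommended | Optional grouping and the cardinality restrictions concrete, here is a minimal sketch of how a consumer might check a record against a profile. The profile below is entirely hypothetical (`EXAMPLE_PROFILE` is not a real Bioschemas profile, and `check_record` is an illustrative name); real profiles are published by the Bioschemas community.

```python
# Hypothetical profile: property -> (marginality, allows_many).
# This is NOT a real Bioschemas profile; it only illustrates the idea.
EXAMPLE_PROFILE = {
    "name": ("minimum", False),
    "description": ("minimum", False),
    "keywords": ("recommended", True),
    "license": ("optional", False),
}


def check_record(record, profile):
    """Return a list of human-readable problems found in `record`."""
    problems = []
    for prop, (marginality, allows_many) in profile.items():
        value = record.get(prop)
        if value is None:
            # Only Minimum properties are mandatory.
            if marginality == "minimum":
                problems.append(f"missing minimum property: {prop}")
            continue
        # Cardinality: a list is only acceptable where "many" is allowed.
        if isinstance(value, list) and not allows_many:
            problems.append(f"{prop} must have a single value")
    return problems


print(check_record({"name": "My dataset"}, EXAMPLE_PROFILE))
```

Because the profile fixes which properties must be present and how many values each may take, a small aggregator can parse records without guessing at their shape.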
  • 38. Profiles and their adoption
  • 39. Profile Development process • Determining the schema is a process of empirical surveying and expert opinion. • We do a Cross-walk to find what fields are missing and use this to gauge marginality
  • 40. Profile Development process • Go through each attribute (row) of the schema • If we already have it: do we want to keep it? • If we don’t have it: do we want to include it? • Is the description provided okay, or do we want to rewrite it? (Column F) • Should it be Minimum / Recommended / Optional? • Should there be one or many of them? • Should values be restricted to a controlled vocab? (the slide labels these questions with Columns G, H, and I) • Agree on answers for each of these questions
  • 41. Profile Development process • Discussions through our public mailing list
  • 42. Profile Development process • We use GitHub to request new properties, identify and manage bug fixing, and publicly present our decision making
  • 44. ELIXIR All Hands 2018, June 2018, Berlin, Germany
  • 45. The ELIXIR Training Portal - TeSS https://tess.elixir-europe.org
  • 46. TeSS • A training portal that indexes metadata from across the web. • Presents a wide selection of openly available training resources across the bioinformatics discipline. • Displays these in a navigable, easy-to-find manner in a feature-rich environment.
  • 47. View upcoming events of interest https://tess.elixir-europe.org/events
  • 48. Find training materials from around the Web https://tess.elixir-europe.org/materials
  • 49. TeSS Features • Search and Filter: 270+ upcoming events; 800+ training materials; filter with 10+ different facets • Institutional Login: log in with ELIXIR AAI using your institutional or Google credentials with 1-click sign-on, to favourite resources, add new events & materials, and create new training workflows • Events: stay informed about upcoming events of interest via e-mail subscription or import into calendar applications
  • 50. TeSS Features • Link with other registries: training events and materials can be linked with resources from other registries (tools & data services from bio.tools; databases, standards, & policies from fairsharing.org) • Ontological Classification: the BioPortal Annotator web service predicts topics of resources added to TeSS. These can be approved/rejected easily by our curation group • Events map: view filtered events plotted on a map to find the most accessible & relevant events
  • 51. Content sourcing • Rely on community to register resources? • Community needs to be moderated (to avoid spammers) • Hard to get critical mass of community involvement • Rely on curators to enter content? • Curators need to be paid / incentivized • Data entry is boring • A drop in curation/moderation attention can lead to inaccurate, malevolent, or insufficient content • Instead develop a solution that • Takes metadata directly from sources • Adds any resources to TeSS as they appear • Updates any resources that have changed
  • 52. How TeSS works [architecture diagram] • Online Training Resources • Automated Aggregator: Custom Scrapers extract metadata from training material and events pages; a user can also enter form data • Back End: Metadata Catalogue (Events, Materials, Workflows) • Front End: Search Interface (finds relevant resources) and Workflow Viewer (Training Workflows)
  • 53. Automatic extraction techniques • There are several techniques we can use to extract metadata from content provider websites. This depends on what’s on the site. • Interface with an API – Handy but rare, difficult for websites to implement – Content aggregators must write a bespoke API client for each • Structured data already embedded in page (RSS, ICS) – Limited amount of data • HTML Scraping – Fragile technique that can break when there are changes to the website.
  • 54. Trade-off between ease of adopting and usefulness to aggregators [chart axes: ease to implement on a website; usefulness to aggregator]
  • 55. Content Provider extraction technique statistics (Events / Materials / Total) • Schema.org / Bioschemas: 9 / 6 / 15 • HTML: 3 / 5 / 8 • XML/JSON/YAML/CSV: 4 / 3 / 7 • iCal: 5 / -- / 5 • JSON API: -- / 2 / 2 • RSS: 1 / -- / 1 • Total: 38
  • 56. Content aggregation via Bioschemas [architecture diagram] • Online Training Resources • Automated Aggregator: a Schema.org scraper alongside Custom Scrapers extracts metadata from training material and events pages • Back End: Metadata Catalogue (Events, Materials, Workflows) • Front End: Search Interface (finds relevant resources) and Workflow Viewer (Training Workflows)
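The appeal of aggregating via schema.org markup is that one generic extractor can replace many site-specific scrapers. The sketch below is an assumption about how such an extractor could look, not a description of how TeSS is implemented; it uses only the Python standard library, and `JsonLdExtractor` and `extract_jsonld` are illustrative names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the crawl


def extract_jsonld(html):
    """Return a list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "http://schema.org", "@type": "Event", "name": "Workshop"}'
    "</script></head><body>...</body></html>"
)
print(extract_jsonld(page))
```

Unlike an HTML scraper, this code does not depend on a site's layout, so a redesign of the page does not break it as long as the embedded markup is kept.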
  • 58. Technique for adding Bioschemas to a website • 1. Identify an appropriate schema (or schemas) for your content type • 1.a If it doesn’t exist, e-mail the W3C mailing list or open an issue on the GitHub issue tracker • Issue tracker: https://github.com/BioSchemas/specifications • Mailing list: https://www.w3.org/community/bioschemas/
  • 59. Technique for adding Bioschemas to a website • 2. Draw a table and write down your metadata fields on the left hand side and the schema.org properties on the right. • Map the ones that correlate
  • 60. Technique for adding Bioschemas to a website • 3a. Use the Bioschemas generator to create a JSON-LD snippet that you can (hopefully) copy and paste into your site. (This would mean creating one for every new schema.org record you want to add) http://www.macs.hw.ac.uk/SWeL/BioschemasGenerator/
  • 61. Technique for adding Bioschemas to a website • 3b. If you can modify your site, paste in the JSON-LD template of the schema (from 3a), and render your metadata variables as the values of its keys [figure: Mapping]
  • 62. Technique for adding Bioschemas to a website • 3c. If your site uses a CMS such as WordPress or Drupal, explore whether there are appropriate schema.org plugins you can use (or ask on the Bioschemas mailing list)
  • 63. Tutorials • Bioschemas Training Portal – There is a step-by-step tutorial on there for adding schema.org to Jekyll / GitHub Pages sites. – Hopefully there will be more to come https://bioschemas.gitbook.io/training-portal
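Steps 2 and 3b (map your own fields to schema.org properties, then render your values into a JSON-LD template) can be sketched in a few lines. Everything site-specific here is an assumption made for illustration: the left-hand field names in `FIELD_TO_PROPERTY` stand in for whatever your own database or CMS calls them, and `render_event` is a hypothetical helper. The right-hand names are properties of the schema.org Event type.

```python
import json

# Hypothetical mapping table from step 2:
# our site's field names (left) -> schema.org Event properties (right).
FIELD_TO_PROPERTY = {
    "title": "name",
    "summary": "description",
    "starts": "startDate",
    "ends": "endDate",
    "link": "url",
}


def render_event(site_record):
    """Render one of our records as a schema.org Event in JSON-LD."""
    doc = {"@context": "http://schema.org", "@type": "Event"}
    for field, prop in FIELD_TO_PROPERTY.items():
        # Unmapped fields are ignored; missing fields are simply left out.
        if field in site_record:
            doc[prop] = site_record[field]
    return json.dumps(doc, indent=2)


print(render_event({
    "title": "Bioschemas Workshop",
    "starts": "2019-05-13",
    "internal_id": 7,  # has no schema.org counterpart, so it is dropped
}))
```

Keeping the mapping in one table means a new field can be exposed by adding a single line, and the same function can run inside a page template for every record.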
  • 64. Tools • Bioschemas Generator – Form-based tool to generate valid Bioschemas JSON-LD – http://www.macs.hw.ac.uk/SWeL/BioschemasGenerator/ • Validata [under construction] – Web application for validating Bioschemas markup https://bioschemas.org/software/
  • 65. Tools • GoCrawlt – JSON-LD schema.org extractor • Buzzbang [on hold] – Search engine that crawls the web for Bioschemas JSON-LD https://bioschemas.org/software/
  • 66. Freebies from Schema.org • Google Search Console – Shows you what schema.org data Google is picking up from your site, any errors, and advice on how to fix them – https://search.google.com/search-console
  • 67. Freebies from Schema.org • Google Structured Data Testing Tool – Extracts the schema.org from a given web-page or from a code-snippet, validates it, and shows you what errors there are – https://search.google.com/structured-data/testing-tool
  • 68. Freebies from Schema.org ecosystem • 3rd party plug-ins – Lots available to help add schema.org to your framework
  • 69. Slide courtesy of Alasdair Gray

Editor's Notes

  1. A collection of schemas that can be used to describe online objects
  2. Schema.org is very lightweight
  3. Going clockwise from top right – we have international organizations, communities surrounding technologies, national institutions, and other academic institutions. All output training events and/or materials and share via their own websites. Many, many opportunities in many, many locations.
  4. 273 Upcoming events – 7540 collected previously.
  5. 831 Training materials