Biodiversity Informatics on the Semantic Web

Pete DeVries
TaxonConcept.org
http://www.taxonconcept.org/
Department of Entomology
University of Wisconsin - Madison

What is the Semantic Web and how
does it Work?
Let’s Look at the Traditional Way
Taxon Table

Location Table

This data structure is really only interpretable within the context of this specific database

Data Islands

The result is database islands that contain a lot of redundant, independently curated data.

Each effort benefits little from the other efforts.

Data Sets often Overlap

What they don’t have is a common set of field names or IDs

Each Data Set has its own “Vocabulary”

Different Fields
Different Names for the Same Fields
Same Names for Different Fields
Different ways of Interpreting those Fields

These nuances in meaning are often only understood by the
designers of each individual data set.

Consider how differently people interpret the meaning of
different fields in the various email discussions.

Where the Semantic Web Helps
Tim Berners-Lee’s 4 Rules

1. Use URIs* as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs, so that they can discover more things.

*URI = Uniform Resource Identifier
http://www.w3.org/DesignIssues/LinkedData.html

Use URIs as Names for Things?

Instead of “Door County” use
http://sws.geonames.org/5250768/

For Humans this URI Dereferences To

For Machines this Dereferences To
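
The difference between the human and the machine view is usually decided by the HTTP Accept header (content negotiation). The sketch below only builds the two requests and does not send them. The helper name build_request is hypothetical, and it is an assumption that a given server answers these exact media types.

```python
from urllib.request import Request

# URI for Door County, taken from the slide above.
DOOR_COUNTY = "http://sws.geonames.org/5250768/"


def build_request(uri: str, want_rdf: bool) -> Request:
    """Build (but do not send) a request for the human or the machine view."""
    # Browsers ask for HTML; semantic web clients ask for an RDF media type.
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


human = build_request(DOOR_COUNTY, want_rdf=False)
machine = build_request(DOOR_COUNTY, want_rdf=True)

print(human.get_header("Accept"))
print(machine.get_header("Accept"))
```

Passing either object to urllib.request.urlopen would perform the actual lookup.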

Why Would Anyone Think this Made Sense?

Now, each of these different databases is using an ID with a shared meaning.

A meaning that can be determined by dereferencing the URI.

All the data sets that use this vocabulary are now connectable.

All the data sets that are linked to this URI are now also linked to each other.
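
A minimal sketch of why a shared URI makes data sets connectable. The two data sets and their records are invented for illustration; only the two geonames URIs come from these slides.

```python
DOOR_COUNTY = "http://sws.geonames.org/5250768/"
LA_CROSSE_COUNTY = "http://sws.geonames.org/5258961/"

# Two hypothetical, independently curated data sets (records are made up).
species_records = [
    {"species": "Species A", "location": DOOR_COUNTY},
    {"species": "Species B", "location": LA_CROSSE_COUNTY},
]
weather_records = [
    {"station": "Station 1", "location": DOOR_COUNTY},
]


def link_by_uri(left, right, key="location"):
    """Pair each record in `left` with every record in `right` sharing its URI."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [(l, r) for l in left for r in index.get(l[key], [])]


linked = link_by_uri(species_records, weather_records)
for sp, wx in linked:
    print(sp["species"], "<->", wx["station"], "via", sp["location"])
```

No field-name mapping between the two data sets was needed; the shared identifier does the work.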

As More Data Sets Adopt these Principles

The individual datasets are no longer islands, but are one interconnected knowledge base

Other Benefits

Reduced duplication of effort and a better separation of concerns

It would be more efficient for me to simply link to a bibliographic
reference URI on a site that specializes in that than to create my own
bibliographic database.

Similarly, it would be more efficient for the bibliographic database to link
to a URI in a nomenclatural database than to curate that aspect separately.

What is Linked Open Data?

Linked Open Data (LOD) and the LOD Cloud are linked, openly accessible data sets
A diagram of the subset of Linked Open Data that is described at http://ckan.net/

Wikipedia Images linked to my Species Concepts

TaxonConcept <=> DBpedia <=> WikiCommons Images
Virtuoso OpenSource and Microsoft Pivot
(some images are too large to display)

How do I Mark up my Data?

Your data set can continue to exist in its current relational
database form, but you need to expose it to the semantic web in a
different form

Knowledge as Triples
Statements are represented in a triple structure

Subject ➜ Predicate ➜ Object

• An English text version of a triple might look like

• Ochlerotatus triseriatus expected in La Crosse County, WI

Machine Processable Version
Ochlerotatus triseriatus is expected in La Crosse County, WI

Now represented as the following triple*

http://lod.taxonconcept.org/ses/iuCXz#Species

http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn

http://sws.geonames.org/5258961/

*Not Meant for Human Consumption
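
One way to picture the triple above in code is as a three-field record. This is only a sketch: the Triple class and the objects_of helper are hypothetical names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" is avoided because it shadows a Python built-in


# The triple from the slide: species concept, isExpectedIn, La Crosse County.
expected_in = Triple(
    subject="http://lod.taxonconcept.org/ses/iuCXz#Species",
    predicate="http://lod.taxonconcept.org/ontology/txn.owl#isExpectedIn",
    obj="http://sws.geonames.org/5258961/",
)

# At its simplest, a knowledge base is a set of such triples.
knowledge_base = {expected_in}


def objects_of(kb, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return sorted(t.obj for t in kb if t.subject == subject and t.predicate == predicate)


print(objects_of(knowledge_base, expected_in.subject, expected_in.predicate))
```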

Expressing RDF

RDF = Resource Description Framework

Ways to Express RDF (Serialization Formats)

RDF/XML
http://www.w3.org/TR/REC-rdf-syntax/
Notation 3 (N3)
http://www.w3.org/DesignIssues/Notation3.html

Subsets of N3
Turtle (Terse RDF Triple Language)
N-Triples

The Same Triple in Different Formats
RDF/XML (.rdf)

N3 (.n3)

Turtle (.ttl)
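
As a rough illustration, the triple from the earlier slide can be written out as N-Triples and as Turtle with plain string formatting. The function names are hypothetical and the prefix label txn is an assumption; a real project would more likely use an RDF library, which also handles escaping and RDF/XML.

```python
SUBJECT = "http://lod.taxonconcept.org/ses/iuCXz#Species"
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
PREDICATE = TXN + "isExpectedIn"
OBJECT = "http://sws.geonames.org/5258961/"


def to_ntriples(s: str, p: str, o: str) -> str:
    """One statement per line, every URI written out in full."""
    return f"<{s}> <{p}> <{o}> ."


def to_turtle(s: str, p: str, o: str, prefix: str = "txn", namespace: str = TXN) -> str:
    """Same statement, with the predicate shortened through a declared prefix."""
    if not p.startswith(namespace):
        # Fall back to the long form; an N-Triples line is also valid Turtle.
        return to_ntriples(s, p, o)
    # Assumes the local part is a simple name such as isExpectedIn.
    local = p[len(namespace):]
    return f"@prefix {prefix}: <{namespace}> .\n\n<{s}> {prefix}:{local} <{o}> ."


print(to_ntriples(SUBJECT, PREDICATE, OBJECT))
print()
print(to_turtle(SUBJECT, PREDICATE, OBJECT))
```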

You might find one of these forms easier to create.
There are various tools that will allow you to convert between one form and another.
If you need RDF/XML but can create N3, author in N3 and then convert those files to RDF/XML.

How do I tell the Semantic Web
about my Data?

PingtheSemanticWeb
http://pingthesemanticweb.com/
Semantic Sitemaps
http://sw.deri.org/2007/07/sitemapextension/

PingtheSemanticWeb.com
Enter the URL for your RDF documents

Semantic SiteMaps

http://site.example.com/sitemap.xml
http://site.example.com/sitemap.xml.gz
Refer to the sitemap.xml file in your site’s robots.txt file
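
A small sketch of what that robots.txt reference might look like, generated from Python. The robots_txt helper is a hypothetical name, and the permissive "allow everything" rules are an assumption; adjust them to your own site's policy.

```python
SITEMAP_URL = "http://site.example.com/sitemap.xml"


def robots_txt(sitemap_urls):
    """Return robots.txt content that allows all crawlers and lists the sitemaps."""
    lines = ["User-agent: *", "Disallow:"]  # an empty Disallow blocks nothing
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


content = robots_txt([SITEMAP_URL])
print(content)
```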

How can I Find other Potentially Useful
Data Sets?
CKAN Comprehensive Knowledge Archive Network
http://ckan.net/

Ask the LOD Cloud

Enter a term or name like “Quercus alba” to see which entities contain that term or name

How can I set up my own Knowledge Base?
Virtuoso Open-Source Edition
http://virtuoso.openlinksw.com/

How can I Query a Knowledge Base?
SPARQL
http://en.wikipedia.org/wiki/SPARQL
http://www.w3.org/TR/rdf-sparql-query/

Query using the Web Interface
Query using your own script or web application

Example

“Describe those occurrences of the species concept Boloria selene”
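
For the "own script" route, a query like the one above is normally sent to a SPARQL endpoint as the query parameter of an HTTP request. The sketch below only builds the query text and the request URL. The endpoint, the species URI, and the occurrenceOf predicate are all placeholders, not the identifiers the real knowledge base uses.

```python
from urllib.parse import urlencode

# All three values are placeholders for illustration only.
ENDPOINT = "http://example.org/sparql"
SPECIES_URI = "http://example.org/species/boloria-selene"
OCCURRENCE_OF = "http://example.org/vocab#occurrenceOf"


def describe_occurrences_query(species_uri: str, predicate_uri: str) -> str:
    """Build a DESCRIBE query for everything linked to the species concept."""
    return (
        "DESCRIBE ?occurrence\n"
        "WHERE {\n"
        f"  ?occurrence <{predicate_uri}> <{species_uri}> .\n"
        "}"
    )


def query_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = describe_occurrences_query(SPECIES_URI, OCCURRENCE_OF)
url = query_url(ENDPOINT, query)
print(query)
print(url)
```

Fetching that URL with urllib.request.urlopen, against a real endpoint and real identifiers, would return the description.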

iSPARQL Query Example Web Interface

What does the Future Hold for the
Semantic Web and Linked Open Data?

Improvements in the quantity and quality of LOD data sets.
Improved Alignment of Vocabularies
Improvements in SPARQL and Quadstores
Human and Machine Interpretable Views Merged in RDFa
Better Visualization and Analysis Tools

One More Thing!
Now that many people have smartphones that can scan a barcode and load a specific web page,
consider using URIs that point to your web-accessible database for things like collection drawers,
specimens and species pages.

QR Codes are one form of 2D barcode that seems to work well.
http://en.wikipedia.org/wiki/QR_Code
QuickMark seems to make an inexpensive reader for many smartphones.
http://www.quickmark.com.tw

Other Resources
Linked Open Data http://linkeddata.org/
W3C.org http://esw.w3.org/Main_Page
public-lod email list http://lists.w3.org/Archives/Public/public-lod/
TaxonConcept.org http://www.taxonconcept.org/

Acknowledgments
Kingsley Idehen http://www.openlinksw.com/blog/~kidehen/
David “Paddy” Patterson eol.org
Dmitry Mozzherin eol.org
