The document discusses Plazi's work on converting biodiversity literature into structured data through text mining and markup. Key points include:
- Plazi extracts scientific names, tables, references and geographic data from literature and converts it into semantically enriched text and RDF.
- Their pipelines currently have over 50,000 taxonomic treatments live and provide data to databases such as NCBI, GBIF and EOL.
- Future plans include collaborating with ContentMine for daily treatment extraction, releasing RDF and text mining versions 1.0, and expanding the biodiversity literature repository to 100,000 references.
20. The «Mehlwurm» lives in dry bread... or the Potential of LOD
has traits
is part of
refers to
has traits
“The larva of the mealworm lives in dry
bread and can be eaten in Switzerland”
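A minimal sketch of how the sentence on this slide could be split into linked-data statements. The edge labels (has traits, is part of, refers to) come from the slide; which nodes they connect, the `example.org` namespace, and all identifier names are assumptions made for illustration, not Plazi's actual vocabulary.

```python
from typing import List, NamedTuple

BASE = "http://example.org/"  # hypothetical namespace, not a Plazi URI


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    literal: bool = False  # True when the object is plain text, not a resource


# Edge labels follow the slide; the node assignment is an assumption.
TRIPLES: List[Triple] = [
    Triple("Mehlwurm", "refersTo", "mealworm"),
    Triple("larva", "isPartOf", "mealworm"),
    Triple("larva", "hasTrait", "lives in dry bread", literal=True),
    Triple("larva", "hasTrait", "can be eaten in Switzerland", literal=True),
]


def to_ntriples(triples: List[Triple]) -> List[str]:
    """Serialize each statement as one N-Triples-style line."""
    lines = []
    for t in triples:
        obj = f'"{t.obj}"' if t.literal else f"<{BASE}{t.obj}>"
        lines.append(f"<{BASE}{t.subject}> <{BASE}{t.predicate}> {obj} .")
    return lines


def traits_of(subject: str, triples: List[Triple]) -> List[str]:
    """Collect every trait attached to a subject."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == "hasTrait"]


if __name__ == "__main__":
    print("\n".join(to_ntriples(TRIPLES)))
```

Once the sentence is stored as separate statements, a question such as "which traits does the larva have?" becomes a simple filter (`traits_of("larva", TRIPLES)`) rather than a text search.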
29. The Plazi Vision: The Giant Global Biodiversity Graph
Plazi’s intended uses of LOD
• Public discovery of data about species, extracted from the publications in which they are described, by people not otherwise connected to discovery services such as Plazi’s own or the GBIF data repository
• Public facility for citation of Plazi’s data by arbitrary internet users
• Plazi creates a new dataset from literature/publications rather than republishing existing data sets
30. The Plazi Vision: The Giant Global Biodiversity Graph
Legal
Social
Technical
Ontologies
Infrastructure
500 M
pages 5*
31. What does this mean?
The Linking Open Data cloud diagram
Linked Open Data Cloud
35. Treatment Graph for the Malagasy Ants Aphaenogaster
[Figure: treatment graph linking an original description to several later re-descriptions; edges are labeled "cites" and "cites/synonymizes".]
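A treatment graph of this kind can be sketched as a small directed graph. The edge types "cites" and "synonymizes" are taken from the slide; the treatment identifiers and the shape of the example chain are hypothetical, and this is not the data model of the TreatmentOntologies repository.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical treatment identifiers; only the edge labels come from the slide.
EDGES: List[Tuple[str, str, str]] = [
    ("redescription_1", "cites", "original_description"),
    ("redescription_2", "cites", "redescription_1"),
    ("redescription_3", "synonymizes", "redescription_2"),
]

Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(edges: List[Tuple[str, str, str]]) -> Graph:
    """Map each treatment to the (relation, target) pairs it points at."""
    graph: Graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, [])
    return graph


def earlier_treatments(graph: Graph, start: str) -> List[str]:
    """Breadth-first walk from a treatment back through everything it builds on."""
    visited = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _rel, target in graph[node]:
            if target not in visited:
                visited.add(target)
                order.append(target)
                queue.append(target)
    return order


def original_descriptions(graph: Graph) -> List[str]:
    """Treatments that point at nothing earlier are the roots of the graph."""
    return [node for node, out in graph.items() if not out]


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(earlier_treatments(g, "redescription_3"))
    print(original_descriptions(g))
```

Following the edges backwards from any re-description leads to the original description, which is the property that makes the citation links useful for tracing the history of a taxon.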
36. Treatment Graph for the Ant Azteca alfari
https://github.com/plazi/TreatmentOntologies
37. Pseudomyrmex ants and Vachellia ant-acacias
are a classic example of mutualism in biology.
allenii
melanoceras
ruddiae
chiapensis
collinsii
cookii
cornigera
globulifera
hindsii
janzenii
mayana
sphaerocephala
boopis
flavicornis
hesperius
ita
janzeni
kuenckeli
mixtecus
nigrocinctus
nigropilosus
opaciceps
particeps
peperi
reconditus
satanicus
simulans
spinicola
subtilissimus
veneficus
ferrugineus
gentlei
gracilis
Transbiotic link network
Associated species linked through
references in taxonomic treatments
Acacia-ant species: Pseudomyrmex gracilis
Treatment: redescription
Associated ant-acacia: Acacia gentlei
Ants Plants
Photocredits: Alex Wild
Treatment
Treatments linked
through citations
Treatment opportunities
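The transbiotic link idea can be sketched as an index built from treatment records. The single record below mirrors the example on this slide (the ant name is spelled here as Pseudomyrmex gracilis); the field names `taxon`, `kind`, and `associated` are hypothetical and do not reflect Plazi's actual markup.

```python
from collections import defaultdict
from typing import Dict, List, Set

# One record, taken from the slide's example; field names are assumptions.
TREATMENTS: List[dict] = [
    {
        "taxon": "Pseudomyrmex gracilis",
        "kind": "redescription",
        "associated": ["Acacia gentlei"],
    },
]


def build_links(treatments: List[dict]) -> Dict[str, Set[str]]:
    """Link each treated species with the species its treatment mentions,
    in both directions, so the network can be entered from either side."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for treatment in treatments:
        for partner in treatment["associated"]:
            links[treatment["taxon"]].add(partner)
            links[partner].add(treatment["taxon"])
    return dict(links)


if __name__ == "__main__":
    for species, partners in build_links(TREATMENTS).items():
        print(species, "<->", ", ".join(sorted(partners)))
```

Storing the link in both directions means a query starting from the plant finds the ant and vice versa, even though only the ant's treatment mentions the association.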
40. Open Access as Necessity
Before antbase.org, Harvard’s Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.
Through antbase.org’s digital library, this body of literature is accessible worldwide, and it is actively used (more than 10,000 visits in a single month).
Online catalogue
Open access
Online library
2004
41. Conversion Workflows: Plazi
Plazi
SRS
find → scan → OCR → markup → store + access
The Swiss copyright-law exceptions that permit data extraction are an advantage for the sciences in Switzerland
47. Status quo
• 50,000+ treatments live
• RDF in beta version
• GoldenGate Imagine (text mining tool) in beta version
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository functional
49. Next steps: ContentMine
Planned collaboration with ContentMine to extract treatments on a daily basis
http://www.slideshare.net/petermurrayrust/?
BioDiv
52. Next steps
• 1 million treatments live
• RDF Version 1
• GoldenGate Imagine (text mining tool) Version 1
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository with 100,000 bibliographic references and digital versions
Notes:
Add in Plazi and the idea of the treatment server
The Linking Open Data cloud diagram
This web page is the home of the LOD cloud diagram. This image shows datasets that have been published in Linked Data format, by contributors to the Linking Open Data community project and other individuals and organisations. It is based on metadata collected and curated by contributors to the CKAN directory. Clicking the image will take you to an image map, where each dataset is a hyperlink to its homepage.
Who are we?