The document discusses the food safety challenge of integrating data from various sources to help researchers and experts address foodborne diseases. It outlines several open data sources that provide information on outbreaks, alerts, and statistics, including the SemaGrow SPARQL endpoint, the Green Learning Network, the Agriculture Bibliographic Network, the European Food Safety Authority, the Rapid Alert System for Food and Feed, and databases from the CDC and CSPI. The goal is to help experts detect outbreaks earlier and draw conclusions by exploiting structured and unstructured data from public and private databases.
3. The issue
• According to the WHO, foodborne & waterborne
diseases kill an estimated 2.2M people annually,
most of whom are children
• In Europe, 1.2M cases of foodborne disease are
reported annually, leading to 350,000
hospitalisations and 5,000 deaths
• The foodborne disease problem is not confined to
the national level but extends to the international
level, as outbreaks involving multiple countries are
becoming more common with the ever-increasing
global movement of food and people
4. The issue
• Early detection of outbreaks and drawing
conclusions from the analysis of alerts have
become a major challenge in the field
• Currently, decision makers in both the public and
private sectors, food scientists, microbiologists and
epidemiologists who work on food safety topics
cannot take full advantage of all the existing data
on foodborne diseases, mainly for two reasons:
– part of the information remains unstructured and
locked in internal databases, and
– the information is stored in custom, non-standard
schemas and thus is not shared globally in an
interoperable way.
5. The challenge
How to help
• researchers
• food scientists
• microbiologists
• epidemiologists
who work on food safety topics to take full
advantage of all the existing data on
foodborne diseases and to discover & access the
resources they need?
8. SemaGrow SPARQL Endpoint API (1/2)
• SPARQL endpoint: http://143.233.226.42:8080/SemaGrow
• Datasets included in the SemaGrow SPARQL Endpoint:
– AGRIS
– VOA3R
– Organic Edunet
– ARIADNE
– Europeana
– Natural Europe
– Trees for future
• The user guide for the annotation tool can be found at
http://wiki.agroknow.gr/agroknow/images/a/a8/Eleon_user_guide.pdf
• Installing a Linux package distribution of the SemaGrow
Stack: http://semagrow.semantic-web.at/docs/semagrow-stack-assembly/1.0.0/installation.html#Debian
9. SemaGrow SPARQL Endpoint API (2/2)
• SemaGrow provides access to numerous Linked Data
sources: AGRIS, Ariadne, Europeana, Organic Edunet,
Natural Europe, IFPRI, RASFF and AGROVOC.
• Developers can access the SemaGrow interface online
(http://143.233.226.42:8080/SemaGrow/) or query it with a
standard HTTP GET request.
• For example, to get results from this basic query:
SELECT ?s ?p ?o WHERE {
?s ?p ?o
} LIMIT 10
one would send an HTTP GET request to this URL (in practice
the query value must be URL-encoded):
http://143.233.226.42:8080/SemaGrow/sparql?output=json&query=SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10
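The GET request above can be sketched with the Python standard library alone. This is a minimal sketch, assuming the endpoint honours the `output=json` parameter and returns the standard SPARQL 1.1 JSON results format; the helper names (`build_query_url`, `parse_bindings`, `run_query`) are illustrative and not part of SemaGrow. The key detail is that the query text is percent-encoded before it goes into the URL.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as listed on the slide; it may no longer be reachable.
SEMAGROW_SPARQL = "http://143.233.226.42:8080/SemaGrow/sparql"


def build_query_url(endpoint: str, query: str, output: str = "json") -> str:
    """Build the GET URL, percent-encoding the SPARQL text (spaces become %20)."""
    params = urllib.parse.urlencode(
        {"output": output, "query": query}, quote_via=urllib.parse.quote
    )
    return f"{endpoint}?{params}"


def parse_bindings(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one {variable: value} dict per row."""
    data = json.loads(results_json)
    return [
        {var: term["value"] for var, term in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query: str, endpoint: str = SEMAGROW_SPARQL) -> list[dict[str, str]]:
    """Send the query with HTTP GET and return the flattened rows."""
    url = build_query_url(endpoint, query)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_bindings(resp.read().decode("utf-8"))


# Usage (needs network access):
# for row in run_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"):
#     print(row)
```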
11. GLN Search API (agINFRA-powered)
• Green Learning Network (GLN) is a data pool of
high-quality educational resources related to agriculture
and biodiversity:
– a wide variety of collections
– the dataset is organized by collection
– access the GLN data pool API
http://api.greenlearningnetwork.com/documentation/
– focus on the akif data type
http://api.greenlearningnetwork.com/search-api/v1/akif?q=*
– REST-based queries over harmonised information (result
of metadata processing)
12. ABN Search API (agINFRA-powered)
• The Agriculture Bibliographic Network (ABN) API
currently provides access to a small subset of the AGRIS
data pool, which contains millions of bibliographic
references on agricultural research and technology:
– access the AGRIS API
http://api.greenlearningnetwork.com/search-api/
– focus on the agrif data type
http://api.greenlearningnetwork.com/search-api/v1/agrif?q=*
– REST-based queries over aggregated metadata
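Judging by the URLs on the two slides above, the GLN and ABN APIs share the same `search-api/v1` base and differ only in the data type segment (`akif` for GLN, `agrif` for ABN). A minimal sketch, assuming both respond with JSON; `search_url` and `fetch_json` are hypothetical helper names, and no particular response structure is assumed, so the decoded object is returned as-is.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Shared base URL from the GLN and ABN slides; availability is not guaranteed.
SEARCH_API = "http://api.greenlearningnetwork.com/search-api/v1"


def search_url(data_type: str, query: str = "*") -> str:
    """Search URL for a data type: 'akif' (GLN) or 'agrif' (ABN/AGRIS)."""
    return f"{SEARCH_API}/{data_type}?" + urllib.parse.urlencode({"q": query})


def fetch_json(url: str) -> Any:
    """GET the URL and decode the body, assuming the API responds with JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Usage (needs network access):
# gln = fetch_json(search_url("akif"))            # educational resources
# abn = fetch_json(search_url("agrif", "maize"))  # bibliographic records
```

Note that `urlencode` turns the wildcard `*` into `%2A`, which a standards-compliant server decodes back to `*`.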
13. Search options
• Simple search
– http://BASE_URL/search-api/v1/akif/?q=tomato
• Searching within specific fields
– http://BASE_URL/search-api/v1/akif/?languageBlocks.en.description=tomato
• Temporal (by creation date)
– http://BASE_URL/search-api/v1/akif/?creationDate=2013-04-16
• Fetching a specific item
– http://BASE_URL/search-api/v1/akif/COLLECTION/20296
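The four search options map onto simple URL patterns. A sketch that keeps the slide's `BASE_URL` placeholder host and the `akif` data type; the function names are hypothetical. Building the query strings with `urlencode` and `quote` keeps search terms with spaces or reserved characters correctly encoded.

```python
from urllib.parse import quote, urlencode

# "BASE_URL" is the slide's placeholder host; replace it with the real API host.
AKIF = "http://BASE_URL/search-api/v1/akif/"


def simple_search(term: str) -> str:
    """Free-text search, e.g. ?q=tomato."""
    return AKIF + "?" + urlencode({"q": term})


def field_search(field: str, term: str) -> str:
    """Search one field; dotted paths like languageBlocks.en.description pass through."""
    return AKIF + "?" + urlencode({field: term})


def created_on(date: str) -> str:
    """Temporal filter on creationDate, given as YYYY-MM-DD."""
    return AKIF + "?" + urlencode({"creationDate": date})


def fetch_item(collection: str, item_id: str) -> str:
    """Address a single record by collection name and identifier."""
    return AKIF + f"{quote(collection, safe='')}/{quote(item_id, safe='')}"
```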
15. The Food Safety Data
• Available in various types, formats and sources
– Application programming interface (API)
– Dump files (e.g. XML)
– SPARQL endpoints
– Harvesting from OAI-PMH services
– HTML / data scraping
– Crawling
– a combination of the above
• Documentation:
– http://dev.socrata.com/docs/formats/rdf-xml.html
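Of the access methods listed, OAI-PMH harvesting follows a fixed protocol: a `ListRecords` request returns one page of records plus a `resumptionToken`, which is sent back on its own to fetch the next page until the token comes back empty. A minimal sketch, assuming a repository that supports the `oai_dc` metadata format; the repository URL is yours to supply, the function names are hypothetical, and for brevity only record identifiers are collected.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; a resumptionToken must be sent on its own."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def parse_page(xml_data: bytes) -> tuple[list[str], Optional[str]]:
    """Return the record identifiers on a page and the next token (None at the end)."""
    root = ET.fromstring(xml_data)
    ids = [
        el.text or ""
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest_identifiers(base_url: str) -> Iterator[str]:
    """Follow resumptionTokens until the repository reports no further pages."""
    token: Optional[str] = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token), timeout=60) as resp:
            ids, token = parse_page(resp.read())
        yield from ids
        if token is None:
            break
```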
16. European Food Safety Authority
• URL: http://www.efsa.europa.eu
• Data types: Publications, reports, classification and
monitoring data
17. Rapid Alert System for Food & Feed (RASFF)
• URL: http://ec.europa.eu/food/safety/rasff/index_en.htm
• Data portal: https://webgate.ec.europa.eu/rasff-window/portal/
– and its search form:
https://webgate.ec.europa.eu/rasff-window/portal/?event=SearchForm&cleanSearch=1
• Data types: historical data about food alerts, recalls and
measures taken by the national authorities
18. EC Health & Consumers’ Open Data
• A knowledge base of public data from the European
Commission's Health and Consumers Directorate-General
• SPARQL endpoint:
http://ec.europa.eu/semantic_webgate/query
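Besides GET, SPARQL endpoints can be queried with POST, which avoids URL-length limits on long queries. The sketch below builds a SPARQL 1.1 Protocol "query via URL-encoded POST" request for the EC endpoint listed above. Whether that endpoint still accepts POST or the `application/sparql-results+json` media type is an assumption, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Endpoint as listed on the slide; it may have moved or been retired.
EC_SPARQL = "http://ec.europa.eu/semantic_webgate/query"


def build_post_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query via URL-encoded POST."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_post_query(query: str, endpoint: str = EC_SPARQL) -> bytes:
    """Send the query and return the raw response body."""
    with urllib.request.urlopen(build_post_request(endpoint, query), timeout=30) as resp:
        return resp.read()
```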
19. Center for Science in the Public Interest
• URL: http://www.cspinet.org
• Database:
http://www.cspinet.org/foodsafety/outbreak/pathogen.php
• Data types: reports on foodborne illness
outbreaks in the US since 1997
• Database of outbreak reports:
http://cspinet.org/foodsafety/outbreak_report.html
20. Centers for Disease Control and Prevention
• URL: http://www.cdc.gov
• Database on foodborne illnesses:
http://www.cdc.gov/foodborneburden/estimates-overview.html
• RSS feed of combined Food Safety information:
http://www2c.cdc.gov/podcasts/createrss.asp?c=146
• Data types: foodborne outbreak reports
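The CDC feed above is an RSS document, so its items can be read with a plain XML parser. A minimal sketch, assuming the feed follows the usual RSS 2.0 layout (`rss/channel/item` with `title`, `link` and `pubDate` children); the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Feed URL as listed on the slide; it may no longer be served.
CDC_FOOD_SAFETY_RSS = "http://www2c.cdc.gov/podcasts/createrss.asp?c=146"


def parse_rss_items(rss: bytes) -> list[dict[str, str]]:
    """Extract title, link and pubDate from each <item> of an RSS 2.0 feed."""
    channel = ET.fromstring(rss).find("channel")
    if channel is None:
        return []
    return [
        {field: (item.findtext(field) or "").strip() for field in ("title", "link", "pubDate")}
        for item in channel.findall("item")
    ]


def fetch_feed(url: str = CDC_FOOD_SAFETY_RSS) -> list[dict[str, str]]:
    """Download the feed and return its items."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_rss_items(resp.read())
```

Missing fields come back as empty strings, so downstream code can treat every item uniformly.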
21. International Food Policy Research Institute (IFPRI)
• URL: http://www.ifpri.org
• Database: http://data.ifpri.org
• SPARQL endpoint: http://data.ifpri.org/sparql
• Data types: statistical data, reports, maps
23. Social media / search engines
• Twitter: information that helps identify
foodborne disease incidents reported directly by
food consumers
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3785982
– Twitter API: https://dev.twitter.com
• Search engines: Data about search queries related to
food safety incidents from search engines like Google
and Yahoo
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3785982
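As a rough illustration of how consumer posts might be screened, the sketch below flags posts that mention both a symptom term and a food-related term. This is a naive keyword filter, not the method of the cited study; the term lists are hypothetical placeholders, and the code works on post texts already retrieved (for example through the Twitter API) rather than calling any API itself.

```python
import re

# Hypothetical term lists for illustration only; a real system would need a
# curated vocabulary and, most likely, a trained classifier.
SYMPTOM_TERMS = ("food poisoning", "vomiting", "diarrhea", "stomach ache")
FOOD_CONTEXT = ("ate", "restaurant", "takeout", "chicken", "salad")


def _contains_any(text: str, terms: tuple[str, ...]) -> bool:
    """True if any term occurs in the text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(t)}\b", text) for t in terms)


def flag_posts(posts: list[str]) -> list[str]:
    """Keep posts that mention both a symptom term and a food-related term."""
    flagged = []
    for post in posts:
        lowered = post.lower()
        if _contains_any(lowered, SYMPTOM_TERMS) and _contains_any(lowered, FOOD_CONTEXT):
            flagged.append(post)
    return flagged
```

Keyword matching is cheap and transparent but noisy, so it is best treated as a first-pass filter ahead of manual review or a statistical model.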