The document discusses building a European data infrastructure for agricultural research information. It proposes connecting heterogeneous agricultural data sources to allow for unified querying. Semantic web technologies like linked open data would allow different communities to access the same data using their own vocabularies and ontologies. Challenges include querying very large distributed datasets and developing scalable semantic indexing. Potential collaborations are mentioned between the presenter's company, Agro-Know, and the Chinese Academy of Agricultural Sciences to share agricultural knowledge and research.
Agro-Know & the European agricultural research information ecosystem
1. Agro-Know & the European agricultural research information ecosystem
Nikos Manouselis (PhD)
CEO Agro-Know
www.agroknow.gr
2. ToC
• about me & Agro-Know
• our context of work
• building a European data e-infrastructure for
agricultural research
• collaboration between CAAS AII & Agro-Know
7. Κρήτη (Crete)
• Crete is the largest and most populous of the
Greek islands
• It forms a significant part of the economy and
cultural heritage of Greece while retaining its
own local cultural traits (such as its own poetry
and music)
• Crete was once the center of the Minoan
civilization (circa 2700–1420 BC), which is
currently regarded as the earliest recorded
civilization in Europe
8. Minoan civilisation
• Named after King Minos
• A king of Crete, son of Zeus and Europa
10. Minoans: enemies with Athens
• Every nine years, King Minos
of Crete made King Aegeus of
Athens pick seven young
boys and seven young girls to
be sent to his palace, the
labyrinth, to be eaten by the
monster Minotaur (half man,
half bull)
17. http://www.agroknow.gr
An extraordinary company that captures, organizes
and adds value to the rich information available in
agricultural and biodiversity sciences, in order to
make it universally accessible, useful and meaningful.
18. We develop and put into
practice solutions that transform
data into meaningful knowledge
and services
We help people
solve problems
informed by data
19. data aggregation & sharing solutions
[Diagram: data pipeline in three stages]
• Stages: Cultivation, Harvesting, Blossom
• Stage captions: aggregate data from diverse sources; works with different types of data; prepare data for meaningful services
• Inputs: unorganized content in local and remote sites; organized and structured content in local and remote DBs
• Data Framework: ingestion, translation, enrichment, publication
• Data types: educational, bibliographic, other
• Services: data discovery services, widgets, authoring services, analytics services
20. working with high profile partners & clients
• Food and Agriculture Organization (FAO) of the
United Nations
• World Bank Group
• UK’s Dept for International Development (DFID)
• Michigan State University (MSU)
• Wageningen University & Research (WUR)
• French National Institute for Agricultural Research (INRA)
• Creative Commons
22. CIARD
• “towards a Knowledge Commons on
Agricultural Research for Development”
• “agricultural knowledge is freely accessible
and contributes to reducing hunger and
poverty”
• “open knowledge makes it easier to provide
better solutions”
http://www.ciard.net/about/manifesto
23. Open Knowledge Convening (February 2013)
• Open Knowledge for Agricultural
Development Convening, hosted by MSU in
February 2013
24. launch of RDA (March 2013)
• joint USA, EU, Australia Research Data Alliance
– “researchers and innovators openly sharing data
across technologies, disciplines, and countries to
address the grand challenges of society”
• Interest Group on Agricultural Data Interoperability
– Wheat Data Interoperability Working Group
– Germplasm Data Interoperability Working Group
– …more
https://rd-alliance.org
25. G8 conference (April 2013)
“How Open Data can be
harnessed to help meet
the challenge of
sustainably feeding nine
billion people by 2050”
26. GODAN initiative
• “support global efforts to make agricultural and
nutritionally relevant data available, accessible, and
usable for unrestricted use worldwide”
• “advocate for the release and re-usability of data in
support of Innovation and Economic Growth,
Improved Service Delivery and Effective Governance,
and Improved Environmental and Social Outcomes”
http://godan.info/statement.html
28. agricultural research
• Agricultural research can be broadly defined as any
research activity aimed at improving productivity
and quality of crops
– by genetic improvement, better plant protection, irrigation,
storage methods, farm mechanization, efficient marketing,
better management of resources, human development
[Loebenstein & Thottappilly, 2007]
29. agricultural research information
• Primary data:
– Structured, e.g. datasets as tables
– Digitized: images, videos, etc.
• Secondary data (elaborations, e.g. a dendrogram)
• Provenance information, incl. authors, their organizations
and projects
• Methods and procedures followed
• Reports, including papers
• Secondary documents, e.g. training resources
• Metadata about the above
• Social data, tags, ratings, etc.
30. there is a lot of data
…but where do I start searching?
31. simple goal of agINFRA
• demonstrate how we can make information on
European agricultural research
– more discoverable
– better linked
– interoperable & exchangeable
• focus on selected types of information (primarily
bibliographic information, educational resources;
also germplasm data, soil maps, …)
• collaboration cases with international partners
(such as CAAS)
32. agINFRA e-infrastructure
[Diagram: components of the agINFRA e-infrastructure]
• Registry of Datasets and APIs
– including agINFRA collections, agINFRA data sources, agINFRA APIs
• Registry of vocabularies and tools (VEST registry)
• Cloud / SaaS tools: Omeka, AgriDrupal, AgriOceanDSpace
• Productivity Tools
• LOD Vocabularies: AGROVOC, local KOSs
• Controlled lists: document types, data types, file formats (IANA +), protocols, audiences, licenses, etc.
• agINFRA RDF vocabularies and agINFRA LOD KOSs: bibliographic, educational, germplasm, soil, datasets, APIs, etc.
• Information services
• Grid jobs and Grid workflows: agKEA, ag@RDF, agHarvest…
• Public REST APIs: agHarvest, agTransform, agTagger
• VocBench
• Connector labels: Shared URIs, Call APIs
33. actors over the infrastructure
[Diagram: the infrastructure components (collections, data sources, APIs, Registry of Datasets and APIs, Registry of vocabularies and tools, LOD Vocabularies, agINFRA RDF vocabularies, agINFRA LOD KOSs, Cloud / SaaS tools, Productivity Tools, Information services, Grid jobs, Grid workflows, Public REST APIs), annotated with the actors who work with them]
• Data providers
• Information systems providers
• Researchers
• Taxonomists
• Policy makers
• Developers
35. moving forward
[Diagram: two architecture panels, captioned "CASE OF AGRICULTURAL INFRASTRUCTURES"]
• NOW (2012)
– OAI-PMH Service Providers #1 … #n, each with its own schema (Schema #1 … Schema #n)
– HARVESTER
– Aggregated XML Repository (AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...)
– INDEXER
– Web Portals: Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...
• 2015 (AgINFRA)
– SPARQL endpoints (Data Source #1 … Data Source #n)
– Common Schema, RDF Triple Store, SPARQL endpoint
– INDEXER
– Web Portals
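The harvesting step of the 2012 setup can be illustrated with a minimal sketch. It assumes a provider that answers an OAI-PMH ListRecords request with Dublin Core (oai_dc) metadata. The sample response, the identifiers and the function name parse_list_records are hypothetical; only the XML namespaces come from the OAI-PMH and Dublin Core specifications. This is not the agINFRA harvester itself.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response. A real harvester would fetch it from
# a provider's base URL with ?verb=ListRecords&metadataPrefix=oai_dc.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Wheat rust survey</dc:title>
          <dc:subject>wheat</dc:subject>
          <dc:subject>plant diseases</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Soil map of Crete</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Turn a ListRecords response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        records.append({
            "identifier": identifier,
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "subjects": [e.text for e in dc.findall("dc:subject", NS)],
        })
    return records


records = parse_list_records(SAMPLE_RESPONSE)
for rec in records:
    print(rec["identifier"], rec["titles"], rec["subjects"])
```

In this model every provider's records are copied into one aggregated repository, which is the step the 2015 design tries to avoid.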
36. problem when scaling up
• enable the seamless federation of:
– large, live, constantly updated datasets and
streams
– heterogeneous data
• involve data publishers that
– cannot or will not join a tight, centrally
controlled distributed database
– cannot or will not directly and immediately
make the transition to new vocabularies
37. the SemaGrow solution
• a SPARQL endpoint that federates several
heterogeneous data sources
– client poses a query in their preferred schema
• no need to know where to ask for what
• no need to know the source’s schema
– by means of collecting and indexing meta-information
about the data stored in each data source
• in this manner the data sources do not need to be
cloned and re-hashed, and the way data is
distributed among them does not need to be
centrally controlled
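The idea on this slide can be sketched in a few lines, under heavy simplifying assumptions. Each source is an in-memory list of triples, the "meta-information" is just an index of which predicates each source holds, and schema mappings are a hand-written dictionary. All names (SOURCES, MAPPINGS, build_index, federated_lookup) and all prefixes are hypothetical; this is a toy illustration, not SemaGrow's actual implementation.

```python
# Hypothetical data sources, each using its own schema (predicate names).
SOURCES = {
    "bibliographic": [
        ("doc:1", "dc:title", "Wheat rust survey"),
        ("doc:1", "dc:creator", "A. Author"),
    ],
    "educational": [
        ("lo:7", "lom:title", "Intro to soil science"),
    ],
    "germplasm": [
        ("acc:42", "gp:taxon", "Triticum aestivum"),
    ],
}

# Hypothetical schema mappings: client predicate -> equivalent predicates.
MAPPINGS = {"dc:title": ["dc:title", "lom:title"]}


def build_index(sources):
    """Collect meta-information: which source holds which predicates."""
    index = {}
    for name, triples in sources.items():
        for _, predicate, _ in triples:
            index.setdefault(predicate, set()).add(name)
    return index


def federated_lookup(client_predicate, sources, index, mappings):
    """Answer a query posed in the client's schema.

    Only sources that the index lists for an equivalent predicate are
    contacted, and results are reported back in the client's vocabulary.
    """
    results, contacted = [], set()
    for predicate in mappings.get(client_predicate, [client_predicate]):
        for name in sorted(index.get(predicate, ())):
            contacted.add(name)
            for s, p, o in sources[name]:
                if p == predicate:
                    results.append((s, client_predicate, o, name))
    return results, contacted


index = build_index(SOURCES)
results, contacted = federated_lookup("dc:title", SOURCES, index, MAPPINGS)
for row in results:
    print(row)
print("contacted:", sorted(contacted))
```

The client asks only for dc:title, yet also receives the educational record stored under lom:title, and the germplasm source is never contacted because the index shows it holds no matching predicate.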
38. what Semantic Web can bring into the picture
• One Data Access Point for the entire Data Cloud
– Enabling Service-Data level agreements with Data providers
• Application-level Vocabularies / Thesauri / Ontologies
– Enabling different application facets for different communities of users over the same data pool
[Diagram: SemaGrow architecture between a Client and the SPARQL endpoints of Data Sources #1 … #n]
• Components: SemaGrow SPARQL endpoint, Query Manager, Query Decomposer (Query Decomposition), Data Source(s) Selector, Resource Discovery, Resource Selector, Query Pattern Discovery Service, Query Transformation Service, Query Results Merger, Federated endpoint Wrapper, POWDER Inference Layer, P-Store
• Information used: Instance Statistics, Data Summaries, Load Info, Semantic Proximity, Schema Mappings
• Candidate Source(s) List, annotated with Instance Statistics, Load Info and Semantic Proximity
• Labelled flows: query, query patterns, equivalent patterns, query fragment and target source, transformed query, query requests #1 … #n, query results, query results schema, transformed schema, reactivity parameters, control (Ctrl)
• Going beyond existing Distributed Triple Store Implementations
– Link Heterogeneous but Semantically Connected Data
– Index Extremely Large Information Volumes (Peta Sizes)
– Improve Information Retrieval response
• Data (+Metadata) physically stored at the Data Provider
– No need for harvesting
• Vocabularies / Thesauri / Ontologies of the Data Provider's choice
– No need for aligning according to common schemas
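The roles of the Query Decomposer and the Query Results Merger can be sketched as follows, assuming a query is a list of triple patterns with "?"-prefixed variables and each source is a small in-memory list of triples. For brevity every pattern is sent to every source, so no source selection happens here. The function names, source names and data are hypothetical, and the join shown is a plain nested-loop join rather than anything SemaGrow is documented to use.

```python
def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must bind to the same value everywhere in the pattern.
            if bindings.get(p, t) != t:
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings


def evaluate(pattern, triples):
    """Evaluate one query fragment against one source."""
    return [b for b in (match(pattern, t) for t in triples) if b is not None]


def merge(left, right):
    """Join two lists of bindings on the variables they share."""
    merged = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                merged.append({**l, **r})
    return merged


def run(patterns, sources):
    """Decompose the query into patterns, ask the sources, merge the answers."""
    results = [{}]
    for pattern in patterns:
        fragment = []
        for triples in sources.values():
            fragment.extend(evaluate(pattern, triples))
        results = merge(results, fragment)
    return results


# Hypothetical sources: titles live in one place, subjects in another.
SOURCES = {
    "titles": [("doc:1", "title", "Wheat rust survey"),
               ("doc:2", "title", "Soil map of Crete")],
    "subjects": [("doc:1", "subject", "wheat"),
                 ("doc:2", "subject", "soil")],
}

QUERY = [("?doc", "title", "?title"), ("?doc", "subject", "wheat")]
answers = run(QUERY, SOURCES)
print(answers)
```

Neither source alone can answer the query; the answer only appears once the two fragments are joined on ?doc, and the data itself never leaves its provider until it is asked for.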
39. research challenges
• develop novel methods for querying
distributed triple stores
– that can overcome the problems stemming from
heterogeneity and the undetermined
distribution of data over nodes
• develop scalable and robust semantic
indexing algorithms
– that can serve detailed and accurate data source
annotations (metadata) about extremely large
datasets
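As a rough illustration of what a data source annotation might contain, the sketch below computes simple per-predicate statistics for one source. The choice of statistics (triple count and number of distinct subjects), the function name summarize and the gp: prefix are assumptions for illustration only; a real index over extremely large datasets would need approximate or incremental techniques that this sketch does not attempt.

```python
from collections import defaultdict


def summarize(triples):
    """Build a small summary: per predicate, triple count and distinct subjects."""
    counts = defaultdict(int)
    subjects = defaultdict(set)
    for s, p, _ in triples:
        counts[p] += 1
        subjects[p].add(s)
    return {
        p: {"triples": counts[p], "distinct_subjects": len(subjects[p])}
        for p in counts
    }


# Hypothetical germplasm source.
SAMPLE = [
    ("acc:1", "gp:taxon", "Triticum aestivum"),
    ("acc:2", "gp:taxon", "Hordeum vulgare"),
    ("acc:1", "gp:origin", "GR"),
    ("acc:1", "gp:origin", "CN"),
]

summary = summarize(SAMPLE)
for predicate, stats in summary.items():
    print(predicate, stats)
```

A federating endpoint could consult such summaries to decide which sources are worth querying without touching the data itself.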
41. similar/relevant efforts
• PubAg: forthcoming service by National
Agricultural Library (NAL) for discovering USDA
publications – and beyond
• LGU community of ag knowledge: forthcoming
service federating institutional repositories of
Land Grant Universities in the US
• CGIAR open: (to be) federating & providing access
to publications and data from all CG center
repositories
• …and maybe more to come