The Crop Ontology is a controlled vocabulary for plant breeding data that aims to standardize terminology across crops and languages. It provides definitions of trait terms and the relationships between them to improve data sharing and analysis. The ontology is developed collaboratively by crop experts and is mapped to related ontologies. It powers several crop databases and tools, making phenotypic and genetic data interoperable.
The Crop Ontology: a resource for enabling access to breeders’ data
1. http://www.cropontology.org
Elizabeth Arnaud1*, Luca Matteis1, Marie Angelique Laporte1, Herlin Espinosa2, Glenn Hyman2, Rosemary Shrestha3, Arllet Portugal4, Pierre Yves Chibon5, Medha Devare6, Akinnola Akintunde7, Jeffrey W. White8, Mark Wilkinson9, Caterina Caracciolo10, Fabrizio Celli10, Graham McLaren4
1Bioversity International, France
2International Center for Tropical Agriculture (CIAT), Colombia
3Genetic Resources Program (GRP), Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Mexico
4Generation Challenge Programme (GCP) c/o CIMMYT
5UR Plant Breeding, Wageningen University, The Netherlands
6International Maize and Wheat Improvement Center - South Asia Regional Office (CIMMYT-SARO), Nepal
7International Black Sea University (IBSU), Georgia
8Arid-Land Agricultural Research Center, USDA-ARS, Arizona, USA
9Centro de Biotecnología y Genómica de Plantas UPM-INIA, Spain
10Food and Agriculture Organization (FAO) of the United Nations, Office for Partnership, Italy
Generation Challenge Programme Workshop, 13 January 2014
Plant and Animal Genome Conference, San Diego, USA, 11-15 January 2014
4. The knowledge domain: plant breeding
Understanding the relationships between plant genotype and environment, developing adaptive traits that respond to biotic and abiotic stress, promoting appropriate agronomic practices for cultivating the crop, and understanding the heritability of adaptive traits
5. Dimensions of a phenotype
[Figure: the dimensions of a phenotype (environmental conditions, light, water, nutrients, temperature, soil, cultural, socio-economic, agronomic, developmental, physiological, chemical, molecular, and time), framed by the goal of understanding the GxE interaction and the heritability of adaptive traits]
6. High Throughput Data Generation needs standardized trait concepts
• Next Generation Sequencing (NGS) platforms enable detailed analysis of the largest plant genomes
• Phenotyping platforms measure a wide range of structural and functional plant traits while collecting meticulous metadata on the environment and experimental setup [Fiorani and Schurr, 2013]
• Genome-wide association studies (GWAS) typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits
8. Harmonization and access to data
Breeders' data are often:
• Unstructured: complex free text is used for phenotype descriptions
• Not semantically coherent:
  - The same trait is given different names by scientists
  - One trait is named the same way for various species but refers to different plant structures, e.g. 'Fruit colour' for bean pod color, rice grain (caryopsis) colour and maize kernel colour
• Data and metadata are NOT interoperable and are often not online
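A minimal Python sketch of the naming problem above, assuming a hypothetical lookup table keyed by crop and trait name: normalizing free text and keying by crop lets synonyms resolve to one trait, while the same generic label resolves to a different plant structure in each species. The table, the trait IDs and the function names are illustrative placeholders, not Crop Ontology content.

```python
from typing import Optional

# Hypothetical index keyed by (crop, normalized trait name). The IDs are
# placeholders, not real Crop Ontology identifiers.
TRAIT_INDEX = {
    ("bean", "fruit colour"): "HYP_BEAN:pod_colour",
    ("bean", "pod color"): "HYP_BEAN:pod_colour",
    ("rice", "fruit colour"): "HYP_RICE:grain_colour",
    ("rice", "caryopsis colour"): "HYP_RICE:grain_colour",
    ("maize", "fruit colour"): "HYP_MAIZE:kernel_colour",
    ("maize", "kernel colour"): "HYP_MAIZE:kernel_colour",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def resolve_trait(crop: str, name: str) -> Optional[str]:
    """Return the trait ID for a free-text name in a given crop, or None."""
    return TRAIT_INDEX.get((normalize(crop), normalize(name)))


# Synonyms collapse to one trait; one label maps to different structures per crop.
print(resolve_trait("Bean", "Pod  Color"))     # HYP_BEAN:pod_colour
print(resolve_trait("rice", "Fruit colour"))   # HYP_RICE:grain_colour
```

Keying on the crop as well as the name is the design choice that separates "fruit colour" in rice from "fruit colour" in maize.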
9. Integrated Breeding Platform
www.integratedbreeding.net
• One-stop shop for services to design and carry out breeding projects: the integrated breeding workflow
• Breeders' databases share a common schema and are being published online
• The IB Fieldbook is available with a standard list of traits per crop
10. Phenotype
It is a composite of an entity (e.g. fruit) and an
attribute (e.g. shape) with a value (e.g. round):
Entity + Attribute = Trait
Entity + (Attribute + Value) = Phenotype (observed)
fruit + (shape + round) = fruit shape round
-> round fruit is the phenotype
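The entity/attribute/value composition above can be sketched as two small Python data classes. The class and field names are hypothetical; they only mirror the slide's formula, with a trait as entity + attribute and a phenotype as a trait plus an observed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trait:
    entity: str     # plant structure, e.g. "fruit"
    attribute: str  # quality being described, e.g. "shape"

    @property
    def name(self) -> str:
        return f"{self.entity} {self.attribute}"


@dataclass(frozen=True)
class Phenotype:
    trait: Trait
    value: str      # observed value, e.g. "round"

    @property
    def name(self) -> str:
        return f"{self.trait.name} {self.value}"


fruit_shape = Trait("fruit", "shape")       # Entity + Attribute = Trait
observed = Phenotype(fruit_shape, "round")  # Entity + (Attribute + Value) = Phenotype
print(fruit_shape.name)  # fruit shape
print(observed.name)     # fruit shape round
```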
11. A range of controlled vocabularies
Web 2.0
From the controlled vocabularies, build valid semantic ontologies consumable by Web 2.0 applications
Best practices
12. Crop Ontology
• Crop Ontology is primarily an application ontology for fieldbooks
• A visualization tool supporting the community-based development of trait dictionaries and crop-specific ontologies
• Comparison and validation of terms in common
Rosemary Shrestha, CIMMYT
CO coordinator until 2012
13. Community-based development process
• Domain experts (breeders, pathologists, agronomists, etc.) and data managers identify the list of concepts
• For a variety evaluation project, data managers and breeders produce the IB Fieldbook template with the traits and submit new terms
• Crop Ontology curators in the Crop Lead Centers curate, validate and compile the list, and upload it to the site
• The Global Crop Ontology Curator curates the Crop Ontology with the Crop Lead Centers' curators
• A web development expert maintains the site
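One way to picture the submission and curation steps above is as a small status machine. This is a hypothetical sketch: the status names, the role string and the transition table are assumptions for illustration, not the Crop Ontology site's actual workflow rules.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"   # new term sent by breeders / data managers
    VALIDATED = "validated"   # checked by a Crop Lead Center curator
    PUBLISHED = "published"   # compiled and uploaded to the site


# Hypothetical rules: the role allowed to perform each status transition.
TRANSITIONS = {
    (Status.SUBMITTED, Status.VALIDATED): "crop_curator",
    (Status.VALIDATED, Status.PUBLISHED): "crop_curator",
}


def advance(current: Status, target: Status, role: str) -> Status:
    """Move a term to `target` if the transition exists and `role` may perform it."""
    required = TRANSITIONS.get((current, target))
    if required is None:
        raise ValueError(f"no transition from {current.value} to {target.value}")
    if role != required:
        raise PermissionError(f"{role} cannot move a term to {target.value}")
    return target


status = advance(Status.SUBMITTED, Status.VALIDATED, "crop_curator")
status = advance(status, Status.PUBLISHED, "crop_curator")
print(status.value)  # published
```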
14. Crop curators and associated scientists
Crop
Crop Lead Center
Curator
Scientists
Barley
Cassava
ICARDA, Tunisia & Morocco
International Institute of Tropical Agriculture
(IITA), Nigeria
ICRISAT-Patancheru Andhra Pradesh, India
Fawzy Nawar
Bakare Moshood –replaced by
Afolabi Agbona
Prasad Peteti
Ramesh Verma
Peter Kulakow
Guerrero Alberto Fabio
Steve Beebe; Rowland Chirwa
Sam Ofodile
Ousmane Boukar
Fawzy Nawar
Rosemary Shrestha
Shiv Kumar Agrawal
Rhiannon Crichton
Inge Van den Bergh
Praveen Reddy
Praveen Reddy
Reinhard Simon
Frances Nikki Borja
Until 2013
Praveen Reddy
Ibrahima Sissokho
Tom C. Hash
Isabel Vales
Sorghum
International Center for Tropical Agriculture
(CIAT), Colombia
International Institute of Tropical
Agriculture (IITA), Nigeria
ICARDA, Tunisia, Morocco
International Maize and Wheat Improvement
Center (CIMMYT) Mexico
Bioversity International
Montpellier, France
ICRISAT-Andhra Pradesh, India
id
International Potato Center (CIP), Peru
International Rice Research Institute (IRRI),
Philippines
ICRISAT-India and Mali
Wheat
Yam
Global
CIRAD
CIMMYT (see above)
IITA, Nigeria
Bioversity International, Montpellier
Chickpea
Groundnut
Common beans
Cowpea
Lentil
Maize
Musa
Pearl millet
Pigeon pea
Potato
Rice
Rosemary Shrestha
Afolabi Agbona
Harold Durufle
Trushar Shah
Mauleon Ramil;
Ruaraidh Sackville Hamilton
Trushar Shah
Eva Weltzien-Rattunde,
Taba Nebe
Jean Francois Rami
Antonio Jose Lopes Montez
15. Crop Ontology themes
General germplasm information
Phenotype and traits
Plant anatomy and development
Location and environment
Trial management and experimental
design
Structural and functional genomics
20. Crop Trait Dictionary Template
simple to share with breeders
Submission information:
• Name of submitting scientist
• Institution
• Language of submission
• Date of submission
• Bibliographic reference
• Comments
Trait (1):
• Crop name
• Name of trait
• Abbreviated name
• Synonyms (separated by commas)
• Trait ID (for modification; blank for a new trait)
• Description of trait
• How is this trait routinely used?
• Trait class
Methods (n):
• Method ID
• Name of method
• Description of how it is measured (method)
• Growth stage
• Field or greenhouse
Scales (n):
• Scale ID
• Type of measure (Continuous, Discrete or Categorical)
• For Continuous: units of measurement, reporting units, minimum, maximum
• For Discrete: name of scale or units of measurement
• For Categorical: name of rating scale and, for each class #, value = meaning
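A template row can be checked before submission. The sketch below assumes each row arrives as a dictionary keyed by the template's column names; which columns are mandatory and the extra "Categories" column holding 'value = meaning' lines are hypothetical choices, not part of the published template.

```python
from typing import Dict, List

MEASURE_TYPES = {"Continuous", "Discrete", "Categorical"}
# Hypothetical choice of mandatory columns.
REQUIRED = ["Crop Name", "Name of Trait", "Description of Trait", "Type of Measure"]


def split_synonyms(cell: str) -> List[str]:
    """Synonyms are entered in one cell, separated by commas."""
    return [s.strip() for s in cell.split(",") if s.strip()]


def parse_classes(cell: str) -> Dict[int, str]:
    """Parse categorical classes written as 'value = meaning', one per line."""
    classes: Dict[int, str] = {}
    for line in cell.splitlines():
        if "=" in line:
            value, meaning = line.split("=", 1)
            classes[int(value.strip())] = meaning.strip()
    return classes


def check_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one template row (empty if none)."""
    problems = [f"missing {col}" for col in REQUIRED if not row.get(col, "").strip()]
    measure = row.get("Type of Measure", "").strip()
    if measure and measure not in MEASURE_TYPES:
        problems.append(f"unknown measure type {measure!r}")
    # "Categories" is a hypothetical column holding the 'value = meaning' lines.
    if measure == "Categorical" and not parse_classes(row.get("Categories", "")):
        problems.append("categorical scale has no 'value = meaning' classes")
    return problems


row = {"Crop Name": "Sorghum", "Name of Trait": "Flag leaf weight",
       "Description of Trait": "Weight of the flag leaf",
       "Type of Measure": "Continuous", "Synonyms": "FLGWT, flag leaf wt"}
print(check_row(row))                   # []
print(split_synonyms(row["Synonyms"]))  # ['FLGWT', 'flag leaf wt']
```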
22. Methods & Scales for annotations
• Precomposed relationships between Traits, Methods and Scales are required for annotations in phenotype databases
• Ongoing discussion on revising the structure so that the three are separated into three namespaces
23. Methods & scales for the
standard lists of the Breeders’ fieldbook
Visualization & download
In Crop database and
Fieldbook template
24. The site is easy to use: partners have published their trait ontologies
Soybean
Solanaceae
France
Grape
Barley
26. Experimental design ontology
Trial management tasks
• CROP - PLANTING
• SEED TREATMENT
• IRRIGATION
• FERTILIZER
• PESTICIDE
• SOIL
• BIOTIC STRESS
• ABIOTIC STRESS
• HARVEST-YIELD
Medha Devare
CSISA-Nepal Coordinator, CIMMYT-SARO
Design of the Fieldbook and coordination
Akinnola Akintunde
International Black Sea Univ. (IBSU), Georgia
Development of the ontology and fieldbook
27. Dictionary for Trial Management
Concepts
From Medha Devare, CSISA-Nepal Coordinator
CIMMYT-SARO
28. Environmental Ontology
Jeffrey W. White
Research Plant Physiologist & Research Leader
Arid-Land Agricultural Research Center
USDA-ARS, Arizona, USA
Sheryl Porter
Coordinator, Computer Research Applications
University of Florida, Gainesville, FL, USA
30. Environmental Ontology
• Improve the current list of concepts
• International Consortium for Agricultural Systems Applications (ICASA)
• Integration of a master list of 600 variables for describing crop management and recording plant responses
• ICASA promotes the use of standards for crop field research and ecophysiological models
• One objective is the application of ICASA variables by the Agricultural Model Intercomparison and Improvement Project (AgMIP) (http://www.agmip.org/)
32. Synchronization of Crop Ontology
with Integrated Breeding Workflow
Graham McLaren,
Generation Challenge
Programme
Rebecca Berrigan,
Efficio Technology
Service
Arllet Portugal
IBP Data Management Leader
Luca Matteis, CO Web Site
developer, Bioversity
International
Harold Durufle, CO curator,
Bioversity International
33. Application Programming Interface (API)
• Developed by Luca Matteis
• Provides access services to third-party web sites and software
• Supports open collaboration and use of the Crop Ontology
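A third-party client of such an API might look like the sketch below. The `/search` path, the `q` parameter and the JSON response shape are hypothetical placeholders, since the slide does not document the API; the HTTP call is injected as a function so the sketch runs offline, and in practice it could wrap `urllib.request.urlopen`.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Hypothetical endpoint and response format, for illustration only.
BASE_URL = "http://www.cropontology.org"
SEARCH_PATH = "/search"


def search_url(query: str) -> str:
    """Build the (assumed) search URL, percent-encoding the query."""
    return f"{BASE_URL}{SEARCH_PATH}?{urlencode({'q': query})}"


def search_terms(query: str, fetch: Callable[[str], str]) -> List[Dict[str, str]]:
    """Fetch the search URL with `fetch` and decode the JSON body."""
    return json.loads(fetch(search_url(query)))


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP call; returns a canned, hypothetical JSON body.
    return '[{"id": "CO_324:0000002", "name": "Flag leaf weight"}]'


for term in search_terms("flag leaf", fake_fetch):
    print(term["id"], term["name"])  # CO_324:0000002 Flag leaf weight
```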
34. Local Databases
[Figure: workflow linking breeders and data managers, breeders' trait dictionaries, the crop database, the data manager, the fieldbook template and curation of the Crop Ontology]
• Data annotation and addition of new terms
• Cross-referencing terms with the Plant Ontology and Trait Ontology
• Submission of new traits through the term tracker
35. IBWS - Key elements of the Logical
Data Model to store phenotypic data
36. Annotation for storing phenotypic data in the IBWS
• Property (Trait): CO_ID
• Method: CO_ID
• Scale: CO_ID (continuous, discrete or categorical)
• Class values (Class1-value, Class2-value, Class3-value): one CO_ID each
• A Property requires a Method and a Scale, kept in 3 namespaces
• A unique combination of IDs for P+M+S+C = a Standard Variable
• Data values are linked to the controlled vocabulary by term ID (Is_a_valid_value_of)
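The Property + Method + Scale + Class rule can be sketched in Python as follows. All identifiers are hypothetical placeholders rather than real CO_IDs, and the range and type checks in `is_a_valid_value_of` are an assumed reading of what makes a value valid for a scale.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Scale:
    scale_id: str
    kind: str                                # "continuous", "discrete" or "categorical"
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    class_ids: FrozenSet[str] = frozenset()  # IDs of allowed categorical values


@dataclass(frozen=True)
class StandardVariable:
    property_id: str   # P: the trait
    method_id: str     # M
    scale: Scale       # S, plus its class values C

    @property
    def key(self) -> Tuple[str, ...]:
        """The P+M+S+C combination of IDs that identifies the variable."""
        return (self.property_id, self.method_id, self.scale.scale_id,
                *sorted(self.scale.class_ids))


def is_a_valid_value_of(value: object, var: StandardVariable) -> bool:
    """Check a recorded value against the variable's scale (assumed rules)."""
    scale = var.scale
    if scale.kind == "categorical":
        return value in scale.class_ids
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if scale.kind == "discrete" and not isinstance(value, int):
        return False
    if scale.minimum is not None and value < scale.minimum:
        return False
    if scale.maximum is not None and value > scale.maximum:
        return False
    return True


height = StandardVariable("HYP_P:plant_height", "HYP_M:ruler",
                          Scale("HYP_S:cm", "continuous", 0, 500))
colour = StandardVariable("HYP_P:grain_colour", "HYP_M:visual",
                          Scale("HYP_S:colour", "categorical",
                                class_ids=frozenset({"HYP_C:1", "HYP_C:2"})))
print(is_a_valid_value_of(120.5, height))      # True
print(is_a_valid_value_of("HYP_C:3", colour))  # False
```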
41. API access by 3rd-party web sites
• IBP Crop Databases
• IB Fieldbook
• Genotype Data MS
• Phenomics Ontology Driven DB (PODD)
• EU-SOL Solanaceae Breeding DB, Wageningen
• International cassava DB
• Agtrials - CCAFS
42. Global Agricultural Trial Repository
and database
www.agtrials.org
Glenn Hyman, geographer, CIAT
Herlin R. Espinosa G., web developer, CIAT
Luca Matteis, Web developer, Bioversity International
43. Global Agricultural Trial Repository
http://www.agtrials.org/
• To store evaluation data files described with metadata
• To produce an atlas of the trials
1,029 trials for cassava
46. 3. Display the Trial Information
Access to the definition
of the Trait in
the Crop Ontology
47. Integration of Crop Ontology in IBP
Fred Okono, IBP Project Administrator
Brandon Tooke, IBP web developer
Luca Matteis, CO Web developer, Bioversity
International
49. CO Semantic Web Compliance
Marie Angelique Laporte, Ontology
development, RDF & SKOS conversion,
Bioversity International
Luca Matteis, CO Web developer,
Bioversity International
Mark Wilkinson, Centro de Biotecnología y
Genómica de Plantas UPM-INIA, Spain
50. Linked Open Data Cloud
• A term used to describe a recommended best practice for exposing,
sharing, and connecting pieces of data, information and knowledge
• It builds upon standard Web technologies such as HTTP, RDF and
URIs
• Rather than using them to serve web pages for human readers, it
extends them to share information in a way that can be read
automatically by computers.
Wikipedia
• This enables data from different sources to be connected and
queried.
51. Crop Ontology in the Linked Open Data
recommended format
• Conversion from OBO to RDF/SKOS, with resolvable HTTP URIs
• A conversion into Simple Knowledge Organization System (SKOS) is ongoing
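# Prefix declarations are not on the original slide; the dc: and co: namespaces are assumed.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix co: <http://www.cropontology.org/rdf/> .

<http://www.cropontology.org/rdf/CO_324:0000002>
    a skos:Concept ;
    rdfs:label "Flag leaf weight"@en ;
    dc:creator _:b1 ;
    skos:definition "Weight of the flag leaf (the one just below the panicle)." ;
    skos:inScheme co:sorghum ;
    skosxl:prefLabel [
        a skosxl:Label ;
        co:acronym [
            a skosxl:Label ;
            skosxl:literalForm "FLGWT"
        ] ;
        skosxl:literalForm "Flag leaf weight"@en
    ] .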
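Terms can be serialized into the Turtle pattern above programmatically. This minimal sketch builds the string by hand with the standard library and assumes the `rdfs:`, `skos:`, `skosxl:` and `co:` prefixes are declared elsewhere; it omits `dc:creator`, and the function names are hypothetical. A real pipeline would more likely use an RDF library.

```python
def turtle_literal(text: str, lang: str = "") -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def term_to_turtle(term_id: str, label: str, definition: str,
                   scheme: str, acronym: str = "") -> str:
    """Render one term in the SKOS/SKOS-XL pattern (prefixes assumed declared)."""
    lines = [
        f"<http://www.cropontology.org/rdf/{term_id}>",
        "    a skos:Concept ;",
        f"    rdfs:label {turtle_literal(label, 'en')} ;",
        f"    skos:definition {turtle_literal(definition)} ;",
        f"    skos:inScheme {scheme} ;",
        "    skosxl:prefLabel [",
        "        a skosxl:Label ;",
    ]
    if acronym:
        lines.append("        co:acronym [ a skosxl:Label ; skosxl:literalForm "
                     f"{turtle_literal(acronym)} ] ;")
    lines.append(f"        skosxl:literalForm {turtle_literal(label, 'en')}")
    lines.append("    ] .")
    return "\n".join(lines)


print(term_to_turtle("CO_324:0000002", "Flag leaf weight",
                     "Weight of the flag leaf (the one just below the panicle).",
                     "co:sorghum", acronym="FLGWT"))
```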
52. Linked Open Data publishing and
Aligning Crop Ontology with
AGROVOC
Caterina Caracciolo,
Food and Agriculture
Organization (FAO),
AIMS, Italy
Fabrizio Celli,
Food and Agriculture
Organization (FAO),
AIMS, Italy
Marie Angelique Laporte,
Bioversity International
Luca Matteis, Bioversity
International
53. Agrovoc - Agricultural Thesaurus
• 32,000 concepts organized in a hierarchy
• Each concept may have labels in up to 22 languages
• Now published as a linked data set, aligned (linked) with several other vocabularies
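Because each AGROVOC concept can carry labels in several languages, a client typically picks the label matching the user's language and falls back otherwise. This is a hypothetical sketch over an in-memory dictionary of language tag to label; it does not query AGROVOC, and the sample labels are illustrative.

```python
from typing import Dict, Optional, Sequence


def pick_label(labels: Dict[str, str], preferred: Sequence[str],
               default: str = "en") -> Optional[str]:
    """Return the first available label in the preferred languages.

    Tries each tag as given, then its primary subtag ("pt-BR" -> "pt"),
    and finally the default language.
    """
    for tag in preferred:
        for candidate in (tag, tag.split("-")[0]):
            if candidate in labels:
                return labels[candidate]
    return labels.get(default)


# Illustrative labels for one concept, keyed by language tag.
maize = {"en": "maize", "fr": "maïs", "es": "maíz"}
print(pick_label(maize, ["fr-CA"]))  # maïs
print(pick_label(maize, ["de"]))     # maize
```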
54. Release of Agris 2.0
agris.fao.org
• AGRIS bibliographic records contain rich metadata and are largely indexed with AGROVOC, FAO's multilingual thesaurus
55. AGRIS 2.0 and Phenotypic Data
• AGRIS 2.0 uses the Linked Open Data methodology to link various data sources in the mash-up site
• Proof of concept done with the Collecting Mission database of Bioversity International
• 3 steps:
1. The AGRIS datasets were converted to RDF, creating some 200 million triples, and AGROVOC was aligned to other thesauri.
2. SPARQL endpoints, web services and APIs were discovered.
3. AGRIS RDF was interlinked, using AGROVOC LOD as a backbone, to external datasets.
• Align Crop Ontology with AGROVOC in SKOS/RDF
• Promote the publishing of phenotypic data in RDF
• Objective: retrieve bibliographic references and data from phenotypic databases in the mash-up site
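Step 3 above, interlinking through a shared vocabulary backbone, can be approximated in memory: once records and external items are annotated with the same concept URIs, an inverted index from concept to item yields the links. This is a simplified stand-in for RDF interlinking, assuming both sides already carry aligned concept URIs; the URIs and IDs are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set


def link_by_concept(records: Dict[str, Set[str]],
                    items: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Link each record to external items sharing at least one concept URI."""
    index: Dict[str, Set[str]] = defaultdict(set)  # concept URI -> item IDs
    for item_id, concepts in items.items():
        for concept in concepts:
            index[concept].add(item_id)
    links: Dict[str, List[str]] = {}
    for record_id, concepts in records.items():
        matched: Set[str] = set()
        for concept in concepts:
            matched |= index.get(concept, set())
        links[record_id] = sorted(matched)
    return links


# Placeholder concept URIs and IDs.
records = {"rec1": {"http://example.org/c/maize", "http://example.org/c/drought"},
           "rec2": {"http://example.org/c/rice"}}
trials = {"trialA": {"http://example.org/c/maize"},
          "trialB": {"http://example.org/c/drought"},
          "trialC": {"http://example.org/c/wheat"}}
print(link_by_concept(records, trials))
# {'rec1': ['trialA', 'trialB'], 'rec2': []}
```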
56. Partners collaborating on informatics and integration formats
• IBFieldbook and IBWS teams and Efficio LLC
• Plant Breeding dept. of Wageningen University for the Resource Description Framework (RDF)
• CIAT-DAPA, for the synchronization of the Global Repository of Evaluation Trials (Agtrials) of CCAFS
• FAO-AIMS for the use of Linked Open Data with AGRIS 2.0
57. Partners collaborating on content engineering and looking forward to a Reference Ontology for plants
• Plant Ontology, Jaiswal Lab., Oregon State University, USA
• Soybase, USDA-ARS, USA
• Solanaceae Genomics Network (SGN), USA
• Cornell University, USA
• Institut National de la Recherche Agronomique (INRA), France
• Centro de Biotecnología y Genómica de Plantas UPM-INIA, Spain
• POLAPGEN, Poland
• Australian Plant Phenomics Data Repository
58. Any questions, please contact us
Send an email to:
e.arnaud@cgiar.org
h.durufle@cgiar.org
l.matteis@cgiar.org
helpdesk@cropontology-curationtool.org
Poster #981
Plant Genomics Outreach Booth # 305
Editor's notes
These elements are sufficient for managing phenotyping data from any field experiment, however a sixth component is required to facilitate integration of phenotyping data across studies. This is the Ontology Management System (OMS) which identifies comparable elements – labels, variates and values across studies.
Precomposition for annotation
Turtle (Terse RDF Triple Language) is a format for expressing data in the Resource Description Framework (RDF) data model, with a syntax similar to SPARQL. RDF represents information using triples, each consisting of a subject, a predicate and an object. The subject and predicate are expressed as web URIs (the subject may also be a blank node), and the object is either a URI or a literal value.