Creating an integrated Ondex knowledge base for comparative gene function analysis
1. Creating an integrated Ondex knowledge base for comparative gene function analysis Keywan Hassani-Pak UK Plant Systems Biology Workshop 2011 keywan.hassani-pak@bbsrc.ac.uk
2. 2% of fully sequenced genomes are plant genomes Sequenced Plant and Crop Genomes phytozome.net Arabidopsis Rice ? Functional knowledge
3. Functional knowledge in plant and crop databases Source: Mochida et al. “Genomics and bioinformatics resources for crop improvement.” (2010)
4. Ontologies – Bridging the gap across species and databases. Major ontologies used in plant databases: Gene Ontology (GO), Plant Ontology (PO), Gramene Trait Ontology (TO), Crop Ontology (CO). Functional annotations use terms from ontologies. Annotations must have evidence types, e.g. IMP, IEA. Metrics exist to compute semantic similarity of terms.
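The slide does not say which semantic similarity metric is meant. As one minimal sketch, the Python below scores two terms by the overlap (Jaccard index) of their ancestor sets in the ontology graph. The tiny hierarchy and its term names are invented for illustration and are not taken from GO, PO, TO or CO.

```python
def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the ontology DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_similarity(term_a, term_b, parents):
    """Shared ancestors divided by all ancestors of either term (0.0 to 1.0)."""
    anc_a = ancestors(term_a, parents)
    anc_b = ancestors(term_b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Hypothetical toy hierarchy: child term -> list of parent terms.
toy_parents = {
    "leaf": ["organ"],
    "root": ["organ"],
    "organ": ["plant structure"],
    "seed coat": ["plant structure"],
}

sim_leaf_root = jaccard_similarity("leaf", "root", toy_parents)
sim_leaf_seed_coat = jaccard_similarity("leaf", "seed coat", toy_parents)
```

Terms that share a close parent ("leaf" and "root") score higher than terms that only meet at the top of the hierarchy. Information-content based metrics are a common alternative when annotation frequencies are available.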
6. A knowledge base of functional annotations in plants. October 2010. Collected data from TAIR and Gramene. Used Ondex to create a knowledge base of functional gene annotations in plants. Measured conservation and transferability of ontology terms.
7. Conservation of Plant Ontology terms between Arabidopsis and rice. Conserved ? Not Conserved. A function is conserved if genes from two species are annotated with the same term and are orthologous. Conservation of ontology terms can be used as background knowledge for cross-species annotation transfer. Conserved terms are suited for automated annotation transfer. Not conserved terms need further investigation. Defoin-Platel M, Hassani-Pak K and Rawlings CJ (2011)
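The definition on this slide can be read as a simple set operation. The Python sketch below is a simplified illustration of that reading, not the procedure from the cited work: a term counts as conserved if at least one orthologous gene pair carries it in both species. Gene identifiers and term identifiers are made-up placeholders.

```python
def conserved_terms(annotations_a, annotations_b, orthologs):
    """Return terms shared by at least one orthologous gene pair.

    annotations_a, annotations_b: dict mapping gene id -> set of term ids
    orthologs: iterable of (gene_in_species_a, gene_in_species_b) pairs
    """
    conserved = set()
    for gene_a, gene_b in orthologs:
        terms_a = annotations_a.get(gene_a, set())
        terms_b = annotations_b.get(gene_b, set())
        conserved |= terms_a & terms_b
    return conserved


# Placeholder data (hypothetical ids, for illustration only).
arabidopsis = {
    "AT_GENE_1": {"TERM:1", "TERM:2"},
    "AT_GENE_2": {"TERM:3"},
}
rice = {
    "OS_GENE_1": {"TERM:1"},
    "OS_GENE_2": {"TERM:4"},
}
ortholog_pairs = [("AT_GENE_1", "OS_GENE_1"), ("AT_GENE_2", "OS_GENE_2")]

conserved = conserved_terms(arabidopsis, rice, ortholog_pairs)
all_terms = set().union(*arabidopsis.values(), *rice.values())
not_conserved = all_terms - conserved
```

Here `conserved` holds the candidates for automated annotation transfer and `not_conserved` holds the terms that need further investigation. A real analysis would also have to account for annotation depth and for terms that are related but not identical.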
12. Summary: The study of gene function needs integration of multiple sources of information. Cross-species knowledge is one (sometimes the only) source of information we have available. Ontologies are essential for comparing annotations across species and databases. Developed a procedure to measure transferability of Gene Ontology (GO) and Plant Ontology (PO) terms. QTL analysis and candidate gene prediction using integrative and comparative approaches.
13. Acknowledgements. Rothamsted: Chris Rawlings, Jan Taubert, Catherine Canevet, Keywan Hassani-Pak, Andrea Splendiani, Matthew Hindle, Artem Lysenko, Michael Defoin-Platel, Angela Karp, Steve Hanley, Kim Hammond-Kosack, Martin Urban. Manchester: Robert Stevens, Carole Goble, Pedro Mendes, Paul Dobson, Paul Fisher, David Withers, Georgina Moulton, Katy Wolstenholme. NaCTeM: Sophia Ananiadou, Gina-Anne Levow, Raheel Nawaz. Newcastle: Neil Wipat, Phil Lord, Darren Wilkinson, Jochen Weile, Matthew Pocock, Simon Cockell, James Dewar, Katherine James, Eva Holstein. Edinburgh: Igor Goryanin, Andrew Millar, Luna de Ferrari.
14. From QTL to candidate genes. Rae et al., 2008. Gene prioritisation can help to reduce the hypothesis space from 100 potential candidates to a few hot candidates. Next step, experimental validation: cloning and transformation in models (Arabidopsis/Poplar).
Editor's Notes
Years from: http://synteny.cnr.berkeley.edu/wiki/index.php/Sequenced_plant_genomes
Only 2% of all sequenced genomes are plant genomes. Over 20 plant genomes. The evolutionary distance between them is rather large. Comparative approaches focus on transferring knowledge across species. We only have a few genomes (taxonomy sampling) and their evolutionary distance is important (big). Challenging to transfer knowledge cross-species. Comparative bioinformatics approaches for predicting the function of genes and their role in complex traits.
Ontologies give us a means to structure the functional knowledge and bridge the gap across species and databases. In this recent work we looked at functional gene annotations between orthologous genes.
Example of BC1 and COBL4 genes in Arabidopsis and rice. Both genes are related to cellulose synthesis. Find orthologous genes in Arabidopsis and rice genomes. Gather all functional knowledge in the form of GO, PO and TO annotations. Filter IEA annotations (not manually curated).
Experimental Evidence Codes. EXP: Inferred from Experiment; IDA: Inferred from Direct Assay; IPI: Inferred from Physical Interaction; IMP: Inferred from Mutant Phenotype; IGI: Inferred from Genetic Interaction; IEP: Inferred from Expression Pattern. Change order of bullet points: PO, GO, TO. Put figure in separate slide.
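The evidence-code filtering described in these notes can be sketched in a few lines of Python. This is a minimal illustration under assumptions: annotations are modelled as (gene, term, evidence code) tuples, and the term identifiers are placeholders rather than real annotations of BC1 or COBL4.

```python
# The six experimental evidence codes listed in the note above.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def drop_electronic(annotations):
    """Remove annotations inferred electronically (IEA, not manually curated)."""
    return [a for a in annotations if a[2] != "IEA"]


def keep_experimental(annotations):
    """Keep only annotations backed by an experimental evidence code."""
    return [a for a in annotations if a[2] in EXPERIMENTAL_CODES]


# Hypothetical (gene, term, evidence code) records for illustration.
example_annotations = [
    ("GENE_A", "TERM:1", "IMP"),
    ("GENE_A", "TERM:2", "IEA"),
    ("GENE_B", "TERM:1", "IDA"),
    ("GENE_B", "TERM:3", "ISS"),
]

curated = drop_electronic(example_annotations)
experimental = keep_experimental(example_annotations)
```

The two functions differ on codes such as ISS, which is neither electronic nor experimental: dropping IEA keeps it, while the experimental-only filter removes it.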
Can we transfer annotations from model plants to crops? Change side of ontology terms according to their conservation/transferability.
QTL are regions on the genetic map associated with variation observed in a phenotype. Biomass traits: branching, height, leaf number etc. What is going on underneath a QTL? We are going from Willow to Poplar to Arabidopsis and other species.
Systematically search for all trait-related genes in the genome.
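As a rough sketch of how such a search could narrow a QTL down to a few candidates, the Python below keeps the genes that overlap a QTL interval and ranks them by how many trait-related terms they carry. The scoring rule, the `Gene` record, the coordinates and the gene ids are all assumptions made for illustration; the talk does not describe its prioritisation method at this level of detail.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chromosome: str
    start: int
    end: int
    terms: frozenset


def prioritise(genes, chromosome, qtl_start, qtl_end, trait_terms):
    """Rank genes overlapping the QTL interval by number of trait-related terms."""
    in_region = [
        g for g in genes
        if g.chromosome == chromosome and g.start <= qtl_end and g.end >= qtl_start
    ]
    scored = [(len(g.terms & trait_terms), g.gene_id) for g in in_region]
    # Drop genes without any trait-related term; best score first, ties by id.
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))


# Hypothetical genes and annotations.
genes = [
    Gene("g1", "chr1", 100, 200, frozenset({"branching", "height"})),
    Gene("g2", "chr1", 300, 400, frozenset({"height"})),
    Gene("g3", "chr1", 5000, 6000, frozenset({"branching"})),  # outside the QTL
    Gene("g4", "chr2", 150, 250, frozenset({"branching"})),    # other chromosome
    Gene("g5", "chr1", 250, 280, frozenset()),                 # no trait terms
]
biomass_terms = frozenset({"branching", "height", "leaf number"})

ranking = prioritise(genes, "chr1", 50, 450, biomass_terms)
```

In this toy case five genes are reduced to two ranked candidates, which would then go forward to experimental validation.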