KnetMiner provides an easy-to-use web interface to visualisation and data mining tools for discovering and evaluating candidate genes from large-scale integrations of public and private data sets. It addresses the needs of scientists who generally lack the time and technical expertise to review all relevant information available in the literature, from key model species and from a potentially wide range of related biological databases. We have previously developed genome-scale knowledge networks (GSKNs) for multiple crop and animal species (Hassani-Pak et al. 2016). The KnetMiner web server searches and evaluates millions of relations and concepts within the GSKNs in real time to determine whether direct or indirect links between genes and trait-based keywords can be established. As input, KnetMiner accepts search terms in combination with a gene list and/or genomic regions. It produces a table of ranked candidate genes and lets users explore the output in interactive genome and network map visualisation tools that have been optimised for web use on desktop and mobile devices. The KnetMiner web server and the GSKNs are a step forward towards systematic and evidence-based gene discovery.
KnetMiner - Knowledge Network Miner
1. Mining biological knowledge networks for
gene-phenotype discovery
Keywan Hassani-Pak
http://knetminer.rothamsted.ac.uk/
Plant and Animal Genomes Conference 2017
@KnetMiner
2. The Genotype to Phenotype Challenge
Genotype
SNPs and Indels
Omics
Includes any ‘omics
Phenotype
Flowering
Defence
Development
Stress tolerance
Biological Knowledge Network
1. Methods to assemble and visualise an integrated
knowledge network of the cell
2. Methods to use the knowledge network to
translate genotype to phenotype
3. • Free and open source
• Data warehousing using a graph-
database
• Platform to integrate public and private
datasets in various formats
• Provides a GUI, CLI and APIs for
reproducible data integration workflows
Ondex – Data Integration Platform
Ondex
www.ondex.org
4. The approach is generic and works similarly for other species
5. Let’s get a GWAS dataset…
http://plants.ensembl.org/biomart
#SNP=66,816 | #Gene=27,502 | #Phenotype=107
9. • Gene-GO
• Gene-Phenotype
Gene knock-out or overexpression
Text mining publications
• Gene-Publication
• Gene-Pathway
• Homology to yeast
• Homology to crops
Wheat
… finally add other open linked data
>500,000 nodes
>1,500,000 links
Genome-scale knowledge network
10. Relationships in Crop Knowledge Networks
GO
TO
encodes
text-mining
GWAS
P-value 10^-8
41% identity
EnsemblCompara
Genes Homology Annotations
encodes
Inferred from
Mutant Phenotype
PMID: 15598800
Genetics
QTL
GWAS
Marker
Interactions Phenotype
Mutations in TTG2
cause phenotypic
defects seed color
pigmentation.
PMID: 17766401
11. • Methods needed to evaluate millions of
relationships in knowledge network, prioritize
genes and extract relevant subnetworks
• Interactive and exploratory tools needed to
enable knowledge discovery and decision
making
• Interpretation should be the task of domain
experts i.e. biologists!
How to search and interpret too much information?
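As a minimal sketch of what "extracting a relevant subnetwork" could look like, the Python below runs a breadth-first search from a gene to the first node matching a trait keyword. The toy network, node names, relation labels and the `max_hops` cut-off are all hypothetical and chosen for illustration; this is not KnetMiner's actual search.

```python
from collections import deque

# Hypothetical toy knowledge network: node -> list of (relation, neighbour).
# Node names and relation labels are illustrative assumptions only.
NETWORK = {
    "gene:TTG2": [("encodes", "protein:TTG2")],
    "protein:TTG2": [("published_in", "pub:17766401")],
    "pub:17766401": [("mentions", "trait:seed color")],
    "gene:X": [("encodes", "protein:X")],
    "protein:X": [],
    "trait:seed color": [],
}


def find_evidence_path(network, gene, keyword, max_hops=4):
    """Breadth-first search from a gene to any node whose name contains keyword.

    Returns the list of nodes on the shortest path, or None if no matching
    node is reachable within max_hops links.
    """
    queue = deque([(gene, [gene])])
    seen = {gene}
    while queue:
        node, path = queue.popleft()
        if node != gene and keyword in node:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbour in network.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [neighbour]))
    return None


path = find_evidence_path(NETWORK, "gene:TTG2", "seed color")
print(path)
```

A hop limit keeps the search bounded: in a network with millions of relations, almost any gene can eventually reach almost any keyword, so long paths carry little evidence.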
12. KnetMiner – Systematic and evidence-based gene discovery
http://knetminer.rothamsted.ac.uk
13. Web Browser
KnetMiner
Client
KnetMiner
Server
Servlets and JSP Page
Java Socket
Knowledge
Graph DB
Ondex API
DHTML
JavaScript
Apache Tomcat
Multithreaded
Java Server
HTML, JSON, XML and images
over HTTP via Ajax
Views
Java Socket
Java Applet
Flash
KnetMiner Software Architecture
Major improvements
to the user-interface.
Re-implemented Java
Applet and Flash
components in
JavaScript.
Now compatible with
most OS and touch
devices.
14. Which associations (genes) are worth following up?
Often a highly subjective decision
How is genotype translated to phenotype?
Often involves multi-omics interactions
18. • 96 or 192 Arabidopsis inbred lines
• Genotyped: 250,000 SNPs
• 107 phenotypes were measured
https://arapheno.1001genomes.org/study/1/
o Flowering
o Defence
o Ionomics
o Developmental
• Wilcoxon and EMMA (control population structure) statistical tests
GWAS of 107 Phenotypes in Arabidopsis
Atwell et al., Nature 2010
19. Examples where GWAS results are simple to interpret
Sodium concentration (Na)
Lesioning (LES)
AvrRpm1
Single, sharp peak of
association centred on
causal polymorphism
LD decays within 10 kb on average
in Arabidopsis
20. Examples where GWAS results are complex to interpret
FLC gene expression (FLC)
Leaf Number (LN22)
Days to flowering (FT Field)
Peaks are diffuse
covering several hundred
kb without a clear centre
Causal polymorphisms do not always
have the strongest association
21. Using KnetMiner to interpret GWAS results
Wilcoxon
results
EMMA
results
Atwell et al., Nature 2010
Flowering Locus C (FLC) gene expression
23. • Petal size QTL in Arabidopsis (in collaboration with John Doonan)
Using KnetMiner to prioritise genes in QTL
24. Use Case 2 – Mining differentially expressed
genes
25. White-grained wheat is more prone to pre-harvest sprouting (PHS)
• PHS is the result of premature germination of grain in
the ear and results in loss of bread-making quality
• Red grain colour is associated with increased dormancy
and resistance to PHS
• Grain colour is due to proanthocyanidins (condensed
tannins) in the testa
Sprouting
Grain colour
+ = white
o = red
Groos et al. (2002) TAG 104, 39-47
Red grain 20dpa
Andy Phillips
26. 67 down-regulated genes
37 up-regulated genes
Over a hundred statistically significant
genes.
How are these linked to grain colour
and PHS?
Differential Gene Expression Analysis
27. Google-like search interface
• Search knowledge graph using trait-
based keywords
• Real-time user feedback and query
suggestions
Trait related
keywords
Query term
suggestions
31. Ondex Text-Mining Plugin
Input data
• 27,416 Arabidopsis gene names from Phytozome
• 52,561 Abstracts from PubMed that contain Arabidopsis
• 22,201 curated citations from TAIR
• 1,349 Trait Ontology terms from Planteome
Hassani-Pak et al., 2010
[Figure: text-mining. Concepts A, B (Gene) and x, y (TO) are linked via occurrs_in and published_in relations to Publication concepts, giving a weighted association network; example edge labels: IP=1.7; M=1.2; N=2]
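The idea of deriving gene-trait links from publications can be sketched as a simple co-occurrence count. The Python below is only an illustration under stated assumptions: matching is naive substring search, the abstracts are made-up toy sentences, and the edge weight is a plain count rather than the scores used by the Ondex plugin.

```python
from collections import Counter
from itertools import product


def cooccurrence_network(abstracts, gene_names, trait_terms):
    """Count abstracts in which a gene name and a trait term both appear.

    Matching is a naive, case-insensitive substring search; real text mining
    would also need synonym handling and name disambiguation.
    """
    edges = Counter()
    for text in abstracts:
        lowered = text.lower()
        genes = [g for g in gene_names if g.lower() in lowered]
        traits = [t for t in trait_terms if t.lower() in lowered]
        for gene, trait in product(genes, traits):
            edges[(gene, trait)] += 1
    return edges


# Toy abstracts, invented for illustration.
abstracts = [
    "Mutations in TTG2 cause defects in seed color pigmentation.",
    "FLC represses flowering time in Arabidopsis.",
    "TTG2 also affects trichome development and seed color.",
]
edges = cooccurrence_network(
    abstracts, ["TTG2", "FLC"], ["seed color", "flowering time"]
)
for (gene, trait), count in sorted(edges.items()):
    print(gene, trait, count)
```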
33. • Uses TF*IDF to rank documents by their relevance to a search term
• Additionally, considers the properties of gene-evidence networks such as
the specificity of documents to a gene
the frequency of evidence concepts
• Smart pre-indexing of the knowledge network makes the computation of
the score very fast
Gene Ranking
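To make the ranking idea concrete, here is a minimal Python sketch: each gene is scored by the summed TF*IDF of its evidence documents for a search term, and each document is down-weighted by the number of genes it is linked to. That specificity weight, the function names and the toy data are assumptions made for illustration, not KnetMiner's actual scoring formula.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, all_docs):
    """Classic TF*IDF of one term in one tokenised document."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    n_containing = sum(1 for d in all_docs if term in d)
    if n_containing == 0:
        return 0.0
    idf = math.log(len(all_docs) / n_containing)
    return tf * idf


def rank_genes(term, docs, gene_to_docs):
    """Rank genes by summed TF*IDF of their evidence documents.

    Each document's contribution is divided by the number of genes linked
    to it (an assumed stand-in for document-to-gene specificity).
    """
    all_docs = list(docs.values())
    genes_per_doc = Counter(d for ds in gene_to_docs.values() for d in ds)
    scores = {}
    for gene, doc_ids in gene_to_docs.items():
        scores[gene] = sum(
            tf_idf(term, docs[d], all_docs) / genes_per_doc[d] for d in doc_ids
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy evidence documents and gene links, invented for illustration.
docs = {
    "d1": "flowering time is controlled by flc".split(),
    "d2": "seed dormancy and grain colour".split(),
    "d3": "flowering flowering vernalisation".split(),
}
gene_to_docs = {"geneA": ["d1", "d3"], "geneB": ["d2"], "geneC": ["d1"]}
ranking = rank_genes("flowering", docs, gene_to_docs)
for gene, score in ranking:
    print(gene, round(score, 3))
```

In this sketch the per-document statistics (term counts, document frequencies, genes per document) could all be computed once ahead of time, which is one plausible reading of why pre-indexing makes scoring fast.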
34. • Web application for very fast search of
large genome-scale knowledge graphs
• Ranking of candidate genes based on
knowledge mining
• Interactive visualisation of genome
and knowledge maps
• Facilitates hypothesis validation and
generation
KnetMiner – Making Gene Discovery Efficient & Fun
http://knetminer.rothamsted.ac.uk/
35. Acknowledgements
John Doonan
Sergio Feingold
Martin Castellote
Uwe Scholz
Matthias Lange
Andy Law
Keywan Hassani-Pak
Ajit Singh
Marco Brandizi
Monika Mistry
Lisa Lill
Chris Rawlings
Dave Edwards
Philipp Bayer
Misha Kapushesky
Kevin Dialdestoro
@KnetMiner
Editor's Notes
This is a reminder that you are scheduled to present in the PAG workshop Saturday, January 14, 2017. The schedule of presenters is as follows.
10:30 AM: QTLNetMiner, interrogate plant and animal knowledge networks (Keywan Hassani-Pak)
10:50 AM: BrAPI, a standard interface for plant databases (Jan Erik Backlund)
11:10 AM: Visualizations of Phenotypic and QTL Data (David Marshall)
11:30 AM: Cyverse Data Commons (Ramona Walls)
11:50 AM: Transplant Integrated Search Using Apache Solr (Paul J. Kersey)
12:10 PM: Wheatis: A Genetics and Genomics Information System for the Wheat Research Community (Hadi Quesneville)
Connecting Crop Phenotype Data
Saturday, January 14, 2017
Golden Ballroom
You can upload your presentation at the Speaker Ready Room in Terrace Salon 2 on Friday until 8pm or Saturday morning starting at 7am. If you have any questions please feel free to send me an email.
Clay Birkett
Cornell University, USDA
Ithaca, NY
Creating improved crop varieties requires the identification of important traits and the discovery of causal genes
Linking genotype and phenotype is one of the greatest challenges in biology
Many phenotypes are complex, polygenic and the result of complex interactions at the cellular level
We need methods to 1) build knowledge networks through integration of heterogeneous datasets and 2) search these networks with QTL, SNP, gene expression and keyword queries in order to link genotype to phenotype.
SNP-Phenotype relations (122,919 relations) of significant SNPs (as defined by Ensembl, p-value<0.05?) linked to 107 phenotypes; on average 1,150 SNPs per phenotype.
SNP-Gene relations are based on genes in close proximity (<1000 bp) to SNPs (96,047 relations)
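A proximity rule like the one above could be sketched in Python as follows. The tuple layouts, the function name and the toy coordinates are assumptions for illustration; the distance is taken to the nearest gene boundary, with SNPs inside a gene counting as distance 0.

```python
def snps_near_genes(snps, genes, max_distance=1000):
    """Return (snp_id, gene_id) pairs where the SNP lies less than
    max_distance bp from the gene (0 bp if the SNP is inside the gene).

    snps:  iterable of (snp_id, chromosome, position)
    genes: iterable of (gene_id, chromosome, start, end)
    """
    relations = []
    for snp_id, chrom, pos in snps:
        for gene_id, g_chrom, start, end in genes:
            if chrom != g_chrom:
                continue
            if pos < start:
                distance = start - pos
            elif pos > end:
                distance = pos - end
            else:
                distance = 0
            if distance < max_distance:
                relations.append((snp_id, gene_id))
    return relations


# Toy coordinates, invented for illustration.
snps = [("s1", "1", 1500), ("s2", "1", 5000), ("s3", "2", 1500)]
genes = [("g1", "1", 2000, 3000), ("g2", "1", 3900, 4500)]
relations = snps_near_genes(snps, genes)
print(relations)
```

The nested loop is quadratic and only suitable for small inputs; for tens of thousands of SNPs and genes, sorting by position or using an interval index would be the usual choice.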
How to integrate GWAS and biological interaction data
Worth = Have a positive impact on the biological outcome in the whole organism without producing negative side effects.
Significant SNPs are rarely located within the causal gene sequence…
Consider LD, closest gene is not always the correct candidate…
Consider confounding, strongest association not always the main causal effect…
Non-parametric Wilcoxon rank-sum test (F-test for phenotypes that are categorical and not quantitative)
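For readers unfamiliar with the rank-sum test, the following Python sketch compares phenotype values between two groups of lines (for example, carriers of each allele at one SNP). It uses the normal approximation without tie or continuity corrections, and the function name and example values are hypothetical; it illustrates the statistic and is not the pipeline used in the study.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Ties receive average ranks. No tie correction is applied to the
    variance and no continuity correction is used, so results for small
    or heavily tied samples are only approximate.
    Returns (U statistic for group_a, z score, p-value).
    """
    values = list(group_a) + list(group_b)
    pooled = sorted((v, i) for i, v in enumerate(values))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = average_rank
        start = end + 1
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, z, p_value


# Hypothetical phenotype values for lines carrying each allele.
allele_ref = [1, 2, 3, 4, 5]
allele_alt = [6, 7, 8, 9, 10]
u, z, p = rank_sum_test(allele_ref, allele_alt)
print(u, z, p)
```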
LD in Arabidopsis decays within 10 kb on average
https://www.ncbi.nlm.nih.gov/pubmed/17676040
Up to 192 Arabidopsis inbred lines were genotyped for 250k SNPs and phenotyped for 107 traits including flowering, defence, ionomics and development
Phenotype data available in AraPheno database
https://arapheno.1001genomes.org/study/1/
Search term: flowering FLC
Integrating our own experimental data with the wealth of published open data
Put your experiment in the context of hundreds of similar experiments…
Compare myQTL to other QTL/GWAS and functional genomics studies.
Boosting queries found in the title
TO by Laurel Cooper