Mining biological knowledge networks for
gene-phenotype discovery
Keywan Hassani-Pak
http://knetminer.rothamsted.ac.uk/
Plant and Animal Genomes Conference 2017
@KnetMiner
The Genotype to Phenotype Challenge
Genotype
SNPs and Indels
Omics
Includes any ‘omics
Phenotype
Flowering
Defence
Development
Stress tolerance
Biological Knowledge Network
1. Methods to assemble and visualise an integrated
knowledge network of the cell
2. Methods to use the knowledge network to
translate genotype to phenotype
• Free and open source
• Data warehousing using a graph database
• Platform to integrate public and private
datasets in various formats
• Provides a GUI, CLI and APIs for
reproducible data integration workflows
Ondex – Data Integration Platform
Ondex
www.ondex.org
The approach is generic and works similarly for other species
Let’s get a GWAS dataset…
http://plants.ensembl.org/biomart
#SNP=66,816 | #Gene=27,502 | #Phenotype=107
… transform into a network
(SNP)
(Phenotype)
associated
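
The transformation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the GWAS results arrive as (SNP, phenotype, p-value) rows; the function name, the identifiers, the p-values and the cut-off are all hypothetical and are not taken from Ondex or KnetMiner.

```python
def gwas_to_network(rows, p_threshold=1e-5):
    """Turn (snp, phenotype, p_value) rows into typed nodes and edges."""
    nodes = {}   # node id -> node type
    edges = []   # (source, relation, target, attributes)
    for snp, phenotype, p_value in rows:
        if p_value > p_threshold:
            continue  # drop associations that fail the (assumed) cut-off
        nodes[snp] = "SNP"
        nodes[phenotype] = "Phenotype"
        edges.append((snp, "associated", phenotype, {"p_value": p_value}))
    return nodes, edges


# Toy rows with made-up identifiers and p-values.
rows = [
    ("snp_1", "FLC expression", 1e-8),
    ("snp_2", "FLC expression", 0.2),
    ("snp_1", "Leaf number", 3e-6),
]
nodes, edges = gwas_to_network(rows)
```

Keeping the p-value as an edge attribute means later steps can still filter or rank associations without going back to the original table.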
Biological interaction datasets
http://thebiogrid.org
(SNP)
(Phenotype)
associated
… add biological interactions
• Gene-GO
• Gene-Phenotype
Gene knock-out or overexpression
Text mining publications
• Gene-Publication
• Gene-Pathway
• Homology to yeast
• Homology to crops
Wheat
… finally add other open linked data
>500,000 nodes
>1,500,000 links
Genome-scale knowledge network
Relationships in Crop Knowledge Networks
GO
TO
encodes
text-mining
GWAS
P-value 10^-8
41% identity
EnsemblCompara
Genes Homology Annotations
encodes
Inferred from
Mutant Phenotype
PMID: 15598800
Genetics
QTL
GWAS
Marker
Interactions Phenotype
Mutations in TTG2
cause phenotypic
defects seed color
pigmentation.
PMID: 17766401
• Methods needed to evaluate millions of
relationships in the knowledge network, prioritize
genes and extract relevant subnetworks
• Interactive and exploratory tools needed to
enable knowledge discovery and decision
making
• Interpretation should be the task of domain
experts i.e. biologists!
How to search and interpret too much information?
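
One way to picture "extracting a relevant subnetwork" is a bounded breadth-first search from a gene to evidence nodes that match a keyword. The sketch below assumes the network is held as a plain adjacency dictionary; the function name, node labels and hop limit are hypothetical, and this is not KnetMiner's actual traversal method.

```python
from collections import deque


def paths_to_evidence(adjacency, gene, is_relevant, max_hops=2):
    """Breadth-first search from a gene; one shortest path per relevant node."""
    parent = {gene: None}
    depth = {gene: 0}
    queue = deque([gene])
    paths = []
    while queue:
        node = queue.popleft()
        if node != gene and is_relevant(node):
            # Walk the parent pointers back to the start gene.
            path, step = [], node
            while step is not None:
                path.append(step)
                step = parent[step]
            paths.append(path[::-1])
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return paths


# Toy network with made-up node labels.
adjacency = {
    "geneA": ["proteinA", "pub1"],
    "proteinA": ["geneA", "GO:flowering"],
    "pub1": ["geneA", "TO:seed colour"],
    "geneB": ["pub2"],
    "pub2": ["geneB"],
}
paths = paths_to_evidence(adjacency, "geneA", lambda n: "flowering" in n)
```

The union of the returned paths is the subnetwork a biologist would inspect; the interpretation itself stays with the domain expert.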
KnetMiner – Systematic and evidence-based gene discovery
http://knetminer.rothamsted.ac.uk
Web Browser
KnetMiner
Client
KnetMiner
Server
Servlets and JSP Page
Java Socket
Knowledge Graph DB
Ondex API
DHTML
JavaScript
Apache Tomcat
Multithreaded
Java Server
HTML, JSON, XML and images
over HTTP via Ajax
Views
Java Socket
Java Applet
Flash
KnetMiner Software Architecture
Major improvements
to the user-interface.
Re-implemented Java
Applet and Flash
components in
JavaScript.
Now compatible with
most OS and touch
devices.
Which associations (genes) are worth following up?
Often a highly subjective decision
How is genotype translated to phenotype?
Often involves multi-omics interactions
KnetMiner search interface
KnetMiner Outputs
Use Case 1 – Mining GWAS and QTL data
• 96 or 192 Arabidopsis inbred lines
• Genotyped: 250,000 SNPs
• 107 phenotypes were measured
https://arapheno.1001genomes.org/study/1/
o Flowering
o Defence
o Ionomics
o Developmental
• Wilcoxon and EMMA (control population structure) statistical tests
GWAS of 107 Phenotypes in Arabidopsis
Atwell et al., Nature 2010
Examples where GWAS results are simple to interpret
Sodium concentration (Na)
Lesioning (LES)
AvrRpm1
Single, sharp peak of
association centred on
causal polymorphism
LD decays within 10 kb on average
in Arabidopsis
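
Because LD decays within roughly 10 kb, a sharp peak points to a small set of nearby genes. The following sketch lists genes that overlap a window around a peak SNP. It assumes genes are given as (name, chromosome, start, end) tuples; the function name and all coordinates are hypothetical.

```python
def genes_in_ld_window(snp, genes, window=10_000):
    """Return names of genes overlapping [pos - window, pos + window]."""
    chrom, pos = snp
    lo, hi = pos - window, pos + window
    return [
        name
        for name, g_chrom, start, end in genes
        if g_chrom == chrom and start <= hi and end >= lo
    ]


# Toy annotation with made-up coordinates.
genes = [
    ("gene_1", "chr5", 3_170_000, 3_176_000),
    ("gene_2", "chr5", 3_250_000, 3_255_000),
    ("gene_3", "chr4", 3_172_000, 3_174_000),
]
peak = ("chr5", 3_180_000)
candidates = genes_in_ld_window(peak, genes)
wider = genes_in_ld_window(peak, genes, window=100_000)
```

The `window` parameter is left adjustable because diffuse peaks that span several hundred kb need a much wider search region and therefore yield many more candidates.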
Examples where GWAS results are complex to interpret
FLC gene expression (FLC)
Leaf Number (LN22)
Days to flowering (FT Field)
Peaks are diffuse
covering several hundred
kb without a clear centre
Causal polymorphisms do not always
have the strongest association
Using KnetMiner to interpret GWAS results
Wilcoxon
results
EMMA
results
Atwell et al., Nature 2010
Flowering Locus C (FLC) gene expression
Demo: Exploring genes and networks controlling FLC expression
• Petal size QTL in Arabidopsis (in collaboration with John Doonan)
Using KnetMiner to prioritise genes in QTL
Use Case 2 – Mining differentially expressed
genes
White grained wheat is more prone to pre-harvest sprouting (PHS)
• PHS is the result of premature germination of grain in
the ear and results in loss of bread-making quality
• Red grain colour is associated with increased dormancy
and resistance to PHS
• Grain colour is due to proanthocyanidins (condensed
tannins) in the testa
Sprouting
Grain colour
+ = white
o = red
Groos et al. (2002) TAG 104, 39-47
Red grain 20dpa
Andy Phillips
67 down-regulated genes
37 up-regulated genes
Over a hundred statistically significant
genes.
How are these linked to grain colour
and PHS?
Differential Gene Expression Analysis
Google-like search interface
• Search the knowledge graph using trait-based keywords
• Real-time user feedback and query
suggestions
Trait related
keywords
Query term
suggestions
Genes linked to grain colour and/or PHS
Genes with direct or indirect links to grain colour and PHS
KnetMiner methodology
Ondex Text-Mining Plugin
Input data
• 27,416 Arabidopsis gene names from Phytozome
• 52,561 Abstracts from PubMed that contain Arabidopsis
• 22,201 curated citations from TAIR
• 1,349 Trait Ontology terms from Planteome
Hassani-Pak et al., 2010
[Figure: text-mining workflow. Concepts A and B (a Gene and a TO term) that occur in the same publication are linked in a weighted association network; example edge weights shown: IP=1.7; M=1.2; N=2.]
Text-mining output
These steps connect 5553 Arabidopsis genes to 409 TO terms
based on 18,341 co-citations
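
The co-citation idea can be illustrated with a deliberately naive sketch: count how often a gene name and a trait term appear in the same abstract. Plain substring matching is an assumption made here for brevity; the Ondex plugin described in Hassani-Pak et al., 2010 is more sophisticated, and the function name and abstracts below are invented toy data.

```python
from collections import Counter
from itertools import product


def cocitations(abstracts, gene_names, trait_terms):
    """Count abstracts in which a gene name and a trait term co-occur."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        genes = {g for g in gene_names if g.lower() in lowered}
        traits = {t for t in trait_terms if t.lower() in lowered}
        # Every gene/trait pair found in the same abstract is one co-citation.
        counts.update(product(sorted(genes), sorted(traits)))
    return counts


# Invented toy abstracts, not real PubMed records.
abstracts = [
    "TTG2 mutants show altered seed colour.",
    "FLC delays flowering time; TTG2 was not affected.",
    "Seed colour and flowering time were scored.",
]
counts = cocitations(abstracts, ["TTG2", "FLC"], ["seed colour", "flowering time"])
```

The second toy abstract shows why raw co-occurrence needs weighting: TTG2 is counted against "flowering time" even though the sentence negates the link.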
• Uses TF*IDF to rank documents by their relevance to a search term
• Additionally, considers the properties of gene-evidence networks such as
o the specificity of documents to a gene
o the frequency of evidence concepts
• Smart pre-indexing of the knowledge network makes the computation of
the score very fast
Gene Ranking
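
As a rough illustration of the TF*IDF part only, the sketch below scores each gene by summing the TF*IDF weight of a search term over the documents linked to that gene. It is a simplified stand-in, not KnetMiner's actual scoring function: the network-specific adjustments and the pre-indexing are omitted, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_gene_scores(gene_docs, term):
    """gene_docs maps a gene to its linked documents (lists of lowercase tokens)."""
    all_docs = [doc for docs in gene_docs.values() for doc in docs]
    doc_freq = sum(1 for doc in all_docs if term in doc)
    if doc_freq == 0:
        return {gene: 0.0 for gene in gene_docs}
    idf = math.log(len(all_docs) / doc_freq)  # rarer terms weigh more
    scores = {}
    for gene, docs in gene_docs.items():
        # Term frequency is normalised by document length.
        scores[gene] = sum(Counter(doc)[term] / len(doc) * idf for doc in docs if doc)
    return scores


# Toy evidence documents with made-up content.
gene_docs = {
    "geneA": [["seed", "dormancy", "dormancy"], ["grain", "colour"]],
    "geneB": [["dormancy", "grain", "yield", "trial"]],
    "geneC": [["root", "growth"]],
}
scores = tfidf_gene_scores(gene_docs, "dormancy")
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case geneA ranks first because "dormancy" makes up a larger share of a shorter document, which is the kind of document specificity the bullet points refer to.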
• Web application for very fast search of
large genome-scale knowledge graphs
• Ranking of candidate genes based on
knowledge mining
• Interactive visualisation of genome
and knowledge maps
• Facilitates hypothesis validation and
generation
KnetMiner – Making Gene Discovery Efficient & Fun
http://knetminer.rothamsted.ac.uk/
Acknowledgements
John Doonan
Sergio Feingold
Martin Castellote
Uwe Scholz
Matthias Lange
Andy Law
Keywan Hassani-Pak
Ajit Singh
Marco Brandizi
Monika Mistry
Lisa Lill
Chris Rawlings
Dave Edwards
Philipp Bayer
Misha Kapushesky
Kevin Dialdestoro
@KnetMiner

KnetMiner - Knowledge Network Miner

  • 1. Mining biological knowledge networks for gene-phenotype discovery Keywan Hassani-Pak http://knetminer.rothamsted.ac.uk/ Plant and Animal Genomes Conference 2017 @KnetMiner
  • 2. The Genotype to Phenotype Challenge Genotype SNPs and Indels Omics Includes any ‘omics Phenotype Flowering Defence Development Stress tolerance Biological Knowledge Network 1. Methods to assemble and visualise an integrated knowledge network of the cell 2. Methods to use the knowledge network to translate genotype to phenotype
  • 3. • Free and open source • Data warehousing using a graph- database • Platform to integrate public and private datasets in various formats • Provides a GUI, CLI and APIs for reproducible data integration workflows Ondex – Data Integration Platform Ondex www.ondex.org
  • 4. The approach is generic and works similarly for other species
  • 5. Let’s get a GWAS dataset… http://plants.ensembl.org/biomart #SNP=66,816 | #Gene=27,502 | #Phenotype=107
  • 6. … transform into a network (SNP) (Phenotype) associated
  • 9. • Gene-GO • Gene-Phenotype Gene knock-out or overexpression Text mining publications • Gene-Publication • Gene-Pathway • Homology to yeast • Homology to crops Wheat … finally add other open linked data >500,000 nodes >1,500,000 links Genome-scale knowledge network
  • 10. Relationships in Crop Knowledge Networks GO TO encodes text-mining GWAS P-Value 10-8 41% identity EnsemblCompara Genes Homology Annotations encodes Inferred from Mutant Phenotype PMID: 15598800 Genetics QTL GWAS Marker Interactions Phenotype Mutations in TTG2 cause phenotypic defects seed color pigmentation. PMID: 17766401
  • 11. • Methods needed to evaluate millions of relationships in knowledge network, prioritize genes and extract relevant subnetworks • Interactive and exploratory tools needed to enable knowledge discovery and decision making • Interpretation should be the task of domain experts i.e. biologists! How to search and interpret too much information?
  • 12. KnetMiner – Systematic and evidence-based gene discovery http://knetminer.rothamsted.ac.uk
  • 13. Web Browser KnetMiner Client KnetMiner Server Servlets and JSP Page Java Socket Knowledge Graph DBOndex API DHTML JavaScript Apache Tomcat Multithreaded Java Server HTML, JSON, XML and images over HTTP via Ajax Views Java Socket Java Applet Flash KnetMiner Software Architecture Major improvements to the user-interface. Re-implemented Java Applet and Flash components in JavaScript. Now compatible with most OS and touch devices.
  • 14. Which associations (genes) are worth following up? Often a highly subjective decision How is genotype translated to phenotype? Often involves multi-omics interactions
  • 17. Use Case 1 – Mining GWAS and QTL data
  • 18. • 96 or 192 Arabidopsis inbred lines • Genotyped: 250,000 SNPs • 107 phenotypes were measured https://arapheno.1001genomes.org/study/1/ o Flowering o Defence o Ionomics o Developmental • Wilcoxon and EMMA (control population structure) statistical tests GWAS of 107 Phenotypes in Arabidopsis Atwell et al., Nature 2010
  • 19. Examples where GWAS results are simple to interpret Sodium concentration (Na) Lesioning (LES) AvrRpm1 Single, sharp peak of association centred on causal polymorphism LD decays within 10 kb on average in Arabidopsis
  • 20. Examples where GWAS results are complex to interpret FLC gene expression (FLC) Leaf Number (LN22) Days to flowering (FT Field) Peaks are diffuse covering several hundred kb without a clear centre Causal polymorphisms have not always strongest association
  • 21. Using KnetMiner to interpret GWAS results Wilcoxon results EMMA results Atwell et al., Nature 2010 Flowering Locus C (FLC) gene expression
  • 22. Demo: Exploring genes and networks controlling FLC expression
  • 23. • Petal size QTL in Arabidopsis (in collaboration with John Doonan) Using KnetMiner to prioritise genes in QTL
  • 24. Use Case 2 – Mining differentially expressed genes
  • 25. #25 White grained wheat is more prone to pre-harvest sprouting (PHS) • PHS is the result of premature germination of grain in the ear and results in loss of bread-making quality • Red grain colour is associated with increased dormancy and resistance to PHS • Grain colour is due to proanthocyanidins (condensed tannins) in the testa Sprouting Grain colour + = white o = red Groos et al. (2002)TAG 104, 39-47 Red grain 20dpa Andy Phillips
  • 26. 67 down-regulated genes 37 up-regulated genes Over hundred statistically significant genes. How are these linked to grain colour and PHS? Differential Gene Expression Analysis
  • 27. Google-like search interface • Search knowledge graph using trait- based keywords • Real-time user feedback and query suggestions Trait related keywords Query term suggestions
  • 28. Genes linked to grain colour and/or PHS
  • 29. Genes with direct or indirect links to grain colour and PHS #29
  • 31. Ondex Text-Mining Plugin Input data • 27,416 Arabidopsis gene names from Phytozome • 52,561 Abstracts from PubMed that contain Arabidopsis • 22,201 curated citations from TAIR • 1,349 Trait Ontology terms from Planteome Hassani-Pak et al., 2010 [Figure: text-mining links Gene and TO concepts to Publication concepts (relations occurrs_in, published_in) and produces a weighted Gene-TO association network; example edge weights IP=1.7; M=1.2; N=2]
  • 32. Text-mining output These steps connect 5,553 Arabidopsis genes to 409 TO terms based on 18,341 co-citations
  • 33. • Uses TF*IDF to rank documents by their relevance to a search term • Additionally, considers the properties of gene-evidence networks such as o the specificity of documents to a gene o the frequency of evidence concepts • Smart pre-indexing of the knowledge network makes the computation of the score very fast Gene Ranking
  • 34. • Web application for very fast search of large genome-scale knowledge graphs • Ranking of candidate genes based on knowledge mining • Interactive visualisation of genome and knowledge maps • Facilitates hypothesis validation and generation KnetMiner – Making Gene Discovery Efficient & Fun http://knetminer.rothamsted.ac.uk/
  • 35. Acknowledgements John Doonan Sergio Feingold Martin Castellote Uwe Scholz Matthias Lange Andy Law Keywan Hassani-Pak Ajit Singh Marco Brandizi Monika Mistry Lisa Lill Chris Rawlings Dave Edwards Philipp Bayer Misha Kapushesky Kevin Dialdestoro @KnetMiner
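
The gene ranking on slide 33 (TF*IDF combined with the specificity of documents to a gene) can be illustrated with a minimal sketch. This is not KnetMiner's actual scoring code: the function name, the smoothed IDF, the division by the number of genes citing a document, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_gene_scores(gene_docs, docs, term):
    """Score each gene by the summed TF*IDF of `term` over its evidence documents.

    Hypothetical helper, not part of KnetMiner or Ondex.
    """
    term = term.lower()
    tokenised = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: number of documents that contain the term.
    df = sum(1 for tokens in tokenised.values() if term in tokens)
    if df == 0:
        return {}
    # Smoothed IDF (assumption) so a term found in every document still scores > 0.
    idf = math.log(1 + len(docs) / df)
    # Specificity (assumption): a document linked to many genes counts less for each.
    genes_per_doc = Counter(d for ds in gene_docs.values() for d in set(ds))
    scores = {}
    for gene, doc_ids in gene_docs.items():
        score = 0.0
        for doc_id in set(doc_ids):
            tokens = tokenised[doc_id]
            tf = tokens.count(term) / len(tokens)
            score += tf * idf / genes_per_doc[doc_id]
        if score > 0:
            scores[gene] = score
    return scores


# Toy data with made-up document and gene identifiers.
docs = {
    "doc1": "ttg2 mutants show altered seed colour",
    "doc2": "seed dormancy and seed colour in wheat",
    "doc3": "root growth under salt stress",
}
gene_docs = {"GENE_A": ["doc1"], "GENE_B": ["doc1", "doc2"], "GENE_C": ["doc3"]}

scores = tfidf_gene_scores(gene_docs, docs, "seed")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy example GENE_B ranks above GENE_A because it has an additional, more specific document mentioning the search term, and GENE_C is absent because none of its evidence mentions it. A production system would precompute the index rather than tokenise at query time, which is what the "smart pre-indexing" bullet refers to.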

Editor's Notes

  1. This is a reminder that you are scheduled to present in the PAG workshop on Saturday, January 14, 2017. The schedule of presenters is as follows. 10:30 AM QTLNetMiner, interrogate plant and animal knowledge networks, Keywan Hassani-Pak. 10:50 AM BrAPI, a standard interface for plant databases, Jan Erik Backlund. 11:10 AM Visualizations of Phenotypic and QTL Data, David Marshall. 11:30 AM Cyverse Data Commons, Ramona Walls. 11:50 AM Transplant Integrated Search Using Apache Solr, Paul J. Kersey. 12:10 PM WheatIS: A Genetics and Genomics Information System for the Wheat Research Community, Hadi Quesneville. Workshop: Connecting Crop Phenotype Data, Saturday, January 14, 2017, Golden Ballroom. You can upload your presentation at the Speaker Ready Room in Terrace Salon 2 on Friday until 8pm or Saturday morning starting at 7am. If you have any questions please feel free to send me an email. Clay Birkett, Cornell University, USDA, Ithaca, NY
  2. Creating improved crop varieties requires identifying important traits and discovering causal genes. Linking genotype and phenotype is one of the greatest challenges in biology. Many phenotypes are complex, polygenic and the result of complex interactions at the cellular level. We need methods to 1) build knowledge networks through integration of heterogeneous datasets and 2) search these networks with QTL, SNP, gene expression and keyword queries in order to link genotype to phenotype.
  3. SNP-Phenotype relations (122,919 relations) of significant SNPs (as defined by Ensembl, p-value<0.05?) linked to 107 phenotypes; on average 1,150 SNPs per phenotype. SNP-Gene relations (96,047 relations) are based on genes in close proximity to SNPs (<1000 bp). How to integrate GWAS and biological interaction data.
  4. Using Ondex
  5. http://www.sciencedirect.com/science/article/pii/S2212066116300308
  6. Highlight text-mining
  7. Scale…
  8. Worth = Has a positive impact on the biological outcome in the whole organism without producing negative side effects. Significant SNPs are rarely located within the causal gene sequence… Consider LD: the closest gene is not always the correct candidate… Consider confounding: the strongest association is not always the main causal effect…
  9. Non-parametric Wilcoxon rank-sum test (F-test for phenotypes that are categorical and not quantitative)
  10. LD in Arabidopsis decays within 10 kb on average https://www.ncbi.nlm.nih.gov/pubmed/17676040
  11. Up to 192 Arabidopsis inbred lines were genotyped for 250k SNPs and phenotyped for 107 traits including flowering, defence, ionomics and development Phenotype data available in AraPheno database https://arapheno.1001genomes.org/study/1/
  12. Search term: flowering FLC. Integrating our own experimental data with the wealth of published open data. Put your experiment in the context of hundreds of similar experiments… Compare myQTL to other QTL/GWAS and functional genomics studies.
  13. Boosting queries found in the title. TO by Laurel Cooper.
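
Note 3 describes how the GWAS dataset becomes a network: significant SNPs are linked to phenotypes, and SNPs are linked to genes closer than 1000 bp. A minimal sketch of that transformation is given below. It is not the Ondex workflow itself: the function names, the "close_to" relation label, the tuple-based edge format and all identifiers and coordinates are hypothetical, and the p-value cut-off of 0.05 follows the note's own uncertain "p-value<0.05?".

```python
def distance_to_gene(pos, start, end):
    """Distance in bp from a SNP position to a gene interval (0 if inside it)."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def build_network(snps, genes, associations, max_dist=1000, max_p=0.05):
    """Return (source, relation, target) edges for a toy GWAS knowledge network."""
    edges = []
    # SNP-Phenotype edges: keep only associations below the p-value cut-off.
    for snp_id, phenotype, p_value in associations:
        if p_value < max_p:
            edges.append((snp_id, "associated", phenotype))
    # SNP-Gene edges: same chromosome and closer than max_dist bp.
    for snp_id, (chrom, pos) in snps.items():
        for gene_id, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and distance_to_gene(pos, start, end) < max_dist:
                edges.append((snp_id, "close_to", gene_id))
    return edges


# Made-up identifiers and coordinates, for illustration only.
snps = {"snp1": ("5", 1500), "snp2": ("5", 9000), "snp3": ("1", 1500)}
genes = {"geneA": ("5", 2000, 4000), "geneB": ("5", 9500, 12000)}
associations = [
    ("snp1", "flowering time", 1e-8),
    ("snp2", "flowering time", 0.2),
    ("snp3", "seed colour", 0.01),
]

edges = build_network(snps, genes, associations)
for edge in edges:
    print(edge)
```

The nested loop is quadratic in SNPs and genes, which is fine for a sketch but not for the 66,816 SNPs and 27,502 genes quoted in the slides; an interval index or a sort-and-sweep over positions would be the usual replacement.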