Introducing the KnetMiner Knowledge Graph: things, not strings

Keywan Hassani-Pak
Head of Bioinformatics
1 April 2019 – Rothamsted Seminar

Biological Knowledge Discovery
Unravelling the biological story behind complex traits

Genes are rarely single actors
To explain the metaphor
• Unravelling the roles of genes in complex trait genomics is often similar
to unmasking the heroes and villains in a plant biology whodunnit
• Multiple genes are generally involved and play different roles. How they
interact reveals the plotline(s) in the trait story
• We build software and data resources to help automate the process of
explaining the plotlines and unravelling the story behind complex trait
biology
• We are creating computational tools that bring different types of
evidence together and then weigh up the competing storylines to
present the user with the most compelling ones.

Assembling the evidence – data is just the beginning!
Data and knowledge sources: genetic diversity, sequencing data, phenotype data, literature, pathways, ontologies, reference genomes, gene expression, QTL
Target traits: better yields, NUE, disease resistance
Data → Information → Knowledge → Understanding
We need all of these components to come together to understand our traits

KnetMiner – Accelerating biological discovery
http://knetminer.rothamsted.ac.uk/
@KnetMiner
• Gene Networks
• Bio Databases
• Data Integration
• Text Mining
• Visualisation
• AI & Graphs
• RDF & Neo4j
• Java & JavaScript

How does KnetMiner work?
Computer says: “Try this”

The Rise of Graph Analytics
From Leonhard Euler to Google’s Knowledge Graphs

Seven Bridges of Königsberg - a historically notable problem in mathematics
Its resolution by Leonhard Euler in 1736 laid the foundations of graph theory
Leonhard Euler - 1736

INTERACTOME NETWORK GRAPH

Graph Algorithms
• Pathfinding: finds the shortest path or evaluates route availability and quality
• Centrality: determines the importance of distinct nodes in the network
• Community Detection: evaluates how a group is clustered or partitioned
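
The three algorithm families can be sketched on a toy network using only the Python standard library. The node names and edges below are invented for illustration. Degree centrality and connected components are used here as the simplest stand-ins for centrality and community detection; real analyses typically use more elaborate methods.

```python
from collections import deque

# Hypothetical toy interaction network (undirected); names are illustrative only.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("E", "F")]


def build_adjacency(edges):
    """Turn an edge list into a node -> set-of-neighbours mapping."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, start, goal):
    """Pathfinding: breadth-first search, returns one shortest path or None."""
    if start not in adj or goal not in adj:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def degree_centrality(adj):
    """Centrality: fraction of the other nodes each node is linked to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def connected_components(adj):
    """Community detection (simplest form): groups of mutually reachable nodes."""
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


adj = build_adjacency(EDGES)
print(shortest_path(adj, "A", "D"))
print(max(degree_centrality(adj).items(), key=lambda item: item[1]))
print(connected_components(adj))
```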

What is a Knowledge Graph?
[Figure: example knowledge graph. Nodes: North Wyke (Site), Harpenden (Site), Rothamsted Research (Research Institute), 1843 (Year). Relations: “distance to”, “founded in”. Properties: miles: 230, year: 1843.]

Knowledge Graph: things, not strings
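
A minimal sketch of the “things, not strings” idea, using the Rothamsted example. Only the labels (Site, Research Institute, Year, “distance to”, “founded in”, miles: 230) come from the slide; the class names, the identifier scheme and which nodes each relation connects are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """A node: an identified entity with a type, not just a text value."""
    identifier: str  # hypothetical identifier scheme
    kind: str
    name: str


@dataclass(frozen=True)
class Relation:
    """A typed, directed link between two things, with optional properties."""
    source: Thing
    label: str
    target: Thing
    properties: tuple = ()  # (key, value) pairs


north_wyke = Thing("site/north-wyke", "Site", "North Wyke")
harpenden = Thing("site/harpenden", "Site", "Harpenden")
rothamsted = Thing("org/rothamsted", "Research Institute", "Rothamsted Research")
year_1843 = Thing("year/1843", "Year", "1843")

# Assumed wiring of the relations shown on the slide.
RELATIONS = [
    Relation(north_wyke, "distance to", harpenden, (("miles", 230),)),
    Relation(rothamsted, "founded in", year_1843),
]


def targets(source, label):
    """Follow every relation with the given label that starts at `source`."""
    return [r.target for r in RELATIONS if r.source == source and r.label == label]


# As a bare string, "1843" could be a year, a count or a house number;
# as a thing, it carries its own type and can be linked to other things.
founded = targets(rothamsted, "founded in")[0]
print(founded.name, "is a", founded.kind)
```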

KnetMiner Knowledge Graph
A Knowledge Graph for gene mining and biological discovery

RNA
SNP Phenotype
Variety
Trial
Soil
Protein
Crop
Insect
Disease
Microbe
Species
Gene
Pesticide
Publication
Function
associated
interacts
Weather
Fertiliser
Field
Manag
ement
Time
Germ
plasm
Meta
bolite
Weeds

Where do KnetMiner relations come from?
Some examples:

Example - GWAS data
http://plants.ensembl.org/biomart/martview
Example Arabidopsis
#SNP=66,816 | #Gene=27,502 | #Phenotype=107

… transform into a graph
(SNP)
(Trait)
associated
Using Ondex-Knet-Builder
https://github.com/Rothamsted/ondex-knet-builder
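
A minimal sketch of the table-to-graph step, assuming the GWAS export is a tab-separated table with one SNP-trait association per row. The column names, identifiers and p-values are hypothetical, and the real conversion is done by Ondex-Knet-Builder rather than by this code.

```python
import csv
import io

# Hypothetical export; real column names will differ.
GWAS_TSV = "\n".join([
    "snp_id\ttrait\tp_value",
    "SNP_1\tflowering time\t1e-8",
    "SNP_2\tseed dormancy\t3e-6",
    "SNP_1\tseed dormancy\t2e-5",
]) + "\n"


def gwas_to_graph(tsv_text):
    """Convert association rows into typed nodes and 'associated' edges."""
    nodes = set()
    edges = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        snp = ("SNP", row["snp_id"])
        trait = ("Trait", row["trait"])
        # Sets de-duplicate: a SNP seen in several rows stays one node.
        nodes.add(snp)
        nodes.add(trait)
        edges.append((snp, "associated", trait, {"p_value": float(row["p_value"])}))
    return nodes, edges


nodes, edges = gwas_to_graph(GWAS_TSV)
for source, label, target, props in edges:
    print(source, f"-[{label} {props}]->", target)
```

Because nodes are keyed by type and identifier, interaction data loaded later can attach to the same SNP and Trait nodes instead of creating copies.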

Add Biological interaction datasets
http://thebiogrid.org

(SNP)
(Trait)
associated
… add biological interactions

Towards FAIR KnetMiner Knowledge Graphs
• Green: Ondex plug-ins
• rdf2neo is a generic RDF-to-Neo4j conversion tool that is not specific to Ondex
• Brandizi et al., IB-2018
(https://dx.doi.org/10.1515/jib-2018-0023)
• Brandizi et al., SWAT4LS-2018
(https://doi.org/10.6084/m9.figshare.7314323.v1)

Programmatic Access via Graph Query Languages
MATCH
  // branching via '|'
  (prot:Protein) - [:produced_by|consumed_by] -> (:Reaction)
  // variable-length chains
  - [:part_of*1..3] -> (pway:Path)
RETURN
  prot.name, pway LIMIT 1000
// Very compact forms available:
MATCH (prot:Protein) -- (pway:Path) RETURN pway
• RDF + OWL used as a standardised modelling/representation language (see BioKNO
ontology: github.com/Rothamsted/bioknet-onto)
• SPARQL available too, both having pros/cons (see our benchmark:
github.com/Rothamsted/graphdb-benchmarks)
• Cypher being used for KnetMiner queries (work in progress)

[Figure: deployment model. Labels: Open Source Code (client, web-server, knet-builder); Data Integration Workflows (using Knet-Builder tools); Private data; Public data; Database Service Model; Databases and graph queries required by KnetMiner, for a species or domain of interest; Graph Queries]

How to search and interpret so much information?
• Methods are needed to evaluate millions of
relationships in the knowledge graph, prioritise
genes and extract relevant subgraphs
• Interactive and exploratory tools are needed to
enable knowledge discovery
• Interpretation should be the task of domain
experts, i.e. biologists!
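
One simple way to prioritise genes and extract a relevant subgraph is to count, for each gene, the keyword-matching concepts reachable within a few hops, keeping the paths that lead to them. The sketch below does that on an invented graph. The node names and the scoring rule are hypothetical; this is not KnetMiner's actual ranking method, which these slides do not describe.

```python
from collections import deque

# Hypothetical knowledge graph as an adjacency list (undirected).
GRAPH = {
    "GeneA": ["ProteinA", "Pub1"],
    "ProteinA": ["GeneA", "Trait:seed dormancy"],
    "Pub1": ["GeneA", "Trait:germination"],
    "GeneB": ["ProteinB"],
    "ProteinB": ["GeneB", "Trait:germination", "Trait:leaf colour"],
    "Trait:seed dormancy": ["ProteinA"],
    "Trait:germination": ["Pub1", "ProteinB"],
    "Trait:leaf colour": ["ProteinB"],
}


def evidence_paths(graph, gene, keywords, max_hops=2):
    """Breadth-first search from a gene; return the path to each matching node."""
    wanted = [k.lower() for k in keywords]
    parent = {gene: None}
    depth = {gene: 0}
    queue = deque([gene])
    paths = []
    while queue:
        node = queue.popleft()
        if node != gene and any(k in node.lower() for k in wanted):
            path, current = [], node
            while current is not None:
                path.append(current)
                current = parent[current]
            paths.append(path[::-1])
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return paths


def rank_genes(graph, genes, keywords, max_hops=2):
    """Order genes by number of evidence paths (ties broken by name)."""
    scored = [(len(evidence_paths(graph, g, keywords, max_hops)), g) for g in genes]
    return [g for _, g in sorted(scored, key=lambda item: (-item[0], item[1]))]


keywords = ["dormancy", "germination"]
print(rank_genes(GRAPH, ["GeneA", "GeneB"], keywords))
for path in evidence_paths(GRAPH, "GeneA", keywords):
    print(" -> ".join(path))
```

The returned paths double as the extracted subgraph: they are the only part of the network a biologist needs to inspect to judge why a gene was ranked highly.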

KnetMiner Use Cases
Visualising data connections promises faster discoveries

What does KnetMiner know
about your gene?
Reverse genetics applications: gene to phenotype

Wheat
Arabidopsis
What happens if we
knock out TT2 (R Myb) in
wheat?
With Andy Phillips

What KnetMiner knows about TT2 (R Myb)
TT2 (R Myb) on chromosome 3D in wheat is predicted (p-value=0.01) to regulate the
transcriptional activation of MFT according to data from the analysis of 850 RNA-seq
samples in wheat (Ramírez-González et al. 2018) using GENIE3 (Huynh-Thu et al. 2010).
The TT2 3B homoeolog is not predicted to regulate MFT, and the TT2 3A homoeolog is not
annotated in the latest version of the wheat genome.
MFT has recently been linked to grain germination [(Zong Y; Li Q): “Recent studies in
both Arabidopsis and wheat have uncovered a new role of MOTHER OF FT AND TFL1
(MFT) in seed germination”] and seed dormancy [(Nakamura S): “Mapping analysis
showed that MFT on chromosome 3A (MFT-3A) colocalized with the seed dormancy
quantitative trait locus (QTL) QPhs.ocs-3A.”].
The MFT ortholog in Arabidopsis has a 3’ UTR variant that has been associated
(p-value = 5.5×10⁻⁵) with increased germination rate after 56 days of dry storage
(Atwell et al. 2010).

DFW Nov 2018
Visualising connections can lead to new lines of inquiry
• Can grain colour and PHS be linked because R Myb targets the grain germination gene MFT?
• Do white grain varieties (R Myb mutants) have increased root hair density?
• Is there a link between root hair density and PHS?

What does KnetMiner know
about your trait?
Forward genetics applications: phenotype to gene

https://arapheno.1001genomes.org

https://aragwas.1001genomes.org
P-value: 7.7E-7
Distance: 285bp
Identity: 81%
PPI: Two-Hybrid
ortholog
Arabidopsis
Wheat
http://knetminer.rothamsted.ac.uk/Triticum_aestivum/

Direct Seeded Rice
Improve seedling establishment, emergence and vigour
IRRI Germplasm → Acquisition of early seedling traits and image processing → GWAS → Gene discovery related to seed vigour
Guillaume Menard, Smita Kurup, Peter Eastmond, Kirsty Hassall, David Hughes, Colin Li
http://knetminer.rothamsted.ac.uk/Oryza_sativa/

Future Work
Growing and Learning the Knowledge Graph

Growing the KnetMiner
Knowledge Graph

Future development
• Personalised search experience
• Save and share your networks
• Like and Dislike buttons on relations
• Personalised networks based on usage data
• Better knowledge visualisation
• Reduce information overload using network clustering
• Annotate nodes with quantitative user data
• Predictive network analysis
• Find research trends using publication and graph data
• Automatic story and hypotheses generation

Future vision – Automatic story generation
TT2 (R Myb) on chromosome 3D in wheat is predicted (p-value=0.01) to regulate the transcriptional activation of MFT
according to data from the analysis of 850 RNA-seq samples in wheat (Ramírez-González et al. 2018) using GENIE3
(Huynh-Thu et al. 2010). The TT2 3B homoeolog is not predicted to regulate MFT, and the TT2 3A homoeolog is not
annotated in the latest version of the wheat genome. MFT has recently been linked to grain germination [(Zong Y; Li Q):
“Recent studies in both Arabidopsis and wheat have uncovered a new role of MOTHER OF FT AND TFL1 (MFT) in seed
germination”] and seed dormancy [(Nakamura S): “Mapping analysis showed that MFT on chromosome 3A (MFT-3A)
colocalized with the seed dormancy quantitative trait locus (QTL) QPhs.ocs-3A.”]. The MFT ortholog in Arabidopsis has a
3’ UTR variant that has been associated (p-value = 5.5×10⁻⁵) with increased germination rate after 56 days of dry
storage (Atwell et al. 2010).
• Well structured semantics of entities and
relationships in the network
• Maintaining provenance of derived
relations
• Confidence values on relations
• Supporting literature
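
The four requirements above can be illustrated with a template-based sketch: each relation carries a typed predicate, a confidence value and its provenance, and a sentence is produced only when a template exists and the confidence passes a threshold. The data structure, templates and threshold are assumptions for illustration, not KnetMiner's implementation; the two significant example relations reuse values from the TT2/MFT story, and the third is invented to show filtering.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    subject: str
    predicate: str   # typed relation: the "well structured semantics"
    obj: str
    p_value: float   # confidence value on the relation
    source: str      # supporting literature
    method: str = "" # provenance of a derived relation, if any


# Hypothetical sentence templates, one per relation type.
TEMPLATES = {
    "regulates": ("{subject} is predicted (p-value={p_value}) to regulate "
                  "{obj} according to {source} using {method}."),
    "associated_with": ("{subject} has been associated (p-value={p_value}) "
                        "with {obj} ({source})."),
}

RELATIONS = [
    Relation("TT2", "regulates", "MFT", 0.01,
             "Ramírez-González et al. 2018", "GENIE3"),
    Relation("An MFT 3' UTR variant", "associated_with",
             "increased germination rate", 5.5e-5, "Atwell et al. 2010"),
    # Invented low-confidence relation, expected to be filtered out.
    Relation("GeneX", "regulates", "MFT", 0.2, "hypothetical source", "GENIE3"),
]


def tell_story(relations, max_p_value=0.05):
    """Render confident relations as sentences, most significant first."""
    sentences = []
    for rel in sorted(relations, key=lambda r: r.p_value):
        template = TEMPLATES.get(rel.predicate)
        if template is None or rel.p_value > max_p_value:
            continue
        sentences.append(template.format(
            subject=rel.subject, obj=rel.obj, p_value=rel.p_value,
            source=rel.source, method=rel.method))
    return " ".join(sentences)


print(tell_story(RELATIONS))
```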

KnetMiner Impact
• Over 1800 unique users last year
• 68% of users from non-UK countries
• KnetMiner part of 3 grants recently submitted to BBSRC (BBR, sLoLa)
• KnetMiner code is open-source; v3.0 released in Feb 2019
• Developers are starting to contribute to our open-source tools
• Resources are starting to link into KnetMiner (e.g. WheatIS, T3, Ensembl)
• Requests to build KnetMiner for new species (e.g. sugarcane, honey bee)
• Invited to run training courses and workshops (e.g. EBI, PAG, IB2018)
• KnetMiner is used in two agrifood companies, and several companies have
expressed interest

The right data in the right
format and in the right hands at
the right time, saves lives.
#us2ts2019 @hdeus

Acknowledgements (strings - for humans only)
Bioinformatics Lab
Ajit Singh
Marco Brandizi
Sandeep Amberkar
Emma Bailey
Dan Smith
David Hughes
Rob King
Colin Li
Chris Rawlings
William Brown (ITS)
Follow us on Twitter: @KnetMiner
Collaborators & Contributors
Richard Holland (NFVL)
Misha Kapushesky (Genestack)
Kevin Dialdestoro (Genestack)
Vasiliki Koutra (KCL)
Martin Castellote (INTA)
Philipp Bayer (UWA)
Jean-Luc Jannink (Cornell)
Clay Birkett (Cornell)
Cyril Pommier (INRA)
Ramil Mauleon (IRRI)
Jan Taubert (KWS)
Uwe Scholz (IPK)
Matthias Lange (IPK)
Kumar Saurabh Singh (Exeter)
Monika Mistry
DFW WP4 members
Data Contributors
Clay Birkett (Cornell)
Cristobal Uauy (JIC)
Philippa Borrill (JIC)
Andrea Bräutigam (Uni Bielefeld)
Philipp Bayer (UWA)
Ramil Mauleon (IRRI)
Funding
Designing Future Wheat (BBSRC)
Pest Genomics Initiative
Users
Andy Phillips (RRes)
Rowan Mitchell (RRes)
Steve Hanley (RRes)
Richard Barker (NASA)

Acknowledgements (things - for machines & humans)
www.knetminer.org
Twitter: @KnetMiner
