Working with Trees in the Phyloinformatic Age
William H. Piel, Yale Peabody Museum
Hilmar Lapp, NESCent, Duke University
Dealing with the Growth of Phyloinformatics
Searching Stored Trees
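Dewey system: every node is labeled with a path string that extends its parent's path by one component.
[Figure: the tree ((A,B),((C,D),E)) with Dewey paths: root 0; internal nodes 0.1, 0.2 and 0.2.1; tips A = 0.1.1, B = 0.1.2, C = 0.2.1.1, D = 0.2.1.2, E = 0.2.2]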
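Find the clade for C + D: find the path components they have in common, starting from the left (0.2.1), then select every node whose path starts with that pattern:

SELECT * FROM nodes WHERE (path = '0.2.1' OR path LIKE '0.2.1.%');

(Matching on '0.2.1.%' rather than '0.2.1%' keeps the query from also picking up a sibling such as 0.2.10.)

Path      Label
0         Root
0.1       NULL
0.1.1     A
0.1.2     B
0.2       NULL
0.2.1     NULL
0.2.1.1   C
0.2.1.2   D
0.2.2     E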
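A minimal sketch of the Dewey lookup, assuming Python's built-in sqlite3 module as a stand-in database (the slides do not name an engine). The nodes(path, label) table holds the rows listed above; common_prefix and clade are hypothetical helper names, and reading the clade root off the shared prefix is one way to automate the "common pattern" step.

```python
import sqlite3

# (path, label) rows for the example tree ((A,B),((C,D),E)); internal nodes have no label.
ROWS = [
    ("0", None), ("0.1", None), ("0.1.1", "A"), ("0.1.2", "B"),
    ("0.2", None), ("0.2.1", None), ("0.2.1.1", "C"), ("0.2.1.2", "D"),
    ("0.2.2", "E"),
]


def common_prefix(paths):
    """Longest run of path components shared by all paths, read from the left."""
    prefix = []
    for parts in zip(*(p.split(".") for p in paths)):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return ".".join(prefix)


def clade(conn, labels):
    """Return (path, label) for every node in the smallest clade containing all labels."""
    marks = ",".join("?" * len(labels))
    paths = [p for (p,) in conn.execute(
        f"SELECT path FROM nodes WHERE label IN ({marks})", labels)]
    root = common_prefix(paths)
    # Match the clade root itself or anything below it; the trailing '.' keeps
    # '0.2.1' from also matching a sibling such as '0.2.10'.
    return conn.execute(
        "SELECT path, label FROM nodes WHERE path = ? OR path LIKE ? ORDER BY path",
        (root, root + ".%"),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", ROWS)
print(clade(conn, ["C", "D"]))
# [('0.2.1', None), ('0.2.1.1', 'C'), ('0.2.1.2', 'D')]
```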
Searching Stored Trees
A depth-first traversal scores each node with a left and a right ID.
[Figure: the tree ((A,B),((C,D),E)) with each node numbered on the way down (left ID) and on the way back up (right ID), from 1 to 18]
Minimum spanning clade of node 5:

SELECT *
FROM nodes
INNER JOIN nodes AS include
  ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
WHERE include.node_id = 5;

Label   Left   Right
Root     1     18
-        2      7
A        3      4
B        5      6
-        8     17
-        9     14
C       10     11
D       12     13
E       15     16
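The traversal and the spanning-clade query can be sketched together in Python with sqlite3 (an assumption, as is the column layout nodes(node_id, label, left_id, right_id)). The slide does not show node IDs; here they are assumed to follow preorder, which makes node 5 the clade ((C,D),E).

```python
import sqlite3
from itertools import count

TREE = (("A", "B"), (("C", "D"), "E"))  # ((A,B),((C,D),E)); tips are strings


def nested_sets(tree):
    """Depth-first traversal returning (label, left_id, right_id) rows in preorder."""
    rows, ticks = [], count(1)

    def visit(node):
        row = [None if isinstance(node, tuple) else node, next(ticks), None]
        rows.append(row)              # appended on entry, so rows stay in preorder
        if isinstance(node, tuple):
            for child in node:
                visit(child)
        row[2] = next(ticks)          # right ID is assigned on the way back up

    visit(tree)
    return [tuple(r) for r in rows]


SUBTREE_SQL = """
SELECT nodes.node_id, nodes.label
FROM nodes
INNER JOIN nodes AS include
  ON nodes.left_id BETWEEN include.left_id AND include.right_id
WHERE include.node_id = ?
ORDER BY nodes.left_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, label TEXT,"
             " left_id INTEGER, right_id INTEGER)")
# node_id is assigned in insertion (preorder) order: 1 = root, 5 = ((C,D),E)
conn.executemany("INSERT INTO nodes (label, left_id, right_id) VALUES (?, ?, ?)",
                 nested_sets(TREE))
print(conn.execute(SUBTREE_SQL, (5,)).fetchall())
# [(5, None), (6, None), (7, 'C'), (8, 'D'), (9, 'E')]
```

A single BETWEEN join retrieves the whole clade without any recursion, which is the main appeal of this numbering; the cost is that inserting a node means renumbering everything to its right.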
Searching Stored Trees
[Figure: the tree ((A,B),((C,D),E)) with its nodes numbered 1 to 9]

node_label   node_id   parent_id
-            1         -
-            2         1
A            3         2
B            4         2
-            5         1
-            6         5
C            7         6
D            8         6
E            9         5

SQL query to find the parent node of node "D":

SELECT *
FROM nodes AS parent
INNER JOIN nodes AS child ON (child.parent_id = parent.node_id)
WHERE child.node_label = 'D';

... but this requires an external procedure to navigate the tree.
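A sketch of that external procedure, assuming Python with sqlite3 and the node IDs from the adjacency-list table above; ancestors is a hypothetical helper that issues the parent query once per step until it reaches the root.

```python
import sqlite3

# (node_id, node_label, parent_id) rows from the adjacency-list table above.
ROWS = [(1, None, None), (2, None, 1), (3, "A", 2), (4, "B", 2), (5, None, 1),
        (6, None, 5), (7, "C", 6), (8, "D", 6), (9, "E", 5)]

PARENT_SQL = """
SELECT parent.node_id
FROM nodes AS parent
INNER JOIN nodes AS child ON child.parent_id = parent.node_id
WHERE child.node_id = ?
"""


def ancestors(conn, label):
    """The external procedure: one parent query per step until the root is reached."""
    (node_id,) = conn.execute(
        "SELECT node_id FROM nodes WHERE node_label = ?", (label,)).fetchone()
    lineage = []
    while True:
        row = conn.execute(PARENT_SQL, (node_id,)).fetchone()
        if row is None:        # the root has no parent, so the walk stops here
            return lineage
        node_id = row[0]
        lineage.append(node_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY,"
             " node_label TEXT, parent_id INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", ROWS)
print(ancestors(conn, "D"))    # [6, 5, 1]
```

Each step is a separate round trip to the database, so the number of queries grows with the depth of the tree. Engines that support recursive common table expressions (WITH RECURSIVE, available in SQLite and PostgreSQL) can perform the same walk inside a single query.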
Searching Stored Trees
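Searching trees by distance metrics: USim distance
Wang, J. T. L., H. Shan, D. Shasha and W. H. Piel. 2005. Fast structural search in phylogenetic databases. Evolutionary Bioinformatics Online 1: 37-46.

[Figure: two four-taxon trees, (((A,B),C),D) and ((A,B),(C,D)), each shown with its distance matrix]

In each matrix, the entry in row X, column Y is the number of edges from X up to the most recent common ancestor of X and Y.

(((A,B),C),D):
     A  B  C  D
A    0  1  2  3
B    1  0  2  3
C    1  1  0  2
D    1  1  1  0

((A,B),(C,D)):
     A  B  C  D
A    0  1  2  2
B    1  0  2  2
C    2  2  0  1
D    2  2  1  0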
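A sketch of how such a matrix can be computed from a rooted tree, assuming trees are written as nested Python tuples of tip labels. up_distances builds the matrix described above; matrix_difference is a hypothetical comparison (summed absolute differences, assuming both trees share the same tips) and is not necessarily the USim measure defined by Wang et al.

```python
def tip_paths(tree, prefix=()):
    """Map each tip label to the sequence of child indices leading to it from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(tip_paths(child, prefix + (i,)))
    return paths


def shared_depth(p, q):
    """Depth of the most recent common ancestor: length of the common index prefix."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def up_distances(tree):
    """m[x][y] = number of edges from tip x up to the MRCA of x and y."""
    paths = tip_paths(tree)
    return {x: {y: len(px) - shared_depth(px, py) for y, py in paths.items()}
            for x, px in paths.items()}


def matrix_difference(t1, t2):
    """Hypothetical comparison: summed absolute differences over all (x, y) cells."""
    m1, m2 = up_distances(t1), up_distances(t2)
    return sum(abs(m1[x][y] - m2[x][y]) for x in m1 for y in m1[x])


T1 = ((("A", "B"), "C"), "D")     # (((A,B),C),D)
T2 = (("A", "B"), ("C", "D"))     # ((A,B),(C,D))
print(up_distances(T1)["A"])      # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
print(matrix_difference(T1, T2))  # 7
```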
Searching Stored Trees
Transitive Closure
Dealing with the Growth of Phyloinformatics
BioSQL: http://www.biosql.org/
A schema for persistent storage of sequences and features, tightly integrated with BioPerl (as well as Biopython, BioJava and BioRuby)
• phylodb extension designed at a NESCent hackathon
• Perl command-line interface by Jamie Estill (Google Summer of Code)
CREATE TABLE node_path (
  child_node_id integer,
  parent_node_id integer,
  distance integer
);

Index of all paths from ancestors to descendants.
[Figure: a small tree with tips A, B and C and nodes numbered 1 to 5, showing its ancestor-to-descendant paths]
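One way to populate node_path from a parent/child edge table is a recursive common table expression. This sketch assumes Python's sqlite3 module and an edges(parent_id, child_id) table named after the columns used in the queries that follow, applied to the nine-node tree ((A,B),((C,D),E)) rather than the tree in the figure. It stores only proper ancestors; whether phylodb also stores a node's path to itself (distance 0) is not stated here.

```python
import sqlite3

# (parent_id, child_id) edges for ((A,B),((C,D),E)), nodes numbered 1-9 in preorder.
EDGES = [(1, 2), (2, 3), (2, 4), (1, 5), (5, 6), (6, 7), (6, 8), (5, 9)]

# Seed with every edge (distance 1), then repeatedly extend each path one edge rootward.
CLOSURE_SQL = """
WITH RECURSIVE paths(child_node_id, parent_node_id, distance) AS (
    SELECT child_id, parent_id, 1 FROM edges
    UNION ALL
    SELECT p.child_node_id, e.parent_id, p.distance + 1
    FROM paths AS p JOIN edges AS e ON e.child_id = p.parent_node_id
)
SELECT child_node_id, parent_node_id, distance FROM paths
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent_id integer, child_id integer)")
conn.execute("CREATE TABLE node_path (child_node_id integer,"
             " parent_node_id integer, distance integer)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)
conn.executemany("INSERT INTO node_path VALUES (?, ?, ?)",
                 conn.execute(CLOSURE_SQL).fetchall())

print(conn.execute("SELECT parent_node_id, distance FROM node_path"
                   " WHERE child_node_id = 7 ORDER BY distance").fetchall())
# [(6, 1), (5, 2), (1, 3)]
```

The recursion stops at the root because no edge has the root as its child. The closure holds one row per ancestor-descendant pair, so it grows roughly with the number of nodes times the average depth.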
Find all paths where A and B share a common parent_node_id:

SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
AND pA.child_node_id = nA.node_id
AND nA.node_label = 'A'
AND pB.child_node_id = nB.node_id
AND nB.node_label = 'B';
... of those paths, select the one with the shortest path (the most recent common ancestor of A and B):

SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
AND pA.child_node_id = nA.node_id
AND nA.node_label = 'A'
AND pB.child_node_id = nB.node_id
AND nB.node_label = 'B'
ORDER BY pA.distance
LIMIT 1;
... of those paths, select the one with the longest path (the root of the tree):

SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
AND pA.child_node_id = nA.node_id
AND nA.node_label = 'A'
AND pB.child_node_id = nB.node_id
AND nB.node_label = 'B'
ORDER BY pA.distance DESC
LIMIT 1;
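The shortest- and longest-path variants can be tried side by side. This sketch assumes Python's sqlite3 module and the nine-node tree ((A,B),((C,D),E)); shared_ancestor is a hypothetical wrapper that swaps ASC for DESC in the ORDER BY clause. Because LIMIT 1 applies to the whole result, the query as written assumes the two taxa occur together in only one stored tree.

```python
import sqlite3

# Tree ((A,B),((C,D),E)): nodes 1-9 in preorder, with their parents.
NODES = [(1, None), (2, None), (3, "A"), (4, "B"), (5, None),
         (6, None), (7, "C"), (8, "D"), (9, "E")]
PARENT = {2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}


def closure_rows(parent):
    """(child_node_id, parent_node_id, distance) for every ancestor of every node."""
    rows = []
    for child in parent:
        node, dist = child, 0
        while node in parent:
            node, dist = parent[node], dist + 1
            rows.append((child, node, dist))
    return rows


# The slides' query; {order} is ASC for the shortest path, DESC for the longest.
SHARED_SQL = """
SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
  AND pA.child_node_id = nA.node_id AND nA.node_label = ?
  AND pB.child_node_id = nB.node_id AND nB.node_label = ?
ORDER BY pA.distance {order}
LIMIT 1
"""


def shared_ancestor(conn, a, b, longest=False):
    sql = SHARED_SQL.format(order="DESC" if longest else "ASC")
    return conn.execute(sql, (a, b)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, node_label TEXT)")
conn.execute("CREATE TABLE node_path (child_node_id INTEGER,"
             " parent_node_id INTEGER, distance INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", NODES)
conn.executemany("INSERT INTO node_path VALUES (?, ?, ?)", closure_rows(PARENT))
print(shared_ancestor(conn, "C", "E"))                # 5, the most recent common ancestor
print(shared_ancestor(conn, "C", "E", longest=True))  # 1, the root
```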
Find the maximum spanning clade (i.e., the subtree) for each tree that includes A and B but not C:

-- Return an adjacency list for each subtree
SELECT e.parent_id AS parent, e.child_id AS child, ch.node_label, pt.tree_id
FROM node_path p, edges e, nodes pt, nodes ch
WHERE e.child_id = p.child_node_id
AND pt.node_id = e.parent_id
AND ch.node_id = e.child_id
-- Get all ancestors shared by A and B
AND p.parent_node_id IN (
    SELECT pA.parent_node_id
    FROM node_path pA, node_path pB, nodes nA, nodes nB
    WHERE pA.parent_node_id = pB.parent_node_id
    AND pA.child_node_id = nA.node_id
    AND nA.node_label = 'A'
    AND pB.child_node_id = nB.node_id
    AND nB.node_label = 'B')
-- Exclude those that are also ancestors of C
AND NOT EXISTS (
    SELECT 1 FROM node_path np, nodes n
    WHERE np.child_node_id = n.node_id
    AND n.node_label = 'C'
    AND np.parent_node_id = p.parent_node_id);
Find trees that contain a clade that includes A and B but not C:

-- List the set of trees with these ancestors
SELECT DISTINCT t.tree_id, t.name
FROM node_path p, nodes ch, trees t
WHERE ch.node_id = p.child_node_id
AND ch.tree_id = t.tree_id
-- Get all ancestors shared by A and B
AND p.parent_node_id IN (
    SELECT pA.parent_node_id
    FROM node_path pA, node_path pB, nodes nA, nodes nB
    WHERE pA.parent_node_id = pB.parent_node_id
    AND pA.child_node_id = nA.node_id
    AND nA.node_label = 'A'
    AND pB.child_node_id = nB.node_id
    AND nB.node_label = 'B')
-- Exclude those that are also ancestors of C
AND NOT EXISTS (
    SELECT 1 FROM node_path np, nodes n
    WHERE np.child_node_id = n.node_id
    AND n.node_label = 'C'
    AND np.parent_node_id = p.parent_node_id);
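A sketch that runs this query against three small trees, assuming Python's sqlite3 module, hypothetical tree names, and a nodes table that carries a tree_id column (as the query implies). The taxon labels are passed as parameters instead of being written into the SQL.

```python
import sqlite3
from itertools import count

# Three hypothetical stored trees, written as nested tuples.
TREES = {1: ("AB clade", (("A", "B"), "C")),
         2: ("AC clade", (("A", "C"), "B")),
         3: ("no C", (("A", "B"), "D"))}


def load(conn, trees):
    """Fill trees, nodes and node_path; node IDs are unique across all trees."""
    ids = count(1)

    def add(node, tree_id, lineage):
        node_id = next(ids)
        label = None if isinstance(node, tuple) else node
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?)", (node_id, tree_id, label))
        # lineage lists ancestors root-first, so reversing it gives distance 1, 2, ...
        for dist, anc in enumerate(reversed(lineage), start=1):
            conn.execute("INSERT INTO node_path VALUES (?, ?, ?)", (node_id, anc, dist))
        if isinstance(node, tuple):
            for child in node:
                add(child, tree_id, lineage + [node_id])

    for tree_id, (name, tree) in trees.items():
        conn.execute("INSERT INTO trees VALUES (?, ?)", (tree_id, name))
        add(tree, tree_id, [])


CLADE_SQL = """
SELECT DISTINCT t.tree_id, t.name
FROM node_path p, nodes ch, trees t
WHERE ch.node_id = p.child_node_id
  AND ch.tree_id = t.tree_id
  AND p.parent_node_id IN (          -- ancestors shared by both ingroup taxa
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id AND nA.node_label = ?
        AND pB.child_node_id = nB.node_id AND nB.node_label = ?)
  AND NOT EXISTS (                    -- ... that are not also ancestors of the outgroup
      SELECT 1 FROM node_path np, nodes n
      WHERE np.child_node_id = n.node_id
        AND n.node_label = ?
        AND np.parent_node_id = p.parent_node_id)
ORDER BY t.tree_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (tree_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, tree_id INTEGER, node_label TEXT)")
conn.execute("CREATE TABLE node_path (child_node_id INTEGER, parent_node_id INTEGER, distance INTEGER)")
load(conn, TREES)
print(conn.execute(CLADE_SQL, ("A", "B", "C")).fetchall())
# [(1, 'AB clade'), (3, 'no C')]
```

Tree 3 is returned even though it contains no C at all, because none of its nodes is an ancestor of C; the next query guards against this by also requiring the outgroup taxa to be present in the tree.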
Find trees that contain a clade that includes (A, B, C) but not D or E (DISTINCT ON is a PostgreSQL extension):

SELECT qry.tree_id, MIN(qry.name) AS "tree_name"
FROM (
  SELECT DISTINCT ON (n.node_id) n.node_id, t.tree_id, t.name
  FROM trees t, nodes n,
    -- Get all ancestors of A, B, C from all trees that have A, B, C
    (SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id
     FROM nodes inN, node_path inP
     WHERE inN.node_label IN ('A','B','C')
     AND inP.child_node_id = inN.node_id
     GROUP BY inN.tree_id, inP.parent_node_id
     HAVING COUNT(inP.child_node_id) = 3  -- number of ingroups that share the node
     -- DISTINCT ON keeps the highest shared parent_node_id in each tree
     ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca
  WHERE n.node_id = lca.parent_node_id
  AND t.tree_id = n.tree_id
  -- Exclude those that are also ancestors of D, E
  AND NOT EXISTS (
    SELECT 1 FROM nodes outN, node_path outP
    WHERE outN.node_label IN ('D','E')
    AND outP.child_node_id = outN.node_id
    AND outP.parent_node_id = lca.parent_node_id)
  -- ... but make sure that the tree still contains D, E
  AND EXISTS (
    SELECT c.tree_id FROM trees c, nodes q
    WHERE q.node_label IN ('D','E')
    AND q.tree_id = c.tree_id
    AND c.tree_id = t.tree_id
    GROUP BY c.tree_id
    HAVING COUNT(c.tree_id) = 2)  -- number of non-ingroups that must be in the tree
) AS qry
GROUP BY (qry.tree_id)
HAVING COUNT(qry.node_id) = 1;  -- number of clades that each tree must satisfy
Here's a faster, cleaner version:

SELECT t.tree_id, t.name
FROM trees t
INNER JOIN (
  SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id, inN.tree_id
  FROM nodes inN, node_path inP
  WHERE inN.node_label IN ('A','B','C')
  AND inP.child_node_id = inN.node_id
  GROUP BY inN.tree_id, inP.parent_node_id
  HAVING COUNT(inP.child_node_id) = 3
  ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca USING (tree_id)
WHERE NOT EXISTS (
  SELECT 1 FROM nodes outN, node_path outP
  WHERE outN.node_label IN ('D','E')
  AND outP.child_node_id = outN.node_id
  AND outP.parent_node_id = lca.parent_node_id)
AND EXISTS (
  SELECT c.tree_id FROM trees c, nodes q
  WHERE q.node_label IN ('D','E')
  AND q.tree_id = c.tree_id
  AND c.tree_id = t.tree_id
  GROUP BY c.tree_id
  HAVING COUNT(c.tree_id) = 2);
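Both versions rely on DISTINCT ON, a PostgreSQL extension, and on node IDs increasing from root to tips so that the highest shared parent_node_id is the most recent common ancestor. The same test can be sketched outside the database in plain Python, locating the common ancestor directly from each tip's path; the helper names and the example trees are hypothetical.

```python
def tip_paths(tree, prefix=()):
    """Map each tip label to the sequence of child indices leading to it from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(tip_paths(child, prefix + (i,)))
    return paths


def lca_path(paths):
    """Index path of the most recent common ancestor: the longest shared prefix."""
    shortest = min(len(p) for p in paths)
    n = 0
    while n < shortest and all(p[n] == paths[0][n] for p in paths):
        n += 1
    return paths[0][:n]


def has_clade(tree, ingroup, outgroup):
    """True if the tree holds every taxon and the ingroup's MRCA has no outgroup tip below it."""
    paths = tip_paths(tree)
    if any(t not in paths for t in [*ingroup, *outgroup]):
        return False          # mirrors the EXISTS check: outgroup taxa must be present too
    lca = lca_path([paths[t] for t in ingroup])
    return not any(paths[t][:len(lca)] == lca for t in outgroup)


STORED = {
    1: ((("A", "B"), "C"), ("D", "E")),
    2: ((("A", "B"), "D"), ("C", "E")),
    3: (("A", "B"), "C"),
}
print([tid for tid, tree in STORED.items() if has_clade(tree, ["A", "B", "C"], ["D", "E"])])
# [1]
```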
Matching a whole tree means querying for all of its clades:
• (A, B) but not C, D, E
• (C, D) but not A, B, E
• (C, D, E) but not A, B
[Figure: the query tree ((A,B),((C,D),E)) with nodes numbered 1 to 9]
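A sketch of how the clade list could be generated from a query tree and checked against a stored tree, assuming nested Python tuples for trees; constraints, satisfied and matches are hypothetical names. A constraint holds when the smallest clade containing the ingroup excludes the outgroup, mirroring the SQL queries above.

```python
def tips(node):
    """Set of tip labels at or below node."""
    if not isinstance(node, tuple):
        return {node}
    return set().union(*(tips(child) for child in node))


def clades(tree):
    """Tip sets of every internal node except the root, in preorder."""
    found = []

    def walk(node, is_root):
        if isinstance(node, tuple):
            if not is_root:
                found.append(frozenset(tips(node)))
            for child in node:
                walk(child, False)

    walk(tree, True)
    return found


def constraints(tree):
    """One (ingroup, outgroup) query per clade of the query tree."""
    everything = tips(tree)
    return [(sorted(c), sorted(everything - c)) for c in clades(tree)]


def satisfied(stored, ingroup, outgroup):
    """The smallest clade of `stored` holding the ingroup must exclude the outgroup."""
    groups = clades(stored) + [frozenset(tips(stored))]
    holding = [g for g in groups if set(ingroup) <= g]
    return bool(holding) and not (min(holding, key=len) & set(outgroup))


def matches(query, stored):
    return all(satisfied(stored, i, o) for i, o in constraints(query))


QUERY = (("A", "B"), (("C", "D"), "E"))
print(constraints(QUERY))
# [(['A', 'B'], ['C', 'D', 'E']), (['C', 'D', 'E'], ['A', 'B']), (['C', 'D'], ['A', 'B', 'E'])]
print(matches(QUERY, (("B", "A"), ("E", ("D", "C")))))   # True
print(matches(QUERY, (("A", "C"), (("B", "D"), "E"))))   # False
```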
Dealing with the Growth of Phyloinformatics
Mining trees for interesting, general relationship questions:
(((Sus_scrofa, Hippopotamus),Balaenoptera),Equus_caballus) vs ((Sus_scrofa, (Hippopotamus,Balaenoptera)),Equus_caballus)
[Figure: two trees of Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus and Felis catus; one groups Sus scrofa with Hippopotamus, the other groups Hippopotamus with Balaenoptera]
Even with perfectly resolved OTUs, you will still fail to hit relevant trees:
[Figure: a tree of Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus and Felis catus beside a tree that samples Sus celebensis and Equus asinus in place of Sus scrofa and Equus caballus]
Step 1: for each clade in all trees in the database, run a stem query on a classification tree (e.g., NCBI).
Stem queries:
Node 2: (>A, B - C, D, E)
Node 3: (>A - B, C, D, E)
Node 4: (>B - A, C, D, E)
Node 5: (>C, D, E - A, B)
Node 6: (>C, D - A, B, E)
Node 7: (>C - A, B, D, E)
Node 8: (>D - A, B, C, E)
Node 9: (>E - A, B, C, D)
Step 2: label each node with an NCBI taxon ID (if there is a match).
Step 3: do the same for the query tree.
[Figure: the tree ((A,B),((C,D),E)) with nodes numbered 1 to 9]
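The stem queries listed for step 1 can be generated mechanically. This sketch assumes nested Python tuples for trees and numbers nodes in preorder, which reproduces the node numbers used on this slide; stem_queries is a hypothetical name, and the root is skipped because it has no outgroup.

```python
from itertools import count


def tips(node):
    """Tip labels at or below node, left to right."""
    if not isinstance(node, tuple):
        return [node]
    return [t for child in node for t in tips(child)]


def stem_queries(tree):
    """Number nodes in preorder (root = 1) and write '(>ingroup - outgroup)' for the rest."""
    everything = tips(tree)
    numbers, queries = count(1), {}

    def walk(node):
        number = next(numbers)
        if number > 1:                 # the root has no outgroup, so no stem query
            inside = tips(node)
            outside = [t for t in everything if t not in inside]
            queries[number] = f"(>{', '.join(inside)} - {', '.join(outside)})"
        if isinstance(node, tuple):
            for child in node:
                walk(child)

    walk(tree)
    return queries


for number, query in stem_queries((("A", "B"), (("C", "D"), "E"))).items():
    print(f"Node {number}: {query}")
# first line:  Node 2: (>A, B - C, D, E)
# last line:   Node 9: (>E - A, B, C, D)
```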
Rename nodes according to their deepest stem query...
[Figure: trees of Gorilla gorilla, Homo sapiens, Pan troglodytes, Macaca sinica and Macaca nigra, and of Gorilla, Homo, Pan, Pongo pygmaeus, Macaca sinica, Macaca nigra and Macaca irus, with internal nodes renamed Hominoidea and Cercopithecoidea]
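Resolving a stem query against a classification can be sketched as follows, assuming (as in stem-based clade definitions) that (>ingroup - outgroup) names the most inclusive group that contains the ingroup but not the outgroup. The classification below is hypothetical and heavily simplified, standing in for NCBI's; stem_query is an illustrative name.

```python
# Hypothetical, heavily simplified classification (taxon -> parent taxon),
# standing in for a real classification tree such as NCBI's.
PARENT = {
    "Hominoidea": "Catarrhini", "Cercopithecoidea": "Catarrhini",
    "Hominidae": "Hominoidea", "Homininae": "Hominidae",
    "Macaca": "Cercopithecoidea",
    "Gorilla gorilla": "Homininae", "Homo sapiens": "Homininae",
    "Pan troglodytes": "Homininae", "Pongo pygmaeus": "Hominidae",
    "Macaca sinica": "Macaca", "Macaca nigra": "Macaca", "Macaca irus": "Macaca",
}


def lineage(taxon):
    """The taxon followed by all of its ancestors, ending at the classification root."""
    chain = [taxon]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def stem_query(ingroup, outgroup):
    """Most inclusive taxon containing every ingroup name and no outgroup name, or None."""
    shared = set.intersection(*(set(lineage(name)) for name in ingroup))
    excluded = {taxon for name in outgroup for taxon in lineage(name)}
    candidates = shared - excluded
    if not candidates:
        return None
    # Most inclusive = closest to the classification root = shortest lineage.
    return min(candidates, key=lambda taxon: len(lineage(taxon)))


apes = ["Gorilla gorilla", "Homo sapiens", "Pan troglodytes"]
macaques = ["Macaca sinica", "Macaca nigra"]
print(stem_query(apes, macaques))      # Hominoidea
print(stem_query(macaques, apes))      # Cercopithecoidea
```

Because every candidate lies on the ingroup's shared lineage, the candidates are nested and the shortest-lineage rule picks a unique answer.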
Dealing with the Growth of Phyloinformatics
PhyloWidget
Thanks

1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 

Working with Trees in the Phyloinformatic Age. WH Piel

  • 1. Working with Trees in the Phyloinformatic Age. William H. Piel (Yale Peabody Museum) and Hilmar Lapp (NESCent, Duke University)
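  • 4. Dewey system: each node is labeled with its path from the root (root 0; its children 0.1 and 0.2; their children 0.1.1, 0.1.2, 0.2.1 and 0.2.2; then 0.2.1.1 and 0.2.1.2). In the example tree the leaves are A 0.1.1, B 0.1.2, C 0.2.1.1, D 0.2.1.2 and E 0.2.2.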
  • 5. Find the clade for C and D: find the common path prefix, reading from the left (0.2.1).
      SELECT * FROM nodes WHERE (path LIKE '0.2.1%');
    Path / Label: 0 Root; 0.1 NULL; 0.1.1 A; 0.1.2 B; 0.2 NULL; 0.2.1 NULL; 0.2.1.1 C; 0.2.1.2 D; 0.2.2 E
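The prefix search can be tried in an in-memory SQLite database. This is a sketch: the `nodes(path, label)` layout is an assumption based on the slide's Path/Label table, and `clade_by_prefix` is a hypothetical helper. It tightens the match to `path = prefix OR path LIKE prefix || '.%'` because a bare `LIKE '0.2.1%'` would also match a sibling path such as `0.2.10` once a node has ten or more children.

```python
import sqlite3

# Dewey paths for the example tree ((A,B),((C,D),E)), from the slide's table.
# None marks an unlabeled internal node.
ROWS = [
    ("0", "Root"), ("0.1", None), ("0.1.1", "A"), ("0.1.2", "B"),
    ("0.2", None), ("0.2.1", None), ("0.2.1.1", "C"), ("0.2.1.2", "D"),
    ("0.2.2", "E"),
]

def clade_by_prefix(conn, prefix):
    """Return (path, label) rows for the clade rooted at `prefix`.

    Matches the node itself plus every path that starts with prefix + '.',
    so '0.2.1' does not also pick up a sibling such as '0.2.10'.
    """
    cur = conn.execute(
        "SELECT path, label FROM nodes "
        "WHERE path = ? OR path LIKE ? || '.%' ORDER BY path",
        (prefix, prefix),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", ROWS)

print(clade_by_prefix(conn, "0.2.1"))
```

Storing paths as text keeps the query to a single indexed prefix scan, at the cost of rewriting many paths whenever a subtree is moved.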
  • 8. Depth-first traversal, scoring each node with a left and a right ID [Figure: the example tree with leaves A-E, each node annotated with its left and right IDs]
  • 9. Minimum spanning clade of node 5:
      SELECT * FROM nodes INNER JOIN nodes AS include
        ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
      WHERE include.node_id = 5;
    Label [left, right]: root [1, 18]; internal (A,B) [2, 7]; A [3, 4]; B [5, 6]; internal ((C,D),E) [8, 17]; internal (C,D) [9, 14]; C [10, 11]; D [12, 13]; E [15, 16]
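A sketch of both halves of the nested-set idea: a depth-first traversal that hands out left and right IDs from one shared counter, and the slide's BETWEEN self-join run in SQLite. The nested-tuple tree format and the helper names are hypothetical. Node IDs follow preorder (root = 1), which is assumed here to be what "node 5" refers to, because it matches the 1-9 numbering used for the same tree on the adjacency-list slide and makes node 5 the ((C,D),E) clade.

```python
import sqlite3

# Example tree ((A,B),((C,D),E)); tuples are internal nodes, strings are leaves.
TREE = (("A", "B"), (("C", "D"), "E"))

def number_nodes(tree):
    """Depth-first traversal returning (label, left_id, right_id) in preorder.

    A shared counter is bumped when a node is entered (left_id) and again
    when it is left (right_id), so every descendant's IDs nest inside
    its ancestor's [left_id, right_id] interval.
    """
    rows, counter = [], 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        index = len(rows)
        rows.append(None)  # reserve the preorder slot before visiting children
        if isinstance(node, tuple):
            for child in node:
                visit(child)
            label = None
        else:
            label = node
        counter += 1
        rows[index] = (label, left, counter)

    visit(tree)
    return rows

def load(tree):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, "
                  "node_label TEXT, left_id INTEGER, right_id INTEGER)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)",
                     [(i + 1, *row) for i, row in enumerate(number_nodes(tree))])
    return conn

def subtree(conn, node_id):
    """All nodes whose left_id falls inside the given node's interval."""
    return conn.execute(
        "SELECT nodes.node_id, nodes.node_label FROM nodes "
        "INNER JOIN nodes AS include "
        "ON (nodes.left_id BETWEEN include.left_id AND include.right_id) "
        "WHERE include.node_id = ? ORDER BY nodes.left_id",
        (node_id,),
    ).fetchall()

conn = load(TREE)
print(number_nodes(TREE))
print(subtree(conn, 5))
```

The traversal reproduces the left/right IDs on the slide; the trade-off is that inserting a node means renumbering every interval to its right.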
  • 12. [Figure: the example tree with leaves A-E and nodes numbered 1-9]
    node_label, node_id, parent_id: -, 1, -; -, 2, 1; A, 3, 2; B, 4, 2; -, 5, 1; -, 6, 5; C, 7, 6; D, 8, 6; E, 9, 5
  • 13. SQL query to find the parent node of node 'D':
      SELECT * FROM nodes AS parent
        INNER JOIN nodes AS child ON (child.parent_id = parent.node_id)
      WHERE child.node_label = 'D';
    … but this requires an external procedure to navigate the tree.
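The "external procedure" can be as small as a loop that reissues a one-step parent lookup until it reaches the root. This is a sketch in Python with SQLite; the node IDs are the ones in the slide's node_label/node_id/parent_id table, and `ancestors` is a hypothetical helper.

```python
import sqlite3

# (node_id, parent_id, node_label) for ((A,B),((C,D),E)), from the slide's table.
ROWS = [
    (1, None, None), (2, 1, None), (3, 2, "A"), (4, 2, "B"),
    (5, 1, None), (6, 5, None), (7, 6, "C"), (8, 6, "D"), (9, 5, "E"),
]

def ancestors(conn, label):
    """Walk from the labeled node to the root, issuing one query per step."""
    row = conn.execute(
        "SELECT node_id, parent_id FROM nodes WHERE node_label = ?", (label,)
    ).fetchone()
    path = []
    parent_id = row[1] if row else None
    while parent_id is not None:
        path.append(parent_id)
        parent_id = conn.execute(
            "SELECT parent_id FROM nodes WHERE node_id = ?", (parent_id,)
        ).fetchone()[0]
    return path

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, "
             "parent_id INTEGER, node_label TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", ROWS)

print(ancestors(conn, "D"))
```

The number of round trips grows with tree depth. Databases that support recursive common table expressions (`WITH RECURSIVE`, available in both SQLite and PostgreSQL) can perform the same walk inside a single query.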
  • 15. Searching trees by distance metrics: USim distance. Wang, J. T. L., H. Shan, D. Shasha and W. H. Piel. 2005. Fast Structural Search in Phylogenetic Databases. Evolutionary Bioinformatics Online 1: 37-46.
    [Figure: two four-taxon trees with leaves A, B, C, D, each shown with its matrix]
    Tree 1:     A  B  C  D
            A   0  1  2  3
            B   1  0  2  3
            C   1  1  0  2
            D   1  1  1  0
    Tree 2:     A  B  C  D
            A   0  1  2  2
            B   1  0  2  2
            C   2  2  0  1
            D   2  2  1  0
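One reading that reproduces both matrices on the slide is that entry M[x][y] counts the edges from leaf x up to the most recent common ancestor of x and y, with the trees taken as (((A,B),C),D) and ((A,B),(C,D)). Both the reading and the topologies are assumptions inferred from the numbers, and this is not necessarily the USim definition from Wang et al. (2005). The sketch below builds such a matrix from a nested-tuple tree; the helper names are hypothetical.

```python
# Assumed topologies, inferred from the two matrices.
TREE_1 = ((("A", "B"), "C"), "D")
TREE_2 = (("A", "B"), ("C", "D"))

def leaf_routes(tree, prefix=()):
    """Map each leaf to its route from the root, as a tuple of child indices."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    routes = {}
    for i, child in enumerate(tree):
        routes.update(leaf_routes(child, prefix + (i,)))
    return routes

def up_distance_matrix(tree, order):
    """M[x][y] = number of edges from leaf x up to the common ancestor of x and y."""
    routes = leaf_routes(tree)
    matrix = []
    for x in order:
        row = []
        for y in order:
            shared = 0  # length of the common route prefix = depth of the ancestor
            for a, b in zip(routes[x], routes[y]):
                if a != b:
                    break
                shared += 1
            row.append(len(routes[x]) - shared)
        matrix.append(row)
    return matrix

for tree in (TREE_1, TREE_2):
    for row in up_distance_matrix(tree, "ABCD"):
        print(row)
    print()
```

Because the matrix is not symmetric, it records more shape information than a plain path-length matrix; how the two matrices are combined into a single distance score is left to the cited paper.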
  • 19. BioSQL (http://www.biosql.org/): a schema for persistent storage of sequences and features, tightly integrated with BioPerl (as well as BioPython, BioJava, and BioRuby)
    • phylodb extension designed at a NESCent Hackathon
    • Perl command-line interface by Jamie Estill (Google Summer of Code)
  • 20. Index of all paths from ancestors to descendants:
      CREATE TABLE node_path (child_node_id integer, parent_node_id integer, distance integer);
    [Figure: a tree with taxa A, B, C and nodes numbered 1-5, showing the indexed paths]
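A sketch of how `node_path` could be filled from a parent-pointer table, using the example tree ((A,B),((C,D),E)) and its 1-9 numbering from the adjacency-list slide. The slide does not say whether zero-distance self rows are stored. This version assumes they are not: it stores only proper ancestors, with `distance` counted in edges.

```python
import sqlite3

# node_id -> parent_id for ((A,B),((C,D),E)); None marks the root.
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}

def node_path_rows(parent):
    """Return (child_node_id, parent_node_id, distance) for every ancestor of every node."""
    rows = []
    for node in parent:
        ancestor, distance = parent[node], 1
        while ancestor is not None:
            rows.append((node, ancestor, distance))
            ancestor, distance = parent[ancestor], distance + 1
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_path (child_node_id integer, "
             "parent_node_id integer, distance integer)")
conn.executemany("INSERT INTO node_path VALUES (?, ?, ?)", node_path_rows(PARENT))

# With the closure in place, all descendants of a node are a single lookup.
descendants = [r[0] for r in conn.execute(
    "SELECT child_node_id FROM node_path WHERE parent_node_id = ? "
    "ORDER BY child_node_id", (5,))]
print(descendants)
```

The table holds one row per ancestor-descendant pair, so it grows with the sum of node depths. That cost buys set-based ancestor and descendant queries with no recursion.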
  • 21. Find all paths where A and B share a common parent_node_id:
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id AND nA.node_label = 'A'
        AND pB.child_node_id = nB.node_id AND nB.node_label = 'B';
  • 22. … of those paths, select the one with the shortest path (the most recent common ancestor of A and B):
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id AND nA.node_label = 'A'
        AND pB.child_node_id = nB.node_id AND nB.node_label = 'B'
      ORDER BY pA.distance LIMIT 1;
  • 23. … of those paths, select the one with the longest path (the most distant shared ancestor, i.e. the root):
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id AND nA.node_label = 'A'
        AND pB.child_node_id = nB.node_id AND nB.node_label = 'B'
      ORDER BY pA.distance DESC LIMIT 1;
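The shared-ancestor queries can be exercised on the example tree ((A,B),((C,D),E)), reusing the node IDs from the adjacency-list slide (an assumption; the slide's own figure uses a smaller tree). In this sketch, `shared_ancestor` is a hypothetical wrapper that switches the `ORDER BY` direction: ascending gives the most recent common ancestor, descending gives the root.

```python
import sqlite3

PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}
LABEL = {3: "A", 4: "B", 7: "C", 8: "D", 9: "E"}

SHARED = """
SELECT pA.parent_node_id
FROM node_path pA, node_path pB, nodes nA, nodes nB
WHERE pA.parent_node_id = pB.parent_node_id
  AND pA.child_node_id = nA.node_id AND nA.node_label = ?
  AND pB.child_node_id = nB.node_id AND nB.node_label = ?
ORDER BY pA.distance {direction}
LIMIT 1
"""

def load():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, node_label TEXT)")
    conn.execute("CREATE TABLE node_path (child_node_id integer, "
                 "parent_node_id integer, distance integer)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                     [(n, LABEL.get(n)) for n in PARENT])
    for node in PARENT:  # fill node_path with every (descendant, ancestor) pair
        ancestor, distance = PARENT[node], 1
        while ancestor is not None:
            conn.execute("INSERT INTO node_path VALUES (?, ?, ?)",
                         (node, ancestor, distance))
            ancestor, distance = PARENT[ancestor], distance + 1
    return conn

def shared_ancestor(conn, a, b, nearest=True):
    """nearest=True: most recent common ancestor; False: most distant one (the root)."""
    sql = SHARED.format(direction="ASC" if nearest else "DESC")
    return conn.execute(sql, (a, b)).fetchone()[0]

conn = load()
print(shared_ancestor(conn, "A", "B"), shared_ancestor(conn, "A", "B", nearest=False))
```

Ordering by `pA.distance` alone is enough here because both rows in a matching pair point at the same ancestor, so sorting by either taxon's distance ranks the ancestors the same way.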
  • 24. Find the maximum spanning clade (i.e. the subtree) for each tree that includes A and B but not C: get all ancestors shared by A and B, exclude those that are also ancestors of C, and return an adjacency list for each subtree.
      SELECT e.parent_id AS parent, e.child_id AS child, ch.node_label, pt.tree_id
      FROM node_path p, edges e, nodes pt, nodes ch
      WHERE e.child_id = p.child_node_id
        AND pt.node_id = e.parent_id
        AND ch.node_id = e.child_id
        AND p.parent_node_id IN (
            SELECT pA.parent_node_id
            FROM node_path pA, node_path pB, nodes nA, nodes nB
            WHERE pA.parent_node_id = pB.parent_node_id
              AND pA.child_node_id = nA.node_id
              AND nA.node_label = 'A'
              AND pB.child_node_id = nB.node_id
              AND nB.node_label = 'B')
        AND NOT EXISTS (
            SELECT 1 FROM node_path np, nodes n
            WHERE np.child_node_id = n.node_id
              AND n.node_label = 'C'
              AND np.parent_node_id = p.parent_node_id);
  • 25. Find trees that contain a clade that includes A and B but not C: get all ancestors shared by A and B, exclude those that are also ancestors of C, and list the set of trees with these ancestors.
      SELECT DISTINCT t.tree_id, t.name
      FROM node_path p, nodes ch, trees t
      WHERE ch.node_id = p.child_node_id
        AND ch.tree_id = t.tree_id
        AND p.parent_node_id IN (
            SELECT pA.parent_node_id
            FROM node_path pA, node_path pB, nodes nA, nodes nB
            WHERE pA.parent_node_id = pB.parent_node_id
              AND pA.child_node_id = nA.node_id
              AND nA.node_label = 'A'
              AND pB.child_node_id = nB.node_id
              AND nB.node_label = 'B')
        AND NOT EXISTS (
            SELECT 1 FROM node_path np, nodes n
            WHERE np.child_node_id = n.node_id
              AND n.node_label = 'C'
              AND np.parent_node_id = p.parent_node_id);
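To see the tree-level query select the right tree, the sketch below loads two hypothetical trees, ((A,B),C) and ((A,C),B), into SQLite and runs the slide's "A and B but not C" query unchanged. Only the first tree should match. The table layouts (`trees`, `nodes` with a `tree_id`, `node_path` without self rows) are assumptions consistent with the column names used on the slides.

```python
import sqlite3

# (node_id, tree_id, parent_id, node_label) for two hypothetical trees:
# tree 1 = ((A,B),C), tree 2 = ((A,C),B).
NODES = [
    (1, 1, None, None), (2, 1, 1, None), (3, 1, 2, "A"), (4, 1, 2, "B"), (5, 1, 1, "C"),
    (6, 2, None, None), (7, 2, 6, None), (8, 2, 7, "A"), (9, 2, 7, "C"), (10, 2, 6, "B"),
]

QUERY = """
SELECT DISTINCT t.tree_id, t.name
FROM node_path p, nodes ch, trees t
WHERE ch.node_id = p.child_node_id
  AND ch.tree_id = t.tree_id
  AND p.parent_node_id IN (
      SELECT pA.parent_node_id
      FROM node_path pA, node_path pB, nodes nA, nodes nB
      WHERE pA.parent_node_id = pB.parent_node_id
        AND pA.child_node_id = nA.node_id AND nA.node_label = 'A'
        AND pB.child_node_id = nB.node_id AND nB.node_label = 'B')
  AND NOT EXISTS (
      SELECT 1 FROM node_path np, nodes n
      WHERE np.child_node_id = n.node_id
        AND n.node_label = 'C'
        AND np.parent_node_id = p.parent_node_id)
"""

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trees (tree_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, tree_id INTEGER, "
                 "parent_id INTEGER, node_label TEXT)")
    conn.execute("CREATE TABLE node_path (child_node_id integer, "
                 "parent_node_id integer, distance integer)")
    conn.executemany("INSERT INTO trees VALUES (?, ?)", [(1, "((A,B),C)"), (2, "((A,C),B)")])
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", NODES)
    parent = {row[0]: row[2] for row in NODES}
    for node in parent:  # transitive closure: one row per (descendant, ancestor)
        ancestor, distance = parent[node], 1
        while ancestor is not None:
            conn.execute("INSERT INTO node_path VALUES (?, ?, ?)", (node, ancestor, distance))
            ancestor, distance = parent[ancestor], distance + 1
    return conn

conn = build()
print(conn.execute(QUERY).fetchall())
```

In tree 2 the only ancestor shared by A and B is the root, which is also an ancestor of C, so the `NOT EXISTS` clause removes it.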
  • 26. Find trees that contain a clade that includes (A, B, C) but not D or E: get all ancestors of A, B, C from all trees that have A, B, C; exclude those that are also ancestors of D or E; but make sure that the tree still contains D and E.
      SELECT qry.tree_id, MIN(qry.name) AS "tree_name"
      FROM (
        SELECT DISTINCT ON (n.node_id) n.node_id, t.tree_id, t.name
        FROM trees t, nodes n,
          (SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id
           FROM nodes inN, node_path inP
           WHERE inN.node_label IN ('A','B','C')
             AND inP.child_node_id = inN.node_id
           GROUP BY inN.tree_id, inP.parent_node_id
           HAVING COUNT(inP.child_node_id) = 3
           ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca
        WHERE n.node_id IN (lca.parent_node_id)
          AND t.tree_id = n.tree_id
          AND NOT EXISTS (SELECT 1 FROM nodes outN, node_path outP
                          WHERE outN.node_label IN ('D','E')
                            AND outP.child_node_id = outN.node_id
                            AND outP.parent_node_id = lca.parent_node_id)
          AND EXISTS (SELECT c.tree_id FROM trees c, nodes q
                      WHERE q.node_label IN ('D','E')
                        AND q.tree_id = c.tree_id
                        AND c.tree_id = t.tree_id
                      GROUP BY c.tree_id
                      HAVING COUNT(c.tree_id) = 2)
      ) AS qry
      GROUP BY (qry.tree_id)
      HAVING COUNT(qry.node_id) = 1;
    The constants encode the size of the query: COUNT(inP.child_node_id) = 3 is the number of ingroups that must share the node; COUNT(c.tree_id) = 2 is the number of non-ingroups that must be in the tree; COUNT(qry.node_id) = 1 is the number of clades that each tree must satisfy.
  • 27. Here's a faster, cleaner version:
      SELECT t.tree_id, t.name
      FROM trees t
      INNER JOIN
        (SELECT DISTINCT ON (inN.tree_id) inP.parent_node_id, inN.tree_id
         FROM nodes inN, node_path inP
         WHERE inN.node_label IN ('A','B','C')
           AND inP.child_node_id = inN.node_id
         GROUP BY inN.tree_id, inP.parent_node_id
         HAVING COUNT(inP.child_node_id) = 3
         ORDER BY inN.tree_id, inP.parent_node_id DESC) AS lca
      USING (tree_id)
      WHERE NOT EXISTS (
          SELECT 1 FROM nodes outN, node_path outP
          WHERE outN.node_label IN ('D','E')
            AND outP.child_node_id = outN.node_id
            AND outP.parent_node_id = lca.parent_node_id)
        AND EXISTS (
          SELECT c.tree_id FROM trees c, nodes q
          WHERE q.node_label IN ('D','E')
            AND q.tree_id = c.tree_id
            AND c.tree_id = t.tree_id
          GROUP BY c.tree_id
          HAVING COUNT(c.tree_id) = 2);
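The SQL above uses `DISTINCT ON`, a PostgreSQL extension, to keep one ancestor per tree: the one with the highest `parent_node_id`. That shortcut appears to assume descendants always receive larger IDs than their ancestors (for example, preorder numbering), which the slides do not state. The sketch below applies the same test to a single tree in plain Python without that assumption, finding the lowest common ancestor from ancestor chains instead. The parent-map layout and the function names are hypothetical.

```python
# Hypothetical in-memory layout for ((A,B),((C,D),E)):
# node_id -> parent_id, and node_id -> taxon label.
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 6, 8: 6, 9: 5}
LABEL = {3: "A", 4: "B", 7: "C", 8: "D", 9: "E"}

def ancestor_chain(parent, node):
    """The node itself followed by all of its ancestors up to the root."""
    chain = [node]
    while parent[node] is not None:
        node = parent[node]
        chain.append(node)
    return chain

def lowest_common_ancestor(parent, nodes):
    chains = [ancestor_chain(parent, n) for n in nodes]
    shared = set(chains[0]).intersection(*chains[1:])
    # Walking up from the first node, the first shared node is the nearest one.
    return next(n for n in chains[0] if n in shared)

def has_clade(parent, label, ingroup, outgroup):
    """True if some clade holds every ingroup taxon and no outgroup taxon,
    while every outgroup taxon is still present elsewhere in the tree."""
    node_of = {name: node for node, name in label.items()}
    if any(name not in node_of for name in [*ingroup, *outgroup]):
        return False
    lca = lowest_common_ancestor(parent, [node_of[name] for name in ingroup])
    return all(lca not in ancestor_chain(parent, node_of[name]) for name in outgroup)

print(has_clade(PARENT, LABEL, ["C", "D", "E"], ["A", "B"]))
print(has_clade(PARENT, LABEL, ["A", "B", "C"], ["D", "E"]))
```

Requiring the outgroup taxa to be present mirrors the slide's `EXISTS` clause: without it, a tree that simply lacks D and E would count as a match.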
  • 28. Matching a whole tree means querying for all of its clades: (A, B) but not C, D, E; (C, D) but not A, B, E; (C, D, E) but not A, B. [Figure: the example tree with leaves A-E and nodes numbered 1-9]
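Turning a whole query tree into its list of clade queries is a simple traversal. This sketch assumes a nested-tuple tree format and a hypothetical `clade_queries` helper. It emits one (ingroup, outgroup) pair for every internal node below the root and reproduces the three clades listed for the example tree.

```python
TREE = (("A", "B"), (("C", "D"), "E"))

def leaves(node):
    """Leaf labels under a node, left to right."""
    if not isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def clade_queries(tree):
    """(ingroup, outgroup) pairs for every internal node except the root."""
    everything = leaves(tree)
    queries = []

    def visit(node, is_root):
        if not isinstance(node, tuple):
            return  # a single leaf is not an informative clade
        if not is_root:
            inside = leaves(node)
            outside = [x for x in everything if x not in inside]
            queries.append((inside, outside))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return queries

for ingroup, outgroup in clade_queries(TREE):
    print(ingroup, "but not", outgroup)
```

The root is skipped because it contains every taxon and therefore constrains nothing. Each remaining pair becomes one ingroup/outgroup clause in the SQL.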
  • 30. Mining trees for interesting, general relationship questions: (((Sus_scrofa, Hippopotamus),Balaenoptera),Equus_caballus) vs ((Sus_scrofa, (Hippopotamus,Balaenoptera)),Equus_caballus)
    [Figure: the two competing trees drawn with Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus and Felis catus]
  • 31. Even with perfectly resolved OTUs, you will still fail to hit relevant trees:
    [Figure: a tree of Sus scrofa, Hippopotamus, Balaenoptera, Equus caballus and Felis catus next to one of Sus celebensis, Hippopotamus, Balaenoptera, Equus asinus and Felis catus]
  • 32. Step 1: for each clade in all trees in the database, run a stem query on a classification tree (e.g. NCBI). Stem queries for the example tree:
    Node 2: (>A, B - C, D, E)
    Node 3: (>A - B, C, D, E)
    Node 4: (>B - A, C, D, E)
    Node 5: (>C, D, E - A, B)
    Node 6: (>C, D - A, B, E)
    Node 7: (>C - A, B, D, E)
    Node 8: (>D - A, B, C, E)
    Node 9: (>E - A, B, C, D)
    Step 2: label each node with an NCBI taxon ID (if there is a match).
    Step 3: do the same for the query tree.
    [Figure: the example tree with leaves A-E and nodes numbered 1-9]
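Step 1 begins by writing out one stem query per node. This sketch numbers nodes in preorder (root = 1), which matches the node numbers in the list, and formats each non-root node as "(>ingroup - outgroup)". The nested-tuple format and `stem_queries` are hypothetical. Sending the queries to a classification tree such as NCBI's (Steps 2 and 3) is not shown.

```python
TREE = (("A", "B"), (("C", "D"), "E"))

def leaves(node):
    """Leaf labels under a node, left to right."""
    if not isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def stem_queries(tree):
    """Number nodes in preorder (root = 1) and build a stem query for each non-root node."""
    everything = leaves(tree)
    queries = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        node_id = counter
        inside = leaves(node)
        if node_id > 1:  # the root's stem would exclude nothing
            outside = [x for x in everything if x not in inside]
            queries[node_id] = "(>" + ", ".join(inside) + " - " + ", ".join(outside) + ")"
        if isinstance(node, tuple):
            for child in node:
                visit(child)

    visit(tree)
    return queries

for node_id, query in stem_queries(TREE).items():
    print(f"Node {node_id}: {query}")
```

Leaves get stem queries too (nodes 3, 4, 7, 8, 9), so a tip can later be matched to a taxon even when its label is a synonym or a different species in the same genus.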
  • 33. Rename nodes according to their deepest stem query…
    [Figure: two primate trees whose internal nodes are relabeled Hominoidea and Cercopithecoidea. One has Gorilla gorilla, Homo sapiens, Pan troglodytes, Macaca sinica and Macaca nigra; the other has Gorilla, Homo, Pan, Pongo pygmaeus, Macaca sinica, Macaca nigra and Macaca irus]