Euler: A Logic-Based Toolkit for Aligning & Reconciling Multiple Taxonomic Perspectives

CIRSS (Center for Informatics Research in Science and Scholarship) Seminar talk given on Sept. 19, 2014 at GSLIS, UIUC.

http://cirssweb.lis.illinois.edu/Events/eventDetails.php?id=214

  1. Euler: A Logic-Based Toolkit for Aligning and Reconciling Multiple Taxonomic Perspectives. Mingmin Chen (1), Shizhuo Yu (1), Parisa Kianmajd (1), Nico Franz (2), Shawn Bowers (3), Bertram Ludäscher (4). (1) Dept. of Computer Science, University of California, Davis; (2) School of Life Sciences, Arizona State University; (3) Dept. of Computer Science, Gonzaga University; (4) GSLIS & NCSA, University of Illinois at Urbana-Champaign
  2. Outline • Meet Nico, Curator of Insects • TAP: The Taxonomy Alignment Problem • Euler/X – Logic Inside! (X in FOL, RCC, ASP) • Related Projects B. Ludäscher Euler: Reasoning about Taxonomies CIRSS Seminar 9/19/2014 2
  3. Meet Prof. Nico Franz: Curator of Insects @ ASU
  4. What Nico et al. do for a living …
  5. Use Case: Perelleschus sec. 2001 & 2006. Perelleschus salpinflexus sec. Franz & Cardona-Duque (2013), DOI:10.1080/14772000.2013.806371. Input articulations: Franz & Cardona-Duque. 2013. Description of two new species and phylogenetic reassessment of Perelleschus Wibmer & O'Brien, 1986 (Coleoptera: Curculionidae), with a complete taxonomic concept history of Perelleschus sec. Franz & Cardona-Duque, 2013. Systematics and Biodiversity 11: 209–236. Merge analyses: Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in press)
  6. Goal: Align two phylogenies with differential taxon sampling. T1: Perelleschus sec. 2001 • Phylogenetic revision • 8 ingroup species concepts • 2 outgroup concepts • 18 concepts total. T2: Perelleschus sec. 2006 • Exemplar analysis • 2 ingroup species concepts • 1 outgroup concept • 7 concepts total. Source: Nico Franz. Explaining taxonomy's legacy to computers – how and why? The Meaning of Names: Naming Diversity in the 21st Century, Museum of Natural History, U of Colorado, 9/30/2014.
  7. What Nico does for a living (cont’d): The Indoors Part • Go fun places, find new bugs, study them … – “Bugs-R-Us” (see taxonbytes.org) • Now: Compare, align and revise taxonomies, based on careful observation, “character” data, expertise … • Formally: – Input: T1 + T2 (taxonomies) + A (expert articulations) – Output: revised, “merged” taxonomy (-ies) T3
  8. Taxonomy Alignment Problem (TAP) • Given: – Taxonomies T1, T2 • incl. constraints (coverage, disjointness) – Set of articulations (an alignment) A • Find: – Combined (“merged”) taxonomy T3 (= T1 + T2 + A) • Is it a taxonomy? Or a DAG? – Optional: • Final alignment (should be minimal) [Figure: T1 and T2 linked by A, merged into T3]
  9. Real Example: Turn this … [Figure: T1 and T2 with expert articulations such as ==, “< OR ==”, “> OR !” and !. Nodes: T1 18, T2 21. Edges: isa_1 17, isa_2 20, articulations 20.]
  10. … into this! (Perelleschus Alignment Result) • T3 := T1 and T2 are “merged” – Blue dashed: overlaps → resolve via “zoom-in view” [Figure: merged taxonomy. Nodes: Taxonomy 1: 5, Taxonomy 2: 8, merged taxa: 13. Edges: overlaps 10, input 24, inferred 5.]
  11. So how does it work? • Suppose you have 3 concepts A, B, and C. • Assume you know something about – A R1 B (e.g., R1: A is a subset of B) – B R2 C (e.g., R2: B is disjoint from C) • Now what can you say about this: – A R3 C • Yes ?? • … it follows that R3: A is disjoint from C!
  12. Articulation Language (RCC-5) • How does the expert express the known (or assumed) relationship between taxa A and B? • How can A and B be related? • Use basic set constraints (B5): – A = B (equals, EQ) (==) – A < B (proper part of, PP) (<) – A > B (inverse proper part of, IPP) (>) – A o B (partially overlaps, PO) (><) – A ! B (disjoint “region”, DR) (!)
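
The inference on slide 11 is an instance of composing two RCC-5 relations. As a rough sketch (not the Euler/X implementation), the five base relations can be read off ordinary finite sets, and a composition can be explored by brute force over a small universe. The function names `rcc5`, `nonempty_subsets` and `compose`, and the universe size of 4 elements, are assumptions made for this illustration.

```python
from itertools import product

def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 base relation between two non-empty sets."""
    if a == b:
        return "=="   # EQ
    if a < b:
        return "<"    # PP: a is a proper subset of b
    if a > b:
        return ">"    # IPP: a is a proper superset of b
    if a & b:
        return "><"   # PO: shared and unshared elements on both sides
    return "!"        # DR: no shared elements

def nonempty_subsets(n: int):
    """All non-empty subsets of {0, ..., n-1}."""
    return [frozenset(i for i in range(n) if (mask >> i) & 1)
            for mask in range(1, 2 ** n)]

def compose(r1: str, r2: str, n: int = 4) -> set:
    """Relations observed between A and C when A r1 B and B r2 C hold."""
    regions = nonempty_subsets(n)
    return {rcc5(a, c)
            for a, b, c in product(regions, repeat=3)
            if rcc5(a, b) == r1 and rcc5(b, c) == r2}

# Slide 11: A is a proper part of B, B is disjoint from C.
print(compose("<", "!"))
```

For `compose("<", "!")` only `!` remains, matching the slide. Other pairs, such as `compose("!", "!")`, leave several relations open, which is where disjunctions of base relations become necessary. The bounded universe means the result is only a lower bound on the possible relations in general.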
  13. Taxonomies and Articulations in Euler. A taxonomy T is a triple (N, ≼, ϕ) with names (taxa) N, a partial order (is-a) ≼, and taxonomic constraints ϕ. • Sibling Disjointness: sibling taxa do not overlap • (Parent) Coverage: The union of the children “covers” the parent → no “missing” children. An articulation is a relation (set-constraint) between taxa A and B. One, and only one, of the following base relations B5 must hold: (i) congruence, (ii) proper part, (iii) inverse proper part, (iv) partial overlap, (v) disjointness. There are 32 (= 2^5) possible disjunctions for representing partial information.
  14. R32 lattice of 32 (= 2^5) disjunctions over B5. Level 5 (tautology): {= < > o !} (TRUE). Level 4: {= < o !}, {= < > o}, {= > o !}, {= < > !}, {< > o !}. Level 3: {= < o}, {= o !}, {= < !}, {< o !}, {= > o}, {= < >}, {= > !}, {< > o}, {> o !}, {< > !}. Level 2: {= o}, {= <}, {= !}, {< o}, {o !}, {< !}, {= >}, {> o}, {< >}, {> !}. Level 1 (BASE-5 relations): {=}, {<}, {>}, {o}, {!}. Level 0 (contradiction): ∅ (FALSE). Legend: = EQ(x,y) Equals; < PP(x,y) Proper Part of; > iPP(x,y) Inverse Proper Part; o PO(x,y) Partially Overlaps; ! DR(x,y) Disjoint from.
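
The level structure of the R32 lattice follows directly from counting subsets of the five base relations. A minimal sketch, where the names `BASE5` and `r32_levels` are chosen for this illustration only:

```python
from itertools import combinations

BASE5 = ["==", "<", ">", "><", "!"]

def r32_levels():
    """Group all disjunctions of base relations by their number of disjuncts."""
    return {k: [frozenset(c) for c in combinations(BASE5, k)]
            for k in range(len(BASE5) + 1)}

levels = r32_levels()
for k, nodes in levels.items():
    print(f"Level {k}: {len(nodes)} disjunctions")
total = sum(len(nodes) for nodes in levels.values())
print("total:", total)
```

Level 0 is the empty disjunction (contradiction) and level 5 is the disjunction of all five relations (tautology). The level sizes are the binomial coefficients C(5, k), which sum to 32.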
  15. Some History • … Aristotle … • … Euler … • … • … Greg Whitbread … • [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993. • [Ber95] Walter G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995. • [Ber03] Walter G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003. • [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003. • [TL07] D. Thau and B. Ludäscher. Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3):195–209, 2007. • [FP09] N. M. Franz and R. K. Peet. Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1):5–20, 2009. • …
  16. What’s in a name? Euler Diagrams • Project named after Euler Diagrams: IF A is-a B AND C and B are disjoint, THEN: A and C are disjoint!
  17. Euler Diagrams as Trees (or Graphs). A containment hierarchy (taxonomy) and an equivalent graph (w/ transitive edges) carry the same information.
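
The step from a containment hierarchy to a graph with transitive edges can be sketched as a transitive closure over the direct is-a edges. The edge set below is a made-up toy example, not one of the Perelleschus taxonomies, and `transitive_closure` is a name chosen for this illustration.

```python
def transitive_closure(edges):
    """Return all (descendant, ancestor) pairs implied by direct is-a edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                # x is-a y and y is-a z imply x is-a z
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

# Hypothetical hierarchy: b and c are children of a, d is a child of b.
isa = {("b", "a"), ("c", "a"), ("d", "b")}
closure = transitive_closure(isa)
print(sorted(closure - isa))  # edges added by transitivity
```

The closure adds no information that the tree did not already imply, which is the sense in which both drawings carry the same information.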
  18. Represent Phylogenies as Trees … T1: Perelleschus sec. 2001 • Phylogenetic revision • 8 ingroup species concepts • 2 outgroup concepts • 18 concepts total [Figure: tree of the T1 concepts]
  19. … for all taxonomies of interest … [Figure: trees of the T1 and T2 concepts]
  20. … ready, rotate by 90°, set … [Figure: T1 and T2 trees, rotated]
  21. Go! An expert input alignment! Just add some Euler Reasoning … [Figure: T1 and T2 with expert articulations. Nodes: T1 18, T2 21. Edges: isa_1 17, isa_2 20, articulations 20.]
  22. Euler/X toolkit in a single screenshot (desktop version, IX-2014)
  23. … et voilà! The merged T3 (= T1 & T2 & A). The Euler reasoner(s) infer: - Grey: “perfect match” (congruences) - Green, Yellow: “keepers” from T1, T2 - Red edges: deduced subset/“sub-class” relations - Blue edges: deduced overlaps [Figure: merged taxonomy graph]
  24. But wait: PW1 … [Figure: possible world 1]
  25. … PW2 [Figure: possible world 2]
  26. … PW3 [Figure: possible world 3]
  27. … PW4 [Figure: possible world 4]
  28. … PW5 [Figure: possible world 5]
  29. Hmmm… depending on input alignment: PW1 … [Figure: possible world 1]
  30. … and PW2 are the only solutions! What happened? [Figure: possible world 2]
  31. TAP: Possible Outcomes [Figure: Input Alignment. Taxonomy 1: 1.b and 1.c isa 1.a. Taxonomy 2: 2.e and 2.f isa 2.d. Four articulations, one “=” and three “<”.]
  32. TAP: Possible Outcomes [Figure: Input Alignment as on slide 31] Black-Box Provenance lattice: {A1, A2, A3, A4}; {A1, A2, A3}, {A1, A2, A4}, {A1, A3, A4}, {A2, A3, A4}; {A1, A2}, {A1, A3}, {A2, A3}, {A1, A4}, {A2, A4}, {A3, A4}; {A1}, {A2}, {A3}, {A4}; { }. Inconsistent! → Diagnosis (Reiter)
  33. TAP: Possible Outcomes [Figure: Input Alignment and Black-Box Provenance lattice over {A1, A2, A3, A4}] Inconsistent! → Diagnosis (Reiter). Ambiguous! → Multiple Possible Worlds [Figure: a first merged result over 1.a, 1.b, 1.c, 2.d, 2.e, 2.f]
  34. TAP: Possible Outcomes [Figure: Input Alignment and Black-Box Provenance lattice over {A1, A2, A3, A4}] Inconsistent! → Diagnosis (Reiter). Ambiguous! → Multiple Possible Worlds [Figure: a second merged result added]
  35. TAP: Possible Outcomes [Figure: Input Alignment and Black-Box Provenance lattice over {A1, A2, A3, A4}] Inconsistent! → Diagnosis (Reiter). Ambiguous! → Multiple Possible Worlds [Figure: a third merged result added, pairing 1.b with 2.e, 1.a with 2.d, and 1.c with 2.f]
  36. Euler/X Toolkit and Workflow • FO reasoning about taxonomies (MFOL) • Earlier: CleanTax – Prover9/Mace4 • Now: Euler – ASP Reasoners (DLV, Clingo) – Specialized reasoners (PyRCC) – … – X = ASP, RCC, …
  37. Reducing Ambiguity: Possible Worlds (PWs) View, Aggregate View (AV), Cluster View (CV). Explore!
  38. Common Outcome: Inconsistency! [Figure: Input Alignment and Black-Box Provenance lattice over {A1, A2, A3, A4}] Inconsistent! → Diagnosis (Reiter) • Need to debug the input articulations → (black-box) diagnosis! • Focus: – How do we efficiently compute the diagnostic lattice? • Also: – How to visualize..
  39. A Hybrid Diagnosis Approach Combining Black-Box and White-Box Reasoning. Mingmin Chen (1), Shizhuo Yu (1), Nico Franz (2), Shawn Bowers (3), Bertram Ludäscher (4). (1) Department of Computer Science, University of California, Davis; (2) School of Life Sciences, Arizona State University; (3) Department of Computer Science, Gonzaga University; (4) GSLIS & NCSA, University of Illinois at Urbana-Champaign
  40. Example Instance (from synthetic benchmark suite) • Here: N = 10 taxa in T1, T2 • Euler/X finds: inconsistent! • → diagnostic lattice of 2^10 = 1024 nodes → Find minimal inconsistent subset (MIS) → maximal consistent subset (MCS) .. → show to user!
  41. Visualizing Diagnoses: N = 10 articulations → 2^10 = 1024 node diagnostic lattice
  42. Better Idea: Just show MIS, MCS. N = 4 articulations → 2^4 = 16 node diagnostic lattice, but 3 MCS and 2 MIS are enough!
  43. Visualizing Diagnoses: 1024 node lattice .. but 4 MCS and 1 MIS tell it all!
  44. Visualizing Diagnoses. Example from RuleML’14 paper: N = 12 → 4096 nodes .. but 7 MCS and 5 MIS tell it all!
  45. Black-Box Inconsistency Analysis (Diagnostic Lattice) • Then: What happens if you can’t have all (here: 4) articulations together? – Repair: find & revise minimal inconsistent subsets (Min-Incons) – Expand: find maximal consistent subsets (Max-Cons) & revise outs
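
A black-box analysis only needs a yes/no consistency test per subset of articulations. The sketch below walks the whole diagnostic lattice by brute force and reads off the Min-Incons and Max-Cons sets. It is not the hitting-set algorithm used by Euler/X. It covers only the three articulations that are legible on the hybrid-provenance slide (A1: a = d, A2: b < e, A3: c < f), and it assumes a universe of at most 4 elements for the consistency test; all function and variable names are made up for this illustration.

```python
from itertools import combinations, product

ATOMS = 4                         # assumed bound on the universe size
REGIONS = range(1, 2 ** ATOMS)    # non-empty subsets encoded as bitmasks

def proper_part(x, y):
    """True if region x is a proper subset of region y."""
    return x != y and (x & y) == x

# b, c are the children of a (taxonomy 1); e, f are the children of d (taxonomy 2).
ARTICULATIONS = {
    "A1": lambda a, b, c, d, e, f: a == d,
    "A2": lambda a, b, c, d, e, f: proper_part(b, e),
    "A3": lambda a, b, c, d, e, f: proper_part(c, f),
}

def consistent(subset):
    """Black-box test: is there an assignment satisfying both taxonomies and subset?"""
    for b, c, e, f in product(REGIONS, repeat=4):
        if b & c or e & f:        # sibling disjointness violated
            continue
        a, d = b | c, e | f       # parent coverage
        if all(ARTICULATIONS[name](a, b, c, d, e, f) for name in subset):
            return True
    return False

def diagnose(names):
    """Return (MIS, MCS) after testing every node of the diagnostic lattice."""
    nodes = [frozenset(s) for k in range(len(names) + 1)
             for s in combinations(names, k)]
    status = {s: consistent(s) for s in nodes}
    mis = [s for s in nodes if not status[s]
           and all(status[s - {x}] for x in s)]
    mcs = [s for s in nodes if status[s]
           and all(not status[s | {x}] for x in names if x not in s)]
    return mis, mcs

mis, mcs = diagnose(sorted(ARTICULATIONS))
print("MIS:", [sorted(s) for s in mis])
print("MCS:", [sorted(s) for s in mcs])
```

Under these assumptions the only minimal inconsistent subset is {A1, A2, A3}, and each pair of articulations is a maximal consistent subset. Repair would revise one articulation in the MIS; Expand would start from one of the pairs. Testing all 2^n nodes is exactly what does not scale, which motivates the hitting-set and hybrid approaches on the following slides.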
  46. Inconsistency Analysis (Diagnostic Lattice) • Black-box Analysis (Hitting Set algo.) yields a Diagnosis (lattice) • The Min-Incons (MIS) and Max-Cons (MCS) sets determine all others → Repair MIS and/or Expand MCS – for n = 4 articulations, there are 168 possible diagnoses – depending on expected “red/green areas” → explore space differently • |articulations| = n → |possible diagnoses| = |monotonic Boolean functions| = Dedekind Number (n): 2, 3, 6, 20, 168, 7581, 7828354, ...
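
The count of possible diagnoses can be checked for small n. A diagnosis labels every subset of articulations as consistent or inconsistent such that supersets of an inconsistent set stay inconsistent, which is a monotone Boolean function on n variables. The brute-force sketch below (function name `dedekind` chosen here) enumerates all such functions; it is only feasible up to n = 4.

```python
from itertools import product

def dedekind(n: int) -> int:
    """Count monotone Boolean functions of n variables by brute force."""
    points = list(range(2 ** n))  # inputs encoded as bitmasks
    # Pairs (p, q) where q is p with exactly one extra bit set.
    covers = [(p, p | (1 << i)) for p in points
              for i in range(n) if not (p >> i) & 1]
    count = 0
    for values in product((0, 1), repeat=len(points)):
        # Monotone: growing the input never lowers the output.
        if all(values[p] <= values[q] for p, q in covers):
            count += 1
    return count

numbers = [dedekind(n) for n in range(5)]
print(numbers)
```

This reproduces the first five values listed on the slide, including the 168 possible diagnoses for n = 4 articulations.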
  47. Improving Diagnosis • Reiter’s “black-box” (model-based) diagnosis helps debug the articulations • Limited scalability (inherent complexity) • But every bit helps: – Hitting Set Algorithm (“logarithmic extraction”) • Our idea: – Exploit “white-box” reasoning information → RULES to the rescue
  48. Key Idea: exploit white-box info • We use Answer Set Programming (ASP) to solve the Taxonomy Alignment Problem (TAP) • Inconsistency = “False” is derived in the head: False :- <denial of integrity constraint> • Apply provenance trick from databases :-) – What articulations contribute to a derivation of “False”? – Eliminate those that don’t! → an example of reusing inferences across separate black-box tests!
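
The provenance idea can be illustrated without an ASP solver. The sketch below is a plain Python stand-in, not the Euler/X encoding: it uses a single hypothetical rule set (transitivity of "proper part of", and False whenever some x is derived to be a proper part of itself) and tags every derived fact with the articulations that contributed to any of its derivations. The articulations A1 to A4 and the function name are invented for this example.

```python
from itertools import product

def contributing_articulations(articulations):
    """Return the ids of articulations that contribute to a derivation of False.

    articulations maps an id to a pair (x, y) meaning "x is a proper part of y".
    """
    lineage = {}  # derived fact (x, y) -> ids of contributing articulations
    for aid, (x, y) in articulations.items():
        lineage.setdefault((x, y), set()).add(aid)
    changed = True
    while changed:
        changed = False
        for (x, y), (y2, z) in product(list(lineage), repeat=2):
            if y != y2:
                continue
            # Transitivity: x < y and y < z derive x < z.
            new = lineage[(x, y)] | lineage[(y2, z)]
            old = lineage.setdefault((x, z), set())
            if not new <= old:
                old |= new
                changed = True
    culprits = set()
    for (x, y), ids in lineage.items():
        if x == y:  # x < x is a derivation of False
            culprits |= ids
    return culprits

# Hypothetical input: A1..A3 form a cycle, A4 is independent of it.
arts = {"A1": ("a", "b"), "A2": ("b", "c"), "A3": ("c", "a"), "A4": ("d", "e")}
relevant = contributing_articulations(arts)
print(sorted(relevant))
```

Here A4 never appears in a derivation of False, so a subsequent black-box search only has to consider A1, A2 and A3, shrinking the lattice from 16 to 8 nodes.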
  49. The Provenance “Trick”
  50. Hybrid Provenance. Black-box Provenance: A3: c < f [Figure: Input Alignment as on slide 31]
  51. Hybrid Provenance. Black-box Provenance: A3: c < f. White-box Provenance: A1: a = d; A2: b < e; r3: a = b ∪ c; r4: b ∩ c = ∅; r7: d = e ∪ f; r8: e ∩ f = ∅; derived: a = e ∪ f; A1 + A2 + … => f < c [Figure: Input Alignment as on slide 31]
  52. The Hybrid Approach
  53. Hybrid Approach: What articulations contribute to some inconsistency? Good old black-box (HST)
  54. Benchmark Results • White-box < Hybrid < Black-box (runtimes) • Note: white-box does not give you a diagnosis • Potassco < DLV
  55. Benchmark DLV • White-box < Hybrid < Black-box (runtimes) • Potassco < DLV
  56. Benchmark Clingo • White-box < Hybrid < Black-box (runtimes) • Potassco < DLV
  57. Summary: Hybrid Diagnosis • ASP rules can be used to efficiently solve real-world taxonomy reasoning problems • Reiter’s diagnosis is useful to debug inconsistent alignments • Adding a “white-box” provenance approach speeds up the state-of-the-art HST algorithm by eliminating independent articulations • Future work: – Further improvements, including parallelism: • Trade-off with sharing inferences across parallel instances
  58. Related Projects
  59. The Data Life Cycle
  60. Data Quality & Curation Workflows • Collections & occurrence data is all over the map – … literally (off the map!) • Issues: – Lat/Long transposition, coordinate & projection issues – Data entry/creation, “fuzzy” data, naming issues, bit rot, data conversions and transformations, schema mappings, … (you name it) • Filtered-Push Collaboration
  62. Filtered-Push: Kurator (Data Curation Workflows). Tianhong Song, Sven Köhler, Lei Dou (former member)
  63. From Tool Users to Tool Makers. Screen capture… back to the original definition
  64. Theory meets Practice
  65. Under the hood: Logic (ASP)
  66. Summary & Invitation • Building open source tools for – Euler: Reasoning about taxonomies (& data integration) – Kurator: Data Curation workflows • … and other scientific workflows • Topic not covered: – (Game) Theory of Provenance (DAIS talk @CS, 10/7/2014) • Looking for: – new collaborators, students, .. • Let’s meet! – ludaesch@illinois.edu
