OQGRAPH
   Graphs and Hierarchies in Plain SQL




Antony T Curtis <atcurtis@gmail.com>


                      graph@openquery.com
                      http://openquery.com/graph
Hierarchies / Trees

      ● Trees typically have a single "root" node.
      ● All child nodes have only one parent.




   Other examples:
    ● Menu structures.
    ● Organisation charts.
    ● Filesystem directories.

OQGRAPH computation engine © 2009-2011 Open Query
Graphs / Networks
      ● Nodes connected by Edges.
      ● Edges may be directional.
      ● Edges may have a "weight" / "cost" attribute.
      ● Directed graphs may have bi-directional edges.
      ● Unconnected sets of nodes may exist on same graph.
      ● There need not be a "root" node.




   Examples:
    ● "Social Graphs" / friend relationships.
    ● Decision / State graphs.
    ● Airline routes
Problem Solving

   Trees:
      ● Does Dilbert report to the PHB?
      ● How many people report to manager X?
      ● How many people are between the CEO and employee Y?

   Networks:
      ● What is the quickest air route to MLA from SJC?
      ● What is the shortest path of decisions to get to state #11 from state #5?
      ● Playing "Six Degrees of Kevin Bacon"



RDBMS with Hierarchies and Graphs

      ● Not always a particularly good fit.
      ● Various tree models exist; each with limitations:
         ○ Adjacency model
             ■ Either uses fixed max depth or recursive queries.
             ■ Oracle has CONNECT BY PRIOR
             ■ SQL99 has WITH RECURSIVE...UNION...
         ○ Nested set
             ■ complex
             ■ recursive queries to find path to root.
         ○ Materialised path
             ■ Ugly and not relational.
             ■ Can be quite effective when used correctly.

                                              Further reading: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
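
   As a rough illustration of the adjacency model with a recursive query, the sketch below answers "Does Dilbert report to the PHB?" using a SQL99-style WITH RECURSIVE in SQLite (via Python's sqlite3 module). The staff table, its columns, the sample rows and the reports_to helper are all assumptions made up for this example; none of them come from OQGRAPH.

```python
import sqlite3

# Hypothetical adjacency-model table: each row points at its manager.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'CEO', NULL),
        (2, 'PHB', 1),
        (3, 'Dilbert', 2),
        (4, 'Wally', 2);
""")

def reports_to(conn, employee, boss):
    """Walk up the management chain from `employee` and look for `boss`."""
    row = conn.execute("""
        WITH RECURSIVE chain(id, name, manager_id) AS (
            -- anchor: the employee we start from
            SELECT id, name, manager_id FROM staff WHERE name = ?
            UNION ALL
            -- recursive step: move one level up to the manager
            SELECT s.id, s.name, s.manager_id
            FROM staff AS s JOIN chain AS c ON s.id = c.manager_id
        )
        SELECT COUNT(*) FROM chain WHERE name = ? AND name <> ?
    """, (employee, boss, employee)).fetchone()
    return row[0] > 0
```

   Calling reports_to(conn, 'Dilbert', 'PHB') walks Dilbert -> PHB -> CEO and finds the PHB on the way. The recursion depth depends on the data, which is the limitation the adjacency model bullet points at.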

What is OQGRAPH?

      ● Implemented as a storage engine.
          ○ Original concept by Arjen Lentz
          ○ for MySQL
          ○ for Drizzle
          ○ for MariaDB
      ● Mk. II implementation by
          ○ Antony Curtis
          ○ Arjen Lentz @openquery
      ● Mk. III dev. on LaunchPad
      ● Licensing
          ○ GPLv2+



OQGRAPH: A Computation Engine

      ● It is not a general purpose data engine.
           ○ unlike MyISAM, InnoDB, PBXT or MEMORY.
      ● Looks like an ordinary table.
      ● Has a very different internal architecture.
      ● It does not operate in terms of
           ○ storing data for later retrieval.
           ○ having indexes on data.

      ● May be regarded as a "magic view" or "table function".




Getting OQGRAPH
   MariaDB - available as a plugin.
    ● Included in mainline MariaDB 5.2 builds.
              ○ INSTALL PLUGIN oqgraph SONAME 'oqgraph_engine';
    ● Or build it for yourself.
        ○ All MySQL/MariaDB storage engines should be built with
          same debug/compile flags for correct behaviour.
    ● Check with SHOW PLUGINS and SHOW STORAGE ENGINES.
    ● 64bit Windows build is currently unstable.
   MySQL 5.0 does not support plugins, so OQGRAPH must be compiled in.
    ● Binaries available from ourdelta.org
    ● Included in '-sail' builds since 5.0.87-d10
              ○ SHOW GLOBAL VARIABLES LIKE 'have_oqgraph';
   Drizzle
    ● Basic port has been done.


Anatomy of an OQGRAPH table

   CREATE TABLE db.tblname (
          latch SMALLINT UNSIGNED NULL,
          origid BIGINT UNSIGNED NULL,
          destid BIGINT UNSIGNED NULL,
          weight DOUBLE NULL,
          seq BIGINT UNSIGNED NULL,
          linkid BIGINT UNSIGNED NULL,
          KEY (latch, origid, destid) USING HASH,
          KEY (latch, destid, origid) USING HASH
   ) ENGINE=OQGRAPH;


               Note: Mk.3 has a few additional options, discussed later.
OQGRAPH Mk.II - Inserting data

      ● Insert only directed edges; they are held in the engine's memory store.
      ● Edge weights are optional and default to 1.0.
      ● Undirected edges may be represented as two directed
        edges, in opposite directions.


   INSERT INTO foo (origid,destid) VALUES
   (1,2), (2,3), (2,4),
   (4,5), (3,6), (5,6);
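
   The insert rules above can be mimicked outside the database. The sketch below is not OQGRAPH code: build_graph, DEFAULT_WEIGHT and the undirected flag are names invented here to show a directed edge list with a default weight of 1.0, and an undirected edge stored as two directed edges.

```python
from collections import defaultdict

DEFAULT_WEIGHT = 1.0  # mirrors the slide: weight defaults to 1.0 when omitted

def build_graph(edges, undirected=False):
    """Build {origid: {destid: weight}} from (origid, destid[, weight]) tuples."""
    graph = defaultdict(dict)
    for edge in edges:
        origid, destid = edge[0], edge[1]
        weight = edge[2] if len(edge) > 2 else DEFAULT_WEIGHT
        graph[origid][destid] = weight
        if undirected:
            # An undirected edge is stored as two directed edges.
            graph[destid][origid] = weight
    return graph

# Same edges as the INSERT statement above.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]
directed = build_graph(EDGES)
both_ways = build_graph(EDGES, undirected=True)
```

   With the directed build, node 2 points at 3 and 4 only; with undirected=True it also points back at node 1, and the edge count doubles from 6 to 12.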




Selecting Edges

   SELECT * FROM foo;
   +-------+--------+--------+--------+------+--------+
   | latch | origid | destid | weight | seq  | linkid |
   +-------+--------+--------+--------+------+--------+
   |  NULL |      1 |      2 |      1 |    0 |   NULL |
   |  NULL |      2 |      3 |      1 |    1 |   NULL |
   |  NULL |      2 |      4 |      1 |    2 |   NULL |
   |  NULL |      4 |      5 |      1 |    3 |   NULL |
   |  NULL |      3 |      6 |      1 |    4 |   NULL |
   |  NULL |      5 |      6 |      1 |    5 |   NULL |
   +-------+--------+--------+--------+------+--------+




Now, it's time for some magic.
   (shortest path calculation)

       ● SELECT * FROM foo
         WHERE latch=1 AND origid=1 AND destid=6;
         +-------+--------+--------+--------+------+--------+
         | latch | origid | destid | weight | seq  | linkid |
         +-------+--------+--------+--------+------+--------+
         |     1 |      1 |      6 |   NULL |    0 |      1 |
         |     1 |      1 |      6 |      1 |    1 |      2 |
         |     1 |      1 |      6 |      1 |    2 |      3 |
         |     1 |      1 |      6 |      1 |    3 |      6 |
         +-------+--------+--------+--------+------+--------+


       ● SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
         FROM foo WHERE latch=1 AND origid=1 AND destid=6 \G

          path: 1,2,3,6
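
   To see where a result like 1,2,3,6 comes from, here is a minimal Dijkstra sketch over the same six edges, each with weight 1.0. It is an independent illustration rather than the engine's implementation; shortest_path and its internals are assumptions of this example.

```python
import heapq

# Same directed edges as inserted into foo; every weight is 1.0.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def shortest_path(edges, origid, destid):
    """Return the nodes on a cheapest origid -> destid path, or [] if none."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append((dst, 1.0))

    best = {origid: 0.0}      # cheapest known cost per node
    previous = {}             # node -> predecessor on the cheapest path
    queue = [(0.0, origid)]   # min-heap of (cost, node)
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destid:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))

    if destid not in best:
        return []
    # Rebuild the path by walking predecessors back to the origin.
    path = [destid]
    while path[-1] != origid:
        path.append(previous[path[-1]])
    return path[::-1]
```

   shortest_path(EDGES, 1, 6) yields [1, 2, 3, 6], matching the linkid column ordered by seq. The longer route through 4 and 5 costs 4 and is discarded.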

Other computations,
      ● Which paths lead to node 4?
          SELECT GROUP_CONCAT(linkid) AS list
          FROM foo WHERE latch=1 AND destid=4 \G

          list: 1,2,4


      ● Where can I get to from node 4?
          SELECT GROUP_CONCAT(linkid) AS list
          FROM foo WHERE latch=1 AND origid=4 \G

          list: 6,5,4
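
   Both questions are reachability queries: one follows edges forwards from node 4, the other follows them backwards into node 4. A small breadth-first sketch, using the same edge list, shows the idea. The reachable function and its reverse flag are hypothetical names for this example, and the result is an unordered set, so it says nothing about the order OQGRAPH returns rows in.

```python
from collections import deque

# Same directed edges as inserted into foo.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def reachable(edges, start, reverse=False):
    """Return the set of nodes reachable from `start` (including `start`).

    With reverse=True the edges are followed backwards, which gives the
    nodes from which `start` can be reached.
    """
    neighbours = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        neighbours.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

   reachable(EDGES, 4) gives {4, 5, 6} and reachable(EDGES, 4, reverse=True) gives {1, 2, 4}, the same node sets as the two lists above.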




Other computations, continued.

      ● See docs for latch 0 and latch NULL
      ● latch 1 : Dijkstra's shortest path.
          ○ O((V + E) log V)
      ● latch 2 : Breadth-first search.
          ○ O(V+E)
      ● Other algorithms possible




Joins make it prettier,
      ● INSERT INTO people VALUES
        (1,'pearce'), (2,'hunnicut'), (3,'potter'),
        (4,'hoolihan'), (5,'winchester'), (6,'mulcahy');


      ● SELECT GROUP_CONCAT(name ORDER BY seq) path
        FROM foo
        JOIN people ON (foo.linkid = people.id)
        WHERE latch=1 AND origid=1 AND destid=6 \G

          path: pearce,hunnicut,potter,mulcahy


In brief: OQGRAPH Mk. II

      ● Behaviour similar to MEMORY engine:
             ○ Table-level locking for normal tables
             ○ No locking for temporary tables
             ○ No persistence
             ○ No transactions
      ● Insert performance O(N log N)

          This means...
             ○ It’s usable for menus & more, up to say a (few) million edges.
             ○ Inserts get very slow when there are a lot of edges.
             ○ You can use the --init-file option to copy/load on startup.



First Look: OQGRAPH Mk. III

   Features:
    ● Similar core graph implementation.
    ● Uses existing tables as a source for edge data.
    ● Does not impose any strict structure on the donor table.
    ● Efficient Judy sparse bitmaps for node traversal data.

   Notes:
    ● Tables are read-only and only read from the backing table.
    ● Table must be in same schema as the backing table.
    ● Current implementation is not of release quality yet.
    ● But it works!



Tree of Life, with Mk.III
 Load the tol.sql schema,

 Create tol_link backing store table,
 create table tol_link (
    source int unsigned not null,
    target int unsigned not null,
    primary key (source, target),
    key (target) ) engine=innodb;

 Populate it with all the edges we need:
 INSERT INTO tol_link (source,target)
 SELECT parent,id FROM tol WHERE parent IS NOT NULL
 UNION ALL
 SELECT id,parent FROM tol WHERE parent IS NOT NULL;
 Query OK, 178102 rows affected (14.66 sec)
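
 The UNION ALL above stores every parent/child relationship twice, once in each direction, so the tree can be walked both up and down. The same statement can be tried on a toy table in SQLite as a rough sketch. The four tol rows and their names below are invented for this example and are not taken from tol.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature of the tol table: (id, parent, name).
    CREATE TABLE tol (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    INSERT INTO tol VALUES
        (1, NULL, 'root'), (2, 1, 'a'), (3, 1, 'b'), (4, 2, 'c');

    CREATE TABLE tol_link (
        source INTEGER NOT NULL,
        target INTEGER NOT NULL,
        PRIMARY KEY (source, target)
    );

    -- Same shape as the statement on the slide: one row per direction.
    INSERT INTO tol_link (source, target)
    SELECT parent, id FROM tol WHERE parent IS NOT NULL
    UNION ALL
    SELECT id, parent FROM tol WHERE parent IS NOT NULL;
""")

links = sorted(conn.execute("SELECT source, target FROM tol_link"))
```

 Three non-root rows produce six link rows: (1,2), (1,3), (2,4) going down the tree and (2,1), (3,1), (4,2) going back up.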



                 Direct download: http://bazaar.launchpad.net/~openquery-core/oqgraph/trunk/view/head:/examples/tree-of-life/tol.sql

Tree of Life, cont.

  Creating the OQGRAPH MkIII table:
  CREATE TABLE tol_tree (
     latch SMALLINT UNSIGNED NULL,
     origid BIGINT UNSIGNED NULL,
     destid BIGINT UNSIGNED NULL,
     weight DOUBLE NULL,
     seq BIGINT UNSIGNED NULL,
     linkid BIGINT UNSIGNED NULL,
     KEY (latch, origid, destid) USING HASH,
     KEY (latch, destid, origid) USING HASH
  ) ENGINE=OQGRAPH
         data_table='tol_link' origid='source' destid='target';
   select count(*) from tol_tree\G
   count(*): 178102
Tree of Life - finding H.Sapiens

   SELECT GROUP_CONCAT(name ORDER BY seq
   SEPARATOR ' -> ') AS path
   FROM tol_tree JOIN tol ON (linkid=id)
   WHERE latch=1 AND origid=1 AND destid=16421 \G

   path: Life on Earth -> Eukaryotes -> Unikonts ->
   Opisthokonts -> Animals -> Bilateria -> Deuterostomia ->
   Chordata -> Craniata -> Vertebrata -> Gnathostomata ->
   Teleostomi -> Osteichthyes -> Sarcopterygii -> Terrestrial
   Vertebrates -> Tetrapoda -> Reptiliomorpha -> Amniota ->
   Synapsida -> Eupelycosauria -> Sphenacodontia ->
   Sphenacodontoidea -> Therapsida -> Theriodontia ->
   Cynodontia -> Mammalia -> Eutheria -> Primates ->
   Catarrhini -> Hominidae -> Homo -> Homo sapiens
   1 row in set (2.13 sec)
We want your feedback!!!1one!

      ● Very easy to use...
           But do feel free to ask us for help/advice.

      ● OpenQuery created friendlist_graph for Drupal 6.
             ○ Addition to the existing friendlist module.
             ○ Enables easy social networking in Drupal.
             ○ Peter Lieverdink (@cafuego) did this in about 30 minutes

      ● We would like to know how you are using OQGRAPH!
         ○ You could be doing something really cool...




Links and support
     ● Binaries & Packages
            ○ http://mariadb.com (MariaDB 5.2 & above) < easiest to begin
            ○ http://ourdelta.org (MySQL 5.0)
     ● Source collaboration
            ○ http://launchpad.net/maria (in /storage/oqgraph)
            ○ http://launchpad.net/oqgraph
            ○ Development Mk3 source is currently at
              https://code.launchpad.net/~atcurtis/ourdelta/oqgraph-v3
     ● Info, Docs, Support, Licensing, Engineering
            ○ http://openquery.com/graph
            ○ This presentation: http://goo.gl/UrybZ


                                     Thank you!
                                     Antony Curtis & Arjen Lentz
                                     graph@openquery.com
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

OQGraph at MySQL Users Conference 2011

  • 1. OQGRAPH: Graphs and Hierarchies in Plain SQL
      Antony T Curtis <atcurtis@gmail.com>
      graph@openquery.com
      http://openquery.com/graph
  • 2. Hierarchies / Trees
      ● Trees typically have a single "root" node.
      ● All child nodes have only one parent.
      Other examples:
      ● Menu structures.
      ● Organisation charts.
      ● Filesystem directories.
      OQGRAPH computation engine © 2009-2011 Open Query
  • 3. Graphs / Networks
      ● Nodes connected by Edges.
      ● Edges may be directional.
      ● Edges may have a "weight" / "cost" attribute.
      ● Directed graphs may have bi-directional edges.
      ● Unconnected sets of nodes may exist on same graph.
      ● There need not be a "root" node.
      Examples:
      ● "Social Graphs" / friend relationships.
      ● Decision / State graphs.
      ● Airline routes.
  • 4. Problem Solving
      Trees:
      ● Does Dilbert report to the PHB?
      ● How many people report to manager X?
      ● How many people are between the CEO and employee Y?
      Networks:
      ● What is the quickest air route to MLA from SJC?
      ● What is the shortest path of decisions to get to state #11 from state #5?
      ● Playing "Six Degrees of Kevin Bacon".
  • 5. RDBMS with Hierarchies and Graphs
      ● Not always a particularly good fit.
      ● Various tree models exist; each with limitations:
         ○ Adjacency model
            ■ Either uses fixed max depth or recursive queries.
            ■ Oracle has CONNECT BY PRIOR
            ■ SQL99 has WITH RECURSIVE...UNION...
         ○ Nested set
            ■ Complex.
            ■ Recursive queries to find path to root.
         ○ Materialised path
            ■ Ugly and not relational.
            ■ Can be quite effective when used correctly.
      Further reading: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
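The recursive-query flavour of the adjacency model can be sketched as follows. This is only an illustration, not part of the talk: it uses Python's built-in sqlite3 module (SQLite accepts WITH RECURSIVE), and the `staff` table, its columns, and the names in it are all hypothetical. It answers the "Does Dilbert report to the PHB?" style of question by walking up the parent links.

```python
import sqlite3

# Hypothetical org chart stored as an adjacency list: (name, manager).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT PRIMARY KEY, manager TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("ceo", None), ("phb", "ceo"), ("dilbert", "phb"), ("wally", "phb")],
)


def management_chain(conn, name):
    """Return the managers above `name`, nearest first, using a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(name, depth) AS (
            -- Anchor: the direct manager of the starting employee.
            SELECT manager, 1 FROM staff WHERE name = ?
            UNION ALL
            -- Step: the manager of the previous row, one level higher.
            SELECT s.manager, c.depth + 1
            FROM staff AS s JOIN chain AS c ON s.name = c.name
            WHERE s.manager IS NOT NULL
        )
        SELECT name FROM chain WHERE name IS NOT NULL ORDER BY depth
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


# "Does Dilbert report to the PHB?" becomes a membership test on the chain.
reports_to_phb = "phb" in management_chain(conn, "dilbert")
```

The query works, but every question about the hierarchy needs its own hand-written recursion, which is the kind of friction the slide alludes to.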
  • 6. What is OQGRAPH?
      ● Implemented as a storage engine.
         ○ Original concept by Arjen Lentz
         ○ for MySQL
         ○ for Drizzle
         ○ for MariaDB
      ● Mk. II implementation by
         ○ Antony Curtis
         ○ Arjen Lentz @openquery
      ● Mk. III dev. on LaunchPad
      ● Licensing
         ○ GPLv2+
  • 7. OQGRAPH: A Computation Engine
      ● It is not a general purpose data engine.
         ○ unlike MyISAM, InnoDB, PBXT or MEMORY.
      ● Looks like an ordinary table.
      ● Has a very different internal architecture.
      ● It does not operate in terms of
         ○ storing data for later retrieval.
         ○ having indexes on data.
      ● May be regarded as a "magic view" or "table function".
  • 8. Getting OQGRAPH
      MariaDB - available as a plugin.
      ● Included in mainline MariaDB 5.2 builds.
         ○ INSTALL PLUGIN oqgraph SONAME 'oqgraph_engine';
      ● Or build it for yourself.
         ○ All MySQL/MariaDB storage engines should be built with the same debug/compile flags for correct behaviour.
      ● Check with SHOW PLUGINS and SHOW STORAGE ENGINES.
      ● The 64-bit Windows build is currently unstable.
      MySQL 5.0 does not have plugins, so OQGRAPH must be compiled in.
      ● Binaries available from ourdelta.org
      ● Included in '-sail' builds since 5.0.87-d10
         ○ SHOW GLOBAL VARIABLES LIKE 'have_oqgraph';
      Drizzle
      ● Basic port has been done.
  • 9. Anatomy of an OQGRAPH table
      CREATE TABLE db.tblname (
        latch   SMALLINT UNSIGNED NULL,
        origid  BIGINT UNSIGNED NULL,
        destid  BIGINT UNSIGNED NULL,
        weight  DOUBLE NULL,
        seq     BIGINT UNSIGNED NULL,
        linkid  BIGINT UNSIGNED NULL,
        KEY (latch, origid, destid) USING HASH,
        KEY (latch, destid, origid) USING HASH
      ) ENGINE=OQGRAPH;
      Note: Mk.3 has a few additional options, discussed later.
  • 10. OQGRAPH Mk.II - Inserting data
      ● Only insert directed edges into its memory store.
      ● Edge weights are optional and default to 1.0.
      ● Undirected edges may be represented as two directed edges, in opposite directions.
      INSERT INTO foo (origid,destid)
        VALUES (1,2), (2,3), (2,4), (4,5), (3,6), (5,6);
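To make the edge rules on this slide concrete, here is a small Python sketch of an in-memory edge store. It is an illustration only and says nothing about how OQGRAPH is implemented internally; `build_graph` and its parameters are hypothetical names. It applies the default weight of 1.0 and, on request, stores an undirected edge as two directed edges.

```python
# Edge list taken from the slide's INSERT statement: (origid, destid).
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def build_graph(edges, undirected=False, default_weight=1.0):
    """Build {origid: {destid: weight}} from (orig, dest[, weight]) tuples."""
    graph = {}
    for edge in edges:
        orig, dest = edge[0], edge[1]
        # A missing weight falls back to the default of 1.0.
        weight = edge[2] if len(edge) > 2 else default_weight
        graph.setdefault(orig, {})[dest] = weight
        # Make sure the destination exists as a node even with no outgoing edges.
        graph.setdefault(dest, {})
        if undirected:
            # An undirected edge is stored as a second, opposite directed edge.
            graph[dest][orig] = weight
    return graph


directed = build_graph(EDGES)
both_ways = build_graph(EDGES, undirected=True)
```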
  • 11. Selecting Edges
      SELECT * FROM foo;
      +-------+--------+--------+--------+------+--------+
      | latch | origid | destid | weight | seq  | linkid |
      +-------+--------+--------+--------+------+--------+
      |  NULL |      1 |      2 |      1 |    0 |   NULL |
      |  NULL |      2 |      3 |      1 |    1 |   NULL |
      |  NULL |      2 |      4 |      1 |    2 |   NULL |
      |  NULL |      4 |      5 |      1 |    3 |   NULL |
      |  NULL |      3 |      6 |      1 |    4 |   NULL |
      |  NULL |      5 |      6 |      1 |    5 |   NULL |
      +-------+--------+--------+--------+------+--------+
  • 12. Now, it's time for some magic. (shortest path calculation)
      ● SELECT * FROM foo WHERE latch=1 AND origid=1 AND destid=6;
      +-------+--------+--------+--------+------+--------+
      | latch | origid | destid | weight | seq  | linkid |
      +-------+--------+--------+--------+------+--------+
      |     1 |      1 |      6 |   NULL |    0 |      1 |
      |     1 |      1 |      6 |      1 |    1 |      2 |
      |     1 |      1 |      6 |      1 |    2 |      3 |
      |     1 |      1 |      6 |      1 |    3 |      6 |
      +-------+--------+--------+--------+------+--------+
      ● SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
          FROM foo WHERE latch=1 AND origid=1 AND destid=6 \G
        path: 1,2,3,6
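For readers who want to check the result by hand, the following Python sketch runs a textbook Dijkstra search over the six sample edges. It is not OQGRAPH's code; the `GRAPH` dictionary and `shortest_path` function are hypothetical names used only for illustration. The returned node list plays the role of the linkid column ordered by seq.

```python
import heapq

# The six sample edges as {origid: {destid: weight}}, all weights 1.0.
GRAPH = {1: {2: 1.0}, 2: {3: 1.0, 4: 1.0}, 3: {6: 1.0},
         4: {5: 1.0}, 5: {6: 1.0}, 6: {}}


def shortest_path(graph, origin, dest):
    """Return the node sequence of a cheapest path, or [] if none exists."""
    dist = {origin: 0.0}      # best known cost to each node
    prev = {}                 # predecessor on the best known path
    heap = [(0.0, origin)]    # priority queue of (cost, node)
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue          # stale queue entry
        done.add(node)
        if node == dest:
            break
        for nbr, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if dest not in dist:
        return []
    # Walk the predecessor links back from the destination.
    path = [dest]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    path.reverse()
    return path


path = shortest_path(GRAPH, 1, 6)   # [1, 2, 3, 6], matching the slide
```

The alternative route 1, 2, 4, 5, 6 costs 4 rather than 3, which is why it is not the one reported.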
  • 13. Other computations,
      ● Which paths lead to node 4?
        SELECT GROUP_CONCAT(linkid) AS list
          FROM foo WHERE latch=1 AND destid=4 \G
        list: 1,2,4
      ● Where can I get to from node 4?
        SELECT GROUP_CONCAT(linkid) AS list
          FROM foo WHERE latch=1 AND origid=4 \G
        list: 6,5,4
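The two open-ended queries on this slide amount to reachability searches, one following edges backwards and one following them forwards. The Python sketch below reproduces both answers with a breadth-first search; `reachable` is a hypothetical helper for illustration, not an OQGRAPH function, and it returns an unordered set rather than the ordering shown on the slide.

```python
from collections import deque

# Edge list from the earlier INSERT slide: (origid, destid).
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def reachable(edges, start, reverse=False):
    """Return all nodes reachable from `start` (including itself).

    With reverse=True the edges are followed backwards, which yields
    every node that can reach `start`.
    """
    adjacency = {}
    for orig, dest in edges:
        if reverse:
            orig, dest = dest, orig
        adjacency.setdefault(orig, []).append(dest)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


leads_to_4 = reachable(EDGES, 4, reverse=True)   # like "destid=4"
from_4 = reachable(EDGES, 4)                     # like "origid=4"
```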
  • 14. Other computations, continued.
      ● See docs for latch 0 and latch NULL.
      ● latch 1 : Dijkstra's shortest path.
         ○ O((V + E).log V)
      ● latch 2 : Breadth-first search.
         ○ O(V+E)
      ● Other algorithms possible.
  • 15. Joins make it prettier,
      ● INSERT INTO people VALUES
          (1,'pearce'), (2,'hunnicut'), (3,'potter'),
          (4,'hoolihan'), (5,'winchester'), (6,'mulcahy');
      ● SELECT GROUP_CONCAT(name ORDER BY seq) path
          FROM foo JOIN people ON (foo.linkid = people.id)
          WHERE latch=1 AND origid=1 AND destid=6 \G
        path: pearce,hunnicut,potter,mulcahy
  • 16. In brief: OQGRAPH Mk. II
      ● Behaviour similar to MEMORY engine:
         ○ Table-level locking for normal tables
         ○ No locking for temporary tables
         ○ No persistence
         ○ No transactions
      ● Insert performance: O(N.LOG(N))
      This means...
         ○ It's usable for menus & more, up to say a (few) million edges.
         ○ Inserts get very slow when there are a lot of edges.
         ○ You can use the --init-file option to copy/load on startup.
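Because a Mk. II table has no persistence, the edges have to be reloaded after each server restart. One way to prepare a file for the --init-file option is sketched below in Python. The helper name `init_file_sql`, the batch size, and the schema-qualified table name `db.foo` are assumptions for illustration. The sketch keeps every statement on a single line without comments, which is the conservative format for init files on older MySQL versions.

```python
# Edge list from the earlier INSERT slide: (origid, destid).
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def init_file_sql(table, edges, batch_size=1000):
    """Render edges as single-line multi-row INSERT statements."""
    statements = []
    for start in range(0, len(edges), batch_size):
        batch = edges[start:start + batch_size]
        # int() guards against anything but integer ids reaching the SQL text.
        values = ", ".join(f"({int(orig)}, {int(dest)})" for orig, dest in batch)
        statements.append(
            f"INSERT INTO {table} (origid, destid) VALUES {values};"
        )
    return "".join(statement + "\n" for statement in statements)


# Hypothetical table name; an init file runs without a default schema,
# so the name is schema-qualified here.
sql_text = init_file_sql("db.foo", EDGES, batch_size=4)
```

The resulting text could be written to a file and passed to the server at startup. An alternative hinted at by the slide is an init file containing a single INSERT ... SELECT that copies the edges from a persistent table.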
  • 17. First Look: OQGRAPH Mk. III
      Features:
      ● Similar core graph implementation.
      ● Uses existing tables as a source for edge data.
      ● Does not impose any strict structure on the donor table.
      ● Efficient Judy sparse bitmaps for node traversal data.
      Notes:
      ● Tables are read-only and only read from the backing table.
      ● Table must be in same schema as the backing table.
      ● Current implementation is not of release quality yet.
      ● But it works!
  • 18. Tree of Life, with Mk.III
      Load the tol.sql schema, then create the tol_link backing store table:
      create table tol_link (
        source int unsigned not null,
        target int unsigned not null,
        primary key (source, target),
        key (target)
      ) engine=innodb;
      Populate it with all the edges we need:
      INSERT INTO tol_link (source,target)
        SELECT parent,id FROM tol WHERE parent IS NOT NULL
        UNION ALL
        SELECT id,parent FROM tol WHERE parent IS NOT NULL;
      Query OK, 178102 rows affected (14.66 sec)
      Direct download: http://bazaar.launchpad.net/~openquery-core/oqgraph/trunk/view/head:/examples/tree-of-life/tol.sql
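The UNION ALL statement stores each parent/child link twice, once in each direction, so the tree can be walked both upwards and downwards. The Python sketch below replays the same statement on a three-row stand-in for the tol table using the built-in sqlite3 module. The column types are simplified for SQLite and the ids are made up for illustration; only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tol (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    CREATE TABLE tol_link (
        source INTEGER NOT NULL,
        target INTEGER NOT NULL,
        PRIMARY KEY (source, target)
    );
    CREATE INDEX tol_link_target ON tol_link (target);
    -- Tiny stand-in for the real data set; ids are hypothetical.
    INSERT INTO tol VALUES
        (1, NULL, 'Life on Earth'),
        (2, 1, 'Eukaryotes'),
        (3, 2, 'Unikonts');
    """
)

# Same statement as on the slide: one edge parent -> child, one child -> parent.
conn.execute(
    """
    INSERT INTO tol_link (source, target)
    SELECT parent, id FROM tol WHERE parent IS NOT NULL
    UNION ALL
    SELECT id, parent FROM tol WHERE parent IS NOT NULL
    """
)

links = sorted(conn.execute("SELECT source, target FROM tol_link"))
```

Every node except the root contributes two rows, so the row count is always even, which is consistent with the 178102 rows reported on the slide.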
  • 19. Tree of Life, cont.
      Creating the OQGRAPH Mk.III table:
      CREATE TABLE tol_tree (
        latch   SMALLINT UNSIGNED NULL,
        origid  BIGINT UNSIGNED NULL,
        destid  BIGINT UNSIGNED NULL,
        weight  DOUBLE NULL,
        seq     BIGINT UNSIGNED NULL,
        linkid  BIGINT UNSIGNED NULL,
        KEY (latch, origid, destid) USING HASH,
        KEY (latch, destid, origid) USING HASH
      ) ENGINE=OQGRAPH
        data_table='tol_link' origid='source' destid='target';
      select count(*) from tol_tree \G
      count(*): 178102
  • 20. Tree of Life - finding H. sapiens
      SELECT GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
        FROM tol_tree JOIN tol ON (linkid=id)
        WHERE latch=1 AND origid=1 AND destid=16421 \G
      path: Life on Earth -> Eukaryotes -> Unikonts -> Opisthokonts -> Animals -> Bilateria -> Deuterostomia -> Chordata -> Craniata -> Vertebrata -> Gnathostomata -> Teleostomi -> Osteichthyes -> Sarcopterygii -> Terrestrial Vertebrates -> Tetrapoda -> Reptiliomorpha -> Amniota -> Synapsida -> Eupelycosauria -> Sphenacodontia -> Sphenacodontoidea -> Therapsida -> Theriodontia -> Cynodontia -> Mammalia -> Eutheria -> Primates -> Catarrhini -> Hominidae -> Homo -> Homo sapiens
      1 row in set (2.13 sec)
  • 21. We want your feedback!!!1one!
      ● Very easy to use... But do feel free to ask us for help/advice.
      ● OpenQuery created friendlist_graph for Drupal 6.
         ○ Addition to the existing friendlist module.
         ○ Enables easy social networking in Drupal.
         ○ Peter Lieverdink (@cafuego) did this in about 30 minutes.
      ● We would like to know how you are using OQGRAPH!
         ○ You could be doing something really cool...
  • 22. Links and support
      ● Binaries & Packages
         ○ http://mariadb.com (MariaDB 5.2 & above) < easiest to begin
         ○ http://ourdelta.org (MySQL 5.0)
      ● Source collaboration
         ○ http://launchpad.net/maria (in /storage/oqgraph)
         ○ http://launchpad.net/oqgraph
         ○ Development Mk3 source is currently at https://code.launchpad.net/~atcurtis/ourdelta/oqgraph-v3
      ● Info, Docs, Support, Licensing, Engineering
         ○ http://openquery.com/graph
         ○ This presentation: http://goo.gl/UrybZ
      Thank you!
      Antony Curtis & Arjen Lentz
      graph@openquery.com