SlideShare uma empresa Scribd logo
1 de 38
Baixar para ler offline
Graph Databases and the Future of Large-Scale
          Knowledge Management



                  Marko A. Rodriguez
           T-5, Center for Nonlinear Studies
           Los Alamos National Laboratory
              http://markorodriguez.com

                     April 8, 2009
Abstract

Modern day open source and commercial graph databases can store on the
order of 1 billion relationships with some databases reaching the 10 billion
mark. These developments are making the graph database practical for
applications that require large-scale knowledge structures. Moreover, with
the Web of Data standards set forth by the Linked Data community, it is
possible to interlink graph databases across the web into a giant global
knowledge structure. This talk will discuss graph databases, their
underlying data model, their querying mechanisms, and the benefits of the
graph data structure for modeling and analysis.




                                          Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data




                                    Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Relational Database vs. the Graph Database

• A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model
  is a collection interlinked tables.

• A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model
  is a multi-relational graph.


                Relational Database         Graph Database
                                               d

                                        c      a
                                                      a

                                                b
                    127.0.0.1                 127.0.0.2




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Types of Graphs
• Undirected single-relational graph: homogenous set of symmetric links.

• Directed single-relational graph: homogenous set of links.

• Directed multi-relational graph: heterogenous set of links.
                       undirected single-relational graph

                          x                                     z


                       directed single-relational graph

                          x                                     z


                       directed multi-relational graph

                          x                  y                  z




                                                     Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 1

• Marko is a human and Fluffy is a dog.




                                   Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 1

               ID      Name     Type        Legs       Fur

              0001     Marko    Human         2        false

              0002     Fluffy   Dog           4        true


            Object_Table




                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 1

               Human                                 Dog



                type                                 type


               0001                                 0002


               name                                 name
        legs           fur                 legs                fur


    2          Marko         false    4             Fluffy            true




                                     Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 2

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 2

         ID      Name     Type    Legs     Fur              ID2            ID2

        0001     Marko    Human    2       false           0001           0002

        0002     Fluffy   Dog      4       true            0002           0001


      Object_Table                                     Friendship_Table




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 2

               Human                                          Dog



                type                                          type

                                     friend
               0001                  friend                  0002


               name                                          name
        legs           fur                          legs                fur


    2          Marko         false             4             Fluffy            true




                                              Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 3

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 3

     ID      Name     Type    Legs   Fur       ID2          ID2           Type1       Type2

    0001     Marko    Human    2     false     0001        0002          Human       Mammal

    0002     Fluffy   Dog      4     true      0002        0001            Dog       Mammal


  Object_Table                               Friendship_Table          Subclass_Table




                                              Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 3
                                          Mammal


                          subclassof                subclassof

                  Human                                          Dog



                   type                                          type

                                           friend
                  0001                     friend                0002


                  name                                           name
           legs             fur                          legs             fur


       2          Marko           false             4            Fluffy         true




                                                    Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 4
• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.

• Fluffy peed on the carpet.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 4

      ID      Name     Type     Legs   Fur       ID2          ID2           Type1       Type2

     0001     Marko    Human     2     false     0001        0002          Human       Mammal

     0002     Fluffy    Dog      4     true      0002        0001            Dog       Mammal

     0003    My_Rug    Carpet   N/A    N/A
                                               Friendship_Table          Subclass_Table

   Object_Table                                  ID1          ID2

                                                 0002        0003

                                               Pee_Table




                                                Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 4

                                       Mammal


                       subclassof                subclassof

               Human                                          Dog                       Carpet



                type                                          type                       type

                                        friend
               0001                     friend                0002       peedOn          0003


               name                                           name                       name
        legs             fur                         legs              fur


    2          Marko           false             4            Fluffy         true      My_Rug




                                                            Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 5

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.

• Fluffy peed on the carpet.

• Marko and Fluffy are both mammals.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 5
        ID      Name     Type     Legs   Fur       ID2         ID2         Type1       Type2

       0001     Marko    Human     2     false     0001       0002         Human      Mammal

       0002     Fluffy    Dog      4     true      0002       0001          Dog       Mammal

       0003    My_Rug    Carpet   N/A    N/A
                                                 Friendship_Table        Subclass_Table

     Object_Table                                  ID1         ID2           ID        Type

                                                   0002       0003          0001      Human

                                                 Pee_Table                  0002        Dog


                                                                            0003      Carpet

                                                                            0001      Mammal

                                                                            0002      Mammal


                                                                         Type_Table



                                                   Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 5

                                        Mammal


                       subclassof                 subclassof

               Human                                           Dog                       Carpet

                         type                         type

                type                                           type                       type

                                         friend
               0001                      friend                0002       peedOn          0003


               name                                            name                       name
        legs             fur                          legs              fur


    2          Marko            false             4            Fluffy         true      My_Rug




                                                             Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Graph as the Natural World Model

• The world is inherently (or perceived as) object-oriented.

• The world is filled with objects and relations among them.

• The multi-relational graph is a very natural representation of the world.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Graph as the Natural Programming Model

• High-level computer languages are object-oriented.

• Nearly no impedance mismatch between the multi-relational graph and
  the programming object.

• It is easy to go from graph database to in-memory object.

          Human marko = new Human();
          marko.name = "Marko";
          marko.addFriend(fluffy);
          marko.setHasFur(false);
          marko.setLegs(2);


                                        Risk Symposium – Santa Fe, New Mexico – April 8, 2009
SQL vs. SPARQL

SELECT OTY.Name FROM Object_Table AS OTX,
      Object_Table AS OTY, Friendship_Table WHERE
   OTX.Name = "Marko" AND
   Friendship_Table.ID1 = OTY.ID AND
   Friendship_Table.ID2 = OTX.ID;


SELECT ?z WHERE {
  ?x name "Marko" .
  ?y friend ?x .
  ?y name ?z }




                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Internet Address Spaces

• The Uniform Resource Identifier (URI) is the superclass of the Uniform
  Resource Locator (URL) and Uniform Resource Name (URN).




                                       Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Uniform Resource Locator
• The set of all URLs is the address space of all resources that can be
  located and retrieved on the Web. URLs denote where a resource is.
    http://markorodriguez.com/index.html
    ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6
    ∗ http:// means GET at port 80,
    ∗ /index.html means the resource to get at that Internet location.

                                Web Server




                                  index.html




                             markorodriguez.com
                                216.251.43.6



                                               Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Uniform Resource Name

• The set of all URNs is the address space of all resources within the urn:
  namespace.
    urn:uuid:bd93def0-8026-11dd-842be54955baa12
    urn:issn:0892-3310
    urn:doi:10.1016/j.knosys.2008.03.030

• Named resources need not be retrievable through the Web.

• URNs denote what a resource is.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Uniform Resource Identifier
• The URI address space is an infinite space for all Internet resources.
    http://markorodriguez.com/index.html
    urn:issn:0892-3310
    ftp://markorodriguez.com/private/markos_secrets.txt
    http://www.lanl.gov#fluffy

• Imporant: URIs can denote concepts, instances, and datum.




                         lanl:fluffy        lanl:fluffy_legs




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Documents
• The World of Documents is primarily concerned with the Hyper-Text
  Transfer Protocol (HTTP) and with retrievable resources in the URL
  address space.

• These retrievable resources are files: HTML documents, images, audio,
  etc. The “web” is created when HTML documents contain URLs.
                            http://markorodriguez.com/


                                    index.html



                                        href


            Resume.html   href      Home.html          href       Research.html




                                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Data

• The Web of Data is primarily concerned with URIs.

• The Resource Description Framework (RDF) is the standard for
  representing the relationship between URIs and literals (e.g. float, string,
  date time, etc.).

                           subject                 predicate                object


                          lanl:marko               foaf:knows               lanl:fluffy


                           foaf:name                                       foaf:name


                "Marko A. Rodriguez"^^xsd:string                "Fluffy P. Everywhere"^^xsd:string




                                                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World in RDF
                                                      lanl:Mammal



                            rdfs:subClassOf                             rdfs:subClassOf


                   lanl:Human                                                          lanl:Dog

                                   rdf:type                                 rdf:type
                        rdf:type                                                       rdf:type



                   lanl:marko                          lanl:friend                     lanl:fluffy

                                                       lanl:friend
             lanl:fur        lanl:legs                                       lanl:fur      lanl:legs
                   foaf:name                                                        foaf:name

"false"^^xsd:boolean               "2"^^xsd:integer            "true"^^xsd:boolean                  "4"^^xsd:integer


        "Marko A. Rodriguez"^^xsd:string                                "Fluffy P. Everywhere"^^xsd:string




                                                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Data is a Distributed Database

• The URI address space is distributed.

• URIs can denote datum.

• RDF denotes the relationships URIs.

• The Web of Data’s foundational standard is RDF.

• Therefore, the Web of Data is a distributed database.




                                          Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Documents vs. the Web of Data

        Web Server                             Web Server




             HTML          href                     HTML


         127.0.0.1                              127.0.0.2




       Graph Database                        Graph Database



                        lanl:friend



         127.0.0.1                              127.0.0.2




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Current Web of Data - March 2009
                               homologenekegg                     projectgutenberg
                            symbol
                                                                                   homologenekegg
                                                                              libris                                          projectgutenberg
                                                   cas                          symbol
                                                                                     bbcjohnpeel
                                                                                                                                          libris
                  unists                    diseasome dailymed                 w3cwordnet
                                   chebi
                                        hgnc     pubchem           eurostat
                          mgi
                                   geneid
                                            omim                      wikicompany         geospecies
                                                                                                                   cas                               bbcjohnpeel
                                                                                                       diseasome dailymed
                                               drugbank                        worldfactbook
                       reactome
                                   pubmed                    unists
                                                                 magnatune
                                                                              opencyc                                                          w3cwordnet
              uniparc                                     linkedct                       chebi
                                                                                            freebase

taxonomy
         uniref
                             uniprot
                        geneontology
                                       interpro                                                    hgnc       pubchem              eurostat
                                   pdb                                   yago      umbel
                                             pfam                      mgi
                                                                   dbpedia                             omim
                                                                                              bbclatertotpgovtrack                    wikicompany         geospecies
                                         prosite
                               prodom                                     flickrwrappr
                                                                                         geneid
                                                                                      opencalais

                                                                reactome
                                                                                                uscensusdata
                                                                                                          drugbank                             worldfactbook
                                                                      lingvoj linkedmdb
                                                                                           surgeradio
                                                                                                                                 magnatune
                                                                                          pubmed
                                                                                  virtuososponger                                             opencyc
                                                  rdfbookmashup
                                             uniparc                                                                                                        freebase
                                                   swconferencecorpus         geonames musicbrainz      myspacewrapper    linkedct
                                         dblpberlin                         uniprot pubguide
                      taxonomy                         revyu                                      interpro
                                    uniref                       geneontologyjamendo bbcplaycountdata
                                                                   rdfohloh
                                                                                         pdb                                                       umbel
                                                                                                                                         yago
                                                    semanticweborg          siocsites        riese
                                                                                                        pfam                       dbpedia                    bbclatertotp            govtrack
                                                                foafprofiles
                            dblphannover    openguides                         audioscrobbler       prosite bbcprogrammes
                                                                                prodom
                                                                                     crunchbase                                           flickrwrappropencalais
                                                                    doapspace                                                                                   uscensusdata
                                                            flickrexporter
                                                                                                                                                           surgeradio
              budapestbme                                               qdos
                                                                                                                                      lingvoj linkedmdb
                                                                          semwebcentral                                                           virtuososponger
            eurecom                    ecssouthampton

                   pisa
                          dblprkbexplorer
                                  newcastle                                                                            rdfbookmashup
                                                                                                                                                   geonames musicbrainz
                                      rae2001
                                  eprints
                                       irittoulouse
                    laascnrs acm citeseer
                                                                                                                         swconferencecorpus                                        myspacewrapper
                           ieee                                                                                dblpberlin                                            pubguide
                resex
                                ibm

                                                                                                                            revyu                               jamendo
                                                                                                                                       rdfohloh
                                                                                                                                                                               bbcplaycountdata
                                                                                                                        semanticweborg        siocsites        riese
                                                                                                                                  foafprofiles
                                                                                                                  openguides                     audioscrobbler            bbcprogrammes
                                                                                               dblphannover
                                                                                                                                                       crunchbase
                                                                                                                             Risk Symposium – Santa Fe, New Mexico – April 8, 2009
                                                                                                                                       doapspace


                                                                                                                                 flickrexporter
                                                                                                                                            qdos
The Current Web of Data - March 2009
data set           domain       data set           domain        data set              domain
audioscrobbler     music        govtrack           government    pubguide              books
bbclatertotp       music        homologene         biology       qdos                  social
bbcplaycountdata   music        ibm                computer      rae2001               computer
bbcprogrammes      media        ieee               computer      rdfbookmashup         books
budapestbme        computer     interpro           biology       rdfohloh              social
chebi              biology      jamendo            music         resex                 computer
crunchbase         business     laascnrs           computer      riese                 government
dailymed           medical      libris             books         semanticweborg        computer
dblpberlin         computer     lingvoj            reference     semwebcentral         social
dblphannover       computer     linkedct           medical       siocsites             social
dblprkbexplorer    computer     linkedmdb          movie         surgeradio            music
dbpedia            general      magnatune          music         swconferencecorpus    computer
doapspace          social       musicbrainz        music         taxonomy              reference
drugbank           medical      myspacewrapper     social        umbel                 general
eurecom            computer     opencalais         reference     uniref                biology
eurostat           government   opencyc            general       unists                biology
flickrexporter      images       openguides         reference     uscensusdata          government
flickrwrappr        images       pdb                biology       virtuososponger       reference
foafprofiles        social       pfam               biology       w3cwordnet            reference
freebase           general      pisa               computer      wikicompany           business
geneid             biology      prodom             biology       worldfactbook         government
geneontology       biology      projectgutenberg   books         yago                  general
geonames           geographic   prosite            biology       ...



                                                     Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Cultural Differences that are Leading to a New World of
          Large-Scale Knowledge Management

• Relational databases tend to not maintain public access points.

• Relational database users tend to not publish their schemas.

• Web of Data graph databases maintain public access points called
  SPARQL end-points.

• Web of Data graph databases tend to reuse and extend public schemas
  called ontologies.




                                        Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Conclusion

• Thank you for your time...
    My homepage: http://markorodriguez.com
    Neno/Fhat: http://neno.lanl.gov
    Collective Decision Making Systems: http://cdms.lanl.gov
    Faith in the Algorithm: http://faithinthealgorithm.net
    MESUR: http://www.mesur.org




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009

Mais conteúdo relacionado

Mais procurados

Chord presentation
Chord presentationChord presentation
Chord presentationGertThijs
 
Internet control message protocol
Internet control message protocolInternet control message protocol
Internet control message protocolasimnawaz54
 
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYAPYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYAMaulik Borsaniya
 
01.number systems
01.number systems01.number systems
01.number systemsrasha3
 
Unit 5 internal sorting & files
Unit 5  internal sorting & filesUnit 5  internal sorting & files
Unit 5 internal sorting & filesDrkhanchanaR
 
Kerala university m.sc. computer science syllabus
Kerala university m.sc. computer science syllabusKerala university m.sc. computer science syllabus
Kerala university m.sc. computer science syllabusChakravarthy Chakra
 
Sparse matrix and its representation data structure
Sparse matrix and its representation data structureSparse matrix and its representation data structure
Sparse matrix and its representation data structureVardhil Patel
 
Presentation Routing algorithm
Presentation Routing algorithmPresentation Routing algorithm
Presentation Routing algorithmBasit Hussain
 
Intermediate code generator
Intermediate code generatorIntermediate code generator
Intermediate code generatorsanchi29
 
Data Structures with C Linked List
Data Structures with C Linked ListData Structures with C Linked List
Data Structures with C Linked ListReazul Islam
 
CS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IVCS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IVpkaviya
 

Mais procurados (20)

Chord presentation
Chord presentationChord presentation
Chord presentation
 
Client server model
Client server modelClient server model
Client server model
 
Internet control message protocol
Internet control message protocolInternet control message protocol
Internet control message protocol
 
Ch6
Ch6Ch6
Ch6
 
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYAPYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYA
 
01.number systems
01.number systems01.number systems
01.number systems
 
Unit 5 internal sorting & files
Unit 5  internal sorting & filesUnit 5  internal sorting & files
Unit 5 internal sorting & files
 
Counting sort
Counting sortCounting sort
Counting sort
 
Heap sort
Heap sort Heap sort
Heap sort
 
Kerala university m.sc. computer science syllabus
Kerala university m.sc. computer science syllabusKerala university m.sc. computer science syllabus
Kerala university m.sc. computer science syllabus
 
Sparse matrix and its representation data structure
Sparse matrix and its representation data structureSparse matrix and its representation data structure
Sparse matrix and its representation data structure
 
Presentation Routing algorithm
Presentation Routing algorithmPresentation Routing algorithm
Presentation Routing algorithm
 
Intermediate code generator
Intermediate code generatorIntermediate code generator
Intermediate code generator
 
Data Structures with C Linked List
Data Structures with C Linked ListData Structures with C Linked List
Data Structures with C Linked List
 
Link state routing protocol
Link state routing protocolLink state routing protocol
Link state routing protocol
 
Lecture1 introduction compilers
Lecture1 introduction compilersLecture1 introduction compilers
Lecture1 introduction compilers
 
2 phase locking protocol DBMS
2 phase locking protocol DBMS2 phase locking protocol DBMS
2 phase locking protocol DBMS
 
CS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IVCS8591 Computer Networks - Unit IV
CS8591 Computer Networks - Unit IV
 
Introduction to numpy
Introduction to numpyIntroduction to numpy
Introduction to numpy
 
Compiler Design Unit 4
Compiler Design Unit 4Compiler Design Unit 4
Compiler Design Unit 4
 

Mais de Marko Rodriguez

mm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machinemm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic MachineMarko Rodriguez
 
mm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Typemm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data TypeMarko Rodriguez
 
Open Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryOpen Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryMarko Rodriguez
 
Gremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialGremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialMarko Rodriguez
 
Gremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryGremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryMarko Rodriguez
 
Quantum Processes in Graph Computing
Quantum Processes in Graph ComputingQuantum Processes in Graph Computing
Quantum Processes in Graph ComputingMarko Rodriguez
 
ACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageMarko Rodriguez
 
The Gremlin Graph Traversal Language
The Gremlin Graph Traversal LanguageThe Gremlin Graph Traversal Language
The Gremlin Graph Traversal LanguageMarko Rodriguez
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics EngineMarko Rodriguez
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with GraphsMarko Rodriguez
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataMarko Rodriguez
 
The Pathology of Graph Databases
The Pathology of Graph DatabasesThe Pathology of Graph Databases
The Pathology of Graph DatabasesMarko Rodriguez
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinMarko Rodriguez
 
The Path-o-Logical Gremlin
The Path-o-Logical GremlinThe Path-o-Logical Gremlin
The Path-o-Logical GremlinMarko Rodriguez
 
The Gremlin in the Graph
The Gremlin in the GraphThe Gremlin in the Graph
The Gremlin in the GraphMarko Rodriguez
 
Memoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMemoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMarko Rodriguez
 
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataGraph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataMarko Rodriguez
 
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Marko Rodriguez
 
A Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceA Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceMarko Rodriguez
 

Mais de Marko Rodriguez (20)

mm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machinemm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machine
 
mm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Typemm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Type
 
Open Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryOpen Problems in the Universal Graph Theory
Open Problems in the Universal Graph Theory
 
Gremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialGremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM Dial
 
Gremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryGremlin's Graph Traversal Machinery
Gremlin's Graph Traversal Machinery
 
Quantum Processes in Graph Computing
Quantum Processes in Graph ComputingQuantum Processes in Graph Computing
Quantum Processes in Graph Computing
 
ACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and Language
 
The Gremlin Graph Traversal Language
The Gremlin Graph Traversal LanguageThe Gremlin Graph Traversal Language
The Gremlin Graph Traversal Language
 
The Path Forward
The Path ForwardThe Path Forward
The Path Forward
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics Engine
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with Graphs
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph Data
 
The Pathology of Graph Databases
The Pathology of Graph DatabasesThe Pathology of Graph Databases
The Pathology of Graph Databases
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with Gremlin
 
The Path-o-Logical Gremlin
The Path-o-Logical GremlinThe Path-o-Logical Gremlin
The Path-o-Logical Gremlin
 
The Gremlin in the Graph
The Gremlin in the GraphThe Gremlin in the Graph
The Gremlin in the Graph
 
Memoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMemoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to Redemption
 
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataGraph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
 
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
 
A Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceA Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network Science
 

Último

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Último (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Graph Databases and the Future of Large-Scale Knowledge Management

  • 1. Graph Databases and the Future of Large-Scale Knowledge Management Marko A. Rodriguez T-5, Center for Nonlinear Studies Los Alamos National Laboratory http://markorodriguez.com April 8, 2009
  • 2. Abstract Modern day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 3. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 4. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 5. The Relational Database vs. the Graph Database • A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model is a collection interlinked tables. • A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model is a multi-relational graph. Relational Database Graph Database d c a a b 127.0.0.1 127.0.0.2 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 6. Types of Graphs • Undirected single-relational graph: homogenous set of symmetric links. • Directed single-relational graph: homogenous set of links. • Directed multi-relational graph: heterogenous set of links. undirected single-relational graph x z directed single-relational graph x z directed multi-relational graph x y z Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 7. Our Make Believe World - Phase 1 • Marko is a human and Fluffy is a dog. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 8. Our World Modeled in a Relational Database - Phase 1 ID Name Type Legs Fur 0001 Marko Human 2 false 0002 Fluffy Dog 4 true Object_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 9. Our World Modeled in a Graph Database - Phase 1 Human Dog type type 0001 0002 name name legs fur legs fur 2 Marko false 4 Fluffy true Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 10. Our Make Believe World - Phase 2 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 11. Our World Modeled in a Relational Database - Phase 2 ID Name Type Legs Fur ID2 ID2 0001 Marko Human 2 false 0001 0002 0002 Fluffy Dog 4 true 0002 0001 Object_Table Friendship_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 12. Our World Modeled in a Graph Database - Phase 2 Human Dog type type friend 0001 friend 0002 name name legs fur legs fur 2 Marko false 4 Fluffy true Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 13. Our Make Believe World - Phase 3 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are a subclass of mammal. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 14. Our World Modeled in a Relational Database - Phase 3 ID Name Type Legs Fur ID2 ID2 Type1 Type2 0001 Marko Human 2 false 0001 0002 Human Mammal 0002 Fluffy Dog 4 true 0002 0001 Dog Mammal Object_Table Friendship_Table Subclass_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 15. Our World Modeled in a Graph Database - Phase 3 Mammal subclassof subclassof Human Dog type type friend 0001 friend 0002 name name legs fur legs fur 2 Marko false 4 Fluffy true Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 16. Our Make Believe World - Phase 4 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are a subclass of mammal. • Fluffy peed on the carpet. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 17. Our World Modeled in a Relational Database - Phase 4 ID Name Type Legs Fur ID2 ID2 Type1 Type2 0001 Marko Human 2 false 0001 0002 Human Mammal 0002 Fluffy Dog 4 true 0002 0001 Dog Mammal 0003 My_Rug Carpet N/A N/A Friendship_Table Subclass_Table Object_Table ID1 ID2 0002 0003 Pee_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 18. Our World Modeled in a Graph Database - Phase 4 Mammal subclassof subclassof Human Dog Carpet type type type friend 0001 friend 0002 peedOn 0003 name name name legs fur legs fur 2 Marko false 4 Fluffy true My_Rug Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 19. Our Make Believe World - Phase 5 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are a subclass of mammal. • Fluffy peed on the carpet. • Marko and Fluffy are both mammals. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 20. Our World Modeled in a Relational Database - Phase 5 ID Name Type Legs Fur ID2 ID2 Type1 Type2 0001 Marko Human 2 false 0001 0002 Human Mammal 0002 Fluffy Dog 4 true 0002 0001 Dog Mammal 0003 My_Rug Carpet N/A N/A Friendship_Table Subclass_Table Object_Table ID1 ID2 ID Type 0002 0003 0001 Human Pee_Table 0002 Dog 0003 Carpet 0001 Mammal 0002 Mammal Type_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 21. Our World Modeled in a Graph Database - Phase 5 Mammal subclassof subclassof Human Dog Carpet type type type type type friend 0001 friend 0002 peedOn 0003 name name name legs fur legs fur 2 Marko false 4 Fluffy true My_Rug Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 22. The Graph as the Natural World Model • The world is inherently (or perceived as) object-oriented. • The world is filled with objects and relations among them. • The multi-relational graph is a very natural representation of the world. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 23. The Graph as the Natural Programming Model • High-level computer languages are object-oriented. • Nearly no impedance mismatch between the multi-relational graph and the programming object. • It is easy to go from graph database to in-memory object. Human marko = new Human(); marko.name = "Marko"; marko.addFriend(fluffy); marko.setHasFur(false); marko.setLegs(2); Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 24. SQL vs. SPARQL SELECT OTY.Name FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table WHERE OTX.Name = "Marko" AND Friendship_Table.ID1 = OTY.ID AND Friendship_Table.ID2 = OTX.ID; SELECT ?z WHERE { ?x name "Marko" . ?y friend ?x . ?y name ?z } Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 25. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 26. Internet Address Spaces • The Uniform Resource Identifier (URI) is the superclass of the Uniform Resource Locator (URL) and Uniform Resource Name (URN). Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 27. The Uniform Resource Locator • The set of all URLs is the address space of all resources that can be located and retrieved on the Web. URLs denote where a resource is. http://markorodriguez.com/index.html ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6 ∗ http:// means GET at port 80, ∗ /index.html means the resource to get at that Internet location. Web Server index.html markorodriguez.com 216.251.43.6 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 28. The Uniform Resource Name • The set of all URNs is the address space of all resources within the urn: namespace. urn:uuid:bd93def0-8026-11dd-842be54955baa12 urn:issn:0892-3310 urn:doi:10.1016/j.knosys.2008.03.030 • Named resources need not be retrievable through the Web. • URNs denote what a resource is. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 29. The Uniform Resource Identifier • The URI address space is an infinite space for all Internet resources. http://markorodriguez.com/index.html urn:issn:0892-3310 ftp://markorodriguez.com/private/markos_secrets.txt http://www.lanl.gov#fluffy • Imporant: URIs can denote concepts, instances, and datum. lanl:fluffy lanl:fluffy_legs Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 30. The Web of Documents • The World of Documents is primarily concerned with the Hyper-Text Transfer Protocol (HTTP) and with retrievable resources in the URL address space. • These retrievable resources are files: HTML documents, images, audio, etc. The “web” is created when HTML documents contain URLs. http://markorodriguez.com/ index.html href Resume.html href Home.html href Research.html Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 31. The Web of Data • The Web of Data is primarily concerned with URIs. • The Resource Description Framework (RDF) is the standard for representing the relationship between URIs and literals (e.g. float, string, date time, etc.). subject predicate object lanl:marko foaf:knows lanl:fluffy foaf:name foaf:name "Marko A. Rodriguez"^^xsd:string "Fluffy P. Everywhere"^^xsd:string Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 32. Our Make Believe World in RDF lanl:Mammal rdfs:subClassOf rdfs:subClassOf lanl:Human lanl:Dog rdf:type rdf:type rdf:type rdf:type lanl:marko lanl:friend lanl:fluffy lanl:friend lanl:fur lanl:legs lanl:fur lanl:legs foaf:name foaf:name "false"^^xsd:boolean "2"^^xsd:integer "true"^^xsd:boolean "4"^^xsd:integer "Marko A. Rodriguez"^^xsd:string "Fluffy P. Everywhere"^^xsd:string Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 33. The Web of Data is a Distributed Database • The URI address space is distributed. • URIs can denote datum. • RDF denotes the relationships URIs. • The Web of Data’s foundational standard is RDF. • Therefore, the Web of Data is a distributed database. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 34. The Web of Documents vs. the Web of Data Web Server Web Server HTML href HTML 127.0.0.1 127.0.0.2 Graph Database Graph Database lanl:friend 127.0.0.1 127.0.0.2 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 35. The Current Web of Data - March 2009 homologenekegg projectgutenberg symbol homologenekegg libris projectgutenberg cas symbol bbcjohnpeel libris unists diseasome dailymed w3cwordnet chebi hgnc pubchem eurostat mgi geneid omim wikicompany geospecies cas bbcjohnpeel diseasome dailymed drugbank worldfactbook reactome pubmed unists magnatune opencyc w3cwordnet uniparc linkedct chebi freebase taxonomy uniref uniprot geneontology interpro hgnc pubchem eurostat pdb yago umbel pfam mgi dbpedia omim bbclatertotpgovtrack wikicompany geospecies prosite prodom flickrwrappr geneid opencalais reactome uscensusdata drugbank worldfactbook lingvoj linkedmdb surgeradio magnatune pubmed virtuososponger opencyc rdfbookmashup uniparc freebase swconferencecorpus geonames musicbrainz myspacewrapper linkedct dblpberlin uniprot pubguide taxonomy revyu interpro uniref geneontologyjamendo bbcplaycountdata rdfohloh pdb umbel yago semanticweborg siocsites riese pfam dbpedia bbclatertotp govtrack foafprofiles dblphannover openguides audioscrobbler prosite bbcprogrammes prodom crunchbase flickrwrappropencalais doapspace uscensusdata flickrexporter surgeradio budapestbme qdos lingvoj linkedmdb semwebcentral virtuososponger eurecom ecssouthampton pisa dblprkbexplorer newcastle rdfbookmashup geonames musicbrainz rae2001 eprints irittoulouse laascnrs acm citeseer swconferencecorpus myspacewrapper ieee dblpberlin pubguide resex ibm revyu jamendo rdfohloh bbcplaycountdata semanticweborg siocsites riese foafprofiles openguides audioscrobbler bbcprogrammes dblphannover crunchbase Risk Symposium – Santa Fe, New Mexico – April 8, 2009 doapspace flickrexporter qdos
  • 36. The Current Web of Data - March 2009 data set domain data set domain data set domain audioscrobbler music govtrack government pubguide books bbclatertotp music homologene biology qdos social bbcplaycountdata music ibm computer rae2001 computer bbcprogrammes media ieee computer rdfbookmashup books budapestbme computer interpro biology rdfohloh social chebi biology jamendo music resex computer crunchbase business laascnrs computer riese government dailymed medical libris books semanticweborg computer dblpberlin computer lingvoj reference semwebcentral social dblphannover computer linkedct medical siocsites social dblprkbexplorer computer linkedmdb movie surgeradio music dbpedia general magnatune music swconferencecorpus computer doapspace social musicbrainz music taxonomy reference drugbank medical myspacewrapper social umbel general eurecom computer opencalais reference uniref biology eurostat government opencyc general unists biology flickrexporter images openguides reference uscensusdata government flickrwrappr images pdb biology virtuososponger reference foafprofiles social pfam biology w3cwordnet reference freebase general pisa computer wikicompany business geneid biology prodom biology worldfactbook government geneontology biology projectgutenberg books yago general geonames geographic prosite biology ... Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 37. Cultural Differences that are Leading to a New World of Large-Scale Knowledge Management • Relational databases tend to not maintain public access points. • Relational database users tend to not publish their schemas. • Web of Data graph databases maintain public access points called SPARQL end-points. • Web of Data graph databases tend to reuse and extend public schemas called ontologies. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 38. Conclusion • Thank you for your time... My homepage: http://markorodriguez.com Neno/Fhat: http://neno.lanl.gov Collective Decision Making Systems: http://cdms.lanl.gov Faith in the Algorithm: http://faithinthealgorithm.net MESUR: http://www.mesur.org Risk Symposium – Santa Fe, New Mexico – April 8, 2009