Thursday, 21 June 12
Using MongoDB as a high performance graph database

MongoDB UK, 20th June 2012
Chris Clarke
CTO, Talis Education Limited






Who is Talis?

Using Mongo for about 8 months (since 2.0)
5 months in production
What this talk is not about





A blueprint for what you should do
A pitch to encourage you to take our approach
Providing or proving performance benchmarks
Evangelism for the semantic web or linked data
Encouraging you to contribute/download/use an open source
project
Optimised for your use case

Although we can talk to you about any of the above (see me
after)
So, what is this talk about?





Our journey of using MongoDB as a high performance graph
database
Specifically the software wrapper we implemented on top of
Mongo to give us a leg up in terms of scalability and performance
To give you some ideas for how to work with graph data models
if you’d like to use document databases
GRAPHS 101





Apologies
Nodes and edges
or
Resources and properties
Really easy to represent facts
John knows Jane


              John          knows                Jane





Ball and stick diagrams
This is an undirected graph. It implies that John knows Jane and
Jane knows John. The property has no directional significance.
John knows Jane
                       Jane knows John


              John          knows                Jane





This is an undirected graph. It implies that John knows Jane and
Jane knows John. The property has no directional significance.
John knows Jane
                          Jane ? John


              John          knows                 Jane





This is a directed graph. The relationship is one way. To add Jane
knows John we need a second property.

We will only use directed graphs from here on, as they are more
specific
John knows Jane
                       Jane knows John
                          knows
               John                      Jane
                            knows




Triples + RDF 101




Subject     Property   Object

                       John   knows       Jane





This is a triple

Property = predicate
Subject      Property     Object

                       John   knows           Jane

                       Jane   knows          John





This is a second triple
The same resource can be a subject or an object
Subject                   Property                       Object
      http://example.com/John   http://xmlns.com/foaf/0.1/knows http://example.com/Jane





RDF
Resources and properties as URIs
URIs can be dereferenced
Can share common property descriptions (RDF Schemas)
Here using FOAF - billions if not trillions of triples defined using
FOAF
Subject                     Property                            Object
      http://example.com/John               foaf:knows                 http://example.com/Jane

      http://example.com/John                foaf:name                         “John”




                           PREFIX foaf: <http://xmlns.com/foaf/0.1/>




Namespaces for readability

In RDF, subjects are always URIs (or blank nodes, which we don't use here)
But objects can also be literals, i.e. plain text
Many RDF/graph databases allow you to further type literals as
dates, numbers, etc.
Subject                        Property                              Object
      http://example.com/John                   rdf:type                          foaf:Person
      http://example.com/John                  foaf:name                            “John”
      http://example.com/John                  foaf:knows                   http://example.com/Jane
      http://example.com/Jane                   rdf:type                          foaf:Person
      http://example.com/Jane                  foaf:name                            “Jane”
      http://example.com/Jane                  foaf:knows                   http://example.com/John




                       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                                PREFIX foaf: <http://xmlns.com/foaf/0.1/>




Here we type John and Jane as foaf:Person using rdf:type

Note that both John and Jane appear as subjects and as objects

This RDF graph represents six facts
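
A minimal sketch of those six facts as plain JavaScript data. The `[subject, property, object]` array layout and the constant names are assumptions for illustration only, not a storage format from the talk. It shows that the same resource can sit on either side of a triple.

```javascript
// Prefixes expanded by hand; the array-of-arrays layout is an assumption for illustration.
const EX = "http://example.com/";
const FOAF = "http://xmlns.com/foaf/0.1/";
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

const triples = [
  [EX + "John", RDF + "type", FOAF + "Person"],
  [EX + "John", FOAF + "name", "John"],
  [EX + "John", FOAF + "knows", EX + "Jane"],
  [EX + "Jane", RDF + "type", FOAF + "Person"],
  [EX + "Jane", FOAF + "name", "Jane"],
  [EX + "Jane", FOAF + "knows", EX + "John"],
];

// Resources that appear as a subject in one triple and as an object in another.
const subjects = new Set(triples.map(([s]) => s));
const both = [...new Set(triples.map(([, , o]) => o))].filter((o) => subjects.has(o));

console.log(triples.length); // 6
console.log(both);
```

Literals such as "John" never match a subject here because subjects are always full URIs.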
foaf:Person


                         rdf:type                  rdf:type
                                    foaf:knows

           example:John                              example:Jane

                                      foaf:knows




                “John”                                  “Jane”



Here it is in ball and stick
FFS! I can do that in two minutes in BSON




> db.people.find()
  {
     _id: ObjectId('123'),
     name: 'John',
     knows: [ObjectId('456')]
  },
  {
     _id: ObjectId('456'),
     name: 'Jane',
     knows: [ObjectId('123')]
  }





Yes, you can!
Data only makes sense inside your db though
http://sheikspear.blogspot.co.uk/2011/07/simples.html



Talk over, right?
We can all go home
Some useful stuff, using RDF





Let's look at some reasons why we think RDF is good



This is the linked open data cloud

Linked data is a way of publishing RDF on the open web

Search for "linked data TED" to hear why Tim Berners-Lee cares about
this

Each blob on this diagram represents an open, interlinked
dataset. The lines between them represent the interlinking
between data sets

Billions of public “facts” and growing exponentially from sites
such as BBC, governments, Last.fm, Wikipedia
Merging data from different sources is really easy





Because the format is subject, predicate, object the shape of RDF
is always the same.
Because schemas are public and widely shared the same
properties are used all over the place.
Really easy to use this data in your own app and remix
Dataset A          Dataset B

               example:John    example:John


                   rdf:type     foaf:name



                                  “John”
                foaf:Person




Dataset A+B

                                         example:John


                              rdf:type             foaf:name



                                                               “John”
                foaf:Person





Really easy to merge graphs
“Designed in” to the data format
Lots of existing tooling to do this
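
A rough sketch of why merging is "designed in": because every fact has the same subject, property, object shape, merging two graphs is just a set union of triples. `mergeGraphs` is a hypothetical helper, and the duplicate `rdf:type` triple in dataset B is an assumption added here to show that repeated facts collapse into one.

```javascript
// Hypothetical helper: a graph is an array of [subject, property, object] triples.
function mergeGraphs(...graphs) {
  const seen = new Map();
  for (const graph of graphs) {
    for (const triple of graph) {
      // JSON.stringify gives a stable key, so identical triples collapse into one.
      seen.set(JSON.stringify(triple), triple);
    }
  }
  return [...seen.values()];
}

const datasetA = [["example:John", "rdf:type", "foaf:Person"]];
const datasetB = [
  ["example:John", "foaf:name", "John"],
  ["example:John", "rdf:type", "foaf:Person"], // duplicate fact, kept once after merging
];

const merged = mergeGraphs(datasetA, datasetB);
console.log(merged.length); // 2
```

No schema mapping step is needed, because both datasets already use the same property names.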
RDF query language: SPARQL




PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?name ?email
      WHERE {
        ?person a foaf:Person.
        ?person foaf:name ?name.
        ?person foaf:mbox ?email.
      }
      ORDER BY ?name
      LIMIT 50





SPARQL is mega flexible. Lots of functions for grouping, walking
graphs, pattern matching, inference, UNIONS, Geo extensions
etc. etc. - all that shit.
Most if not all of those datasets will have a SPARQL endpoint you
can query
SELECT      Tabular
                       DESCRIBE    Graph
                       ASK         Boolean
                       CONSTRUCT   Graph





4 main query types
FFS! That looks like SQL!

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?name ?email
      WHERE {
        ?person a foaf:Person.
        ?person foaf:name ?name.
        ?person foaf:mbox ?email.
      }
      ORDER BY ?name
      LIMIT 50





Yes it does. The WHERE clause is basically doing a shit load of
joins. I’ll come back to that.
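
To make the "joins" point concrete, here is a toy pattern matcher, a sketch only and not how any real SPARQL engine is implemented. The function names, the `?`-prefix convention for variables and the sample data (including the mailbox address) are assumptions. Each extra pattern in the WHERE clause is matched against the whole triple set and joined with the bindings found so far.

```javascript
// A toy matcher, not a SPARQL engine: terms starting with "?" are variables.
const isVar = (term) => typeof term === "string" && term.startsWith("?");

// Try to extend one set of bindings so that `pattern` matches `triple`.
function matchTriple(pattern, triple, bindings) {
  const next = { ...bindings };
  for (let i = 0; i < 3; i++) {
    const term = pattern[i];
    if (isVar(term)) {
      if (term in next && next[term] !== triple[i]) return null;
      next[term] = triple[i];
    } else if (term !== triple[i]) {
      return null;
    }
  }
  return next;
}

// Each extra pattern is effectively one more join against the whole triple set.
function matchPatterns(patterns, triples) {
  let solutions = [{}];
  for (const pattern of patterns) {
    const extended = [];
    for (const bindings of solutions) {
      for (const triple of triples) {
        const next = matchTriple(pattern, triple, bindings);
        if (next) extended.push(next);
      }
    }
    solutions = extended;
  }
  return solutions;
}

const data = [
  ["ex:John", "rdf:type", "foaf:Person"],
  ["ex:John", "foaf:name", "John"],
  ["ex:John", "foaf:mbox", "mailto:john@example.com"],
  ["ex:Jane", "rdf:type", "foaf:Person"],
  ["ex:Jane", "foaf:name", "Jane"],
];

const rows = matchPatterns(
  [
    ["?person", "rdf:type", "foaf:Person"],
    ["?person", "foaf:name", "?name"],
    ["?person", "foaf:mbox", "?email"],
  ],
  data
);
console.log(rows);
```

Jane drops out because she has no `foaf:mbox`, just as a row would drop out of an inner join in SQL.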
Application DB                               Triple store +
       (SQL or other)                                  SPARQL

                        Offline conversion process





Most datasets on the LOD diagram don’t exist natively as Linked
data and RDF. They are post-produced.
Data not held natively - so conversion script - needs to be
maintained and updated every time app schema changes
Data not up to date (1 hour, 1 day, 1 month behind?)
Our innovation:
Native Linked Data Applications





We started working on these applications back in 2008

They are natively linked data so solve the conversion+currency
issue

There is no other “format” or schema the data is stored in, it’s
native RDF

When you have no schema, and you can integrate data from
elsewhere on the web, it’s addictive
Our problem:
FFS! For applications, we need humongous scale and performance




Those applications becoming rather popular with our users...

sub 50ms query time

Modern web apps need speed and data scale

Out-grown triple store and SPARQL

SPARQL is very flexible and expressive. It’s also expensive
SPARQL is great for data sets where the questions you can ask are
limitless, but our applications need a data layer where speed is
measured in single digit ms.

Complex caching (w/Memcache) to achieve performance and
scalability
90:10 read:write
Tripod





It’s a pod for our triples
A triple store designed for applications and scalability
Based on Mongo
Functional requirements:
         • Order of magnitude increase in perf/scale
         • Graph-orientated interface

         Non-functional requirements:
         • Strong community



Existing code very graph orientated
Core data format
                  Tripod API
         Dealing with complex queries
                 TripodTables
               Free text search




Walk through Tripod looking at 5 areas
{
    "http://example.com/John" : {
        "http://xmlns.com/foaf/0.1/name" : [
            {
                "value": "John",
                "type": "literal"
            }
        ],
        "http://xmlns.com/foaf/0.1/knows" : [
            {
                "value": "http://example.com/Jane",
                "type": "uri"
            }
        ]
    },
    "http://example.com/Jane" : {
        "http://xmlns.com/foaf/0.1/name" : [
            {
                "value": "Jane",
                "type": "literal"
            }
        ],
        "http://xmlns.com/foaf/0.1/knows" : [
            {
                "value": "http://example.com/John",
                "type": "uri"
            },
            {
                "value": "http://example.com/James",
                "type": "uri"
            }
        ]
    }
}





RDF/JSON - a serialisation of RDF in JSON

Neither disk space efficient nor readable

Fully-formed property URIs are not compatible with Mongo field names (the dots clash with dot notation)

Even single values inside an array (problems for compound
indexing)
> db.CBD_people.find()
      {
         _id: 'http://example.com/John',
         'foaf:name': {l: 'John'},
         'foaf:knows': {u: 'http://example.com/Jane'}
      },
      {
         _id: 'http://example.com/Jane',
         'foaf:name': {l: 'Jane'},
         'foaf:knows': [
           {u: 'http://example.com/John'},
           {u: 'http://example.com/James'}
         ]
      }





Same semantics

2 documents here

Concise bound descriptions - all data known about a subject,
one relationship deep

One document per subject per collection, keyed (and thus
enforced) by Subject URI

Property names are namespaced

CBD collections are deemed as read/write in Tripod
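
The mapping from triples to this document shape can be sketched as follows. This is an illustration under stated assumptions, not Tripod's actual code: `PREFIXES`, `shorten`, `toCbdDocs` and the `{ s, p, o, isUri }` triple objects are all hypothetical names. Only the `_id`, `l` and `u` keys and the single-value versus array behaviour come from the example document above.

```javascript
// Assumed prefix table; Tripod's real namespace handling is not shown in the talk.
const PREFIXES = { "http://xmlns.com/foaf/0.1/": "foaf" };

// Shorten a full property URI to prefix:local so the key contains no dots.
function shorten(uri) {
  for (const [ns, prefix] of Object.entries(PREFIXES)) {
    if (uri.startsWith(ns)) return prefix + ":" + uri.slice(ns.length);
  }
  throw new Error("No prefix registered for " + uri);
}

// Group triples into one document per subject, keyed by the subject URI.
function toCbdDocs(triples) {
  const docs = new Map();
  for (const { s, p, o, isUri } of triples) {
    if (!docs.has(s)) docs.set(s, { _id: s });
    const doc = docs.get(s);
    const key = shorten(p);
    const value = isUri ? { u: o } : { l: o };
    if (!(key in doc)) doc[key] = value; // a single value stays a plain object
    else if (Array.isArray(doc[key])) doc[key].push(value);
    else doc[key] = [doc[key], value]; // a second value promotes it to an array
  }
  return [...docs.values()];
}

const docs = toCbdDocs([
  { s: "http://example.com/Jane", p: "http://xmlns.com/foaf/0.1/name", o: "Jane", isUri: false },
  { s: "http://example.com/Jane", p: "http://xmlns.com/foaf/0.1/knows", o: "http://example.com/John", isUri: true },
  { s: "http://example.com/Jane", p: "http://xmlns.com/foaf/0.1/knows", o: "http://example.com/James", isUri: true },
]);
console.log(JSON.stringify(docs[0], null, 2));
```

Keeping single values out of arrays is what makes them usable in compound indexes, at the cost of the reader having to handle both shapes.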
class MongoGraph extends SimpleGraph {

               function add_tripod_array($tarray)
               function to_tripod_array($docId)

     }





All of our app already uses SimpleGraph from a library called
Moriarty (Google Code)

A simple extension which can ingest/output the data format on the
previous slide
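
What the ingest direction has to do can be sketched in JavaScript, the Mongo shell's language. This is a guess at the shape of the work, not the PHP implementation in MongoGraph: `cbdToTriples` and the `{ s, p, o, isUri }` triple objects are hypothetical.

```javascript
// Hypothetical reverse mapping: expand one CBD document back into triples.
function cbdToTriples(doc) {
  const triples = [];
  for (const [property, raw] of Object.entries(doc)) {
    if (property === "_id") continue; // the _id is the subject, not a property
    const values = Array.isArray(raw) ? raw : [raw]; // single values are not wrapped in arrays
    for (const v of values) {
      const isUri = "u" in v;
      triples.push({ s: doc._id, p: property, o: isUri ? v.u : v.l, isUri });
    }
  }
  return triples;
}

const jane = {
  _id: "http://example.com/Jane",
  "foaf:name": { l: "Jane" },
  "foaf:knows": [{ u: "http://example.com/John" }, { u: "http://example.com/James" }],
};
const janeTriples = cbdToTriples(jane);
console.log(janeTriples.length); // 3
```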
Core data format
                  Tripod API
         Dealing with complex queries
                 TripodTables
               Free text search




Walk through Tripod looking at 5 areas
interface ITripod
   {
       public function select($query, $fields, $sortBy=null, $limit=null);
       public function describeResource($resource);
       public function describeResources(Array $resources);
       public function saveChanges($oldGraph, $newGraph);
       public function search($query);
   }





Almost the same as our existing data access API onto generic
triple store

All of these methods return graphs, all are mega-simple queries
on the CBD collections

None of these methods support joins (WHERE clause in SPARQL)
public function describeResource($resource)
    {
       // One CBD document per subject, so a lookup by _id is all that is needed
       $query = array("_id" => $resource);
       $bson = $this->getCollection()->findOne($query);
       $graph = new MongoGraph();
       $graph->add_tripod_array($bson);
       return $graph;
    }





These methods mega simple to implement as they translate to
really simple Mongo Queries on the CBD collections returning
single objects
interface ITripod
   {
       public function select($query, $fields, $sortBy=null, $limit=null);
       public function describeResource($resource);
       public function describeResources(Array $resources);
       public function saveChanges($oldGraph, $newGraph);
       public function search($query);

       public function getViewForResource($resource, $viewType);
       public function getViewForResources(Array $resources, $viewType);
       public function getViews(Array $filter, $viewType);
   }





Some extra methods to deal with complex queries involving joins
Core data format
                  Tripod API
         Dealing with complex queries
                 TripodTables
               Free text search




2 things we realised when looking at our applications
DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document
    ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher
    WHERE
    {
       OPTIONAL
       {
            <http://example.com/foo> resource:contains ?sectionOrItem .
            OPTIONAL
           {
               ?sectionOrItem resource:resource ?resource .
                OPTIONAL { ?resource dcterms:isPartOf ?document . }
                OPTIONAL
                {
                   ?resource bibo:authorList ?authorList .
                     OPTIONAL { ?authorList ?p ?author . }
                }
                OPTIONAL { ?resource dcterms:publisher ?publisher . }
            }
            OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem }
       } .
       OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } .
       OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator }
    }





Typical SPARQL query in our app

9 “joins” in this query
DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document
    ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher
    WHERE
    {
       OPTIONAL
       {
            <http://example.com/foo> resource:contains ?sectionOrItem .
            OPTIONAL
           {
               ?sectionOrItem resource:resource ?resource .
                OPTIONAL { ?resource dcterms:isPartOf ?document . }
                OPTIONAL
                {
                   ?resource bibo:authorList ?authorList .
                     OPTIONAL { ?authorList ?p ?author . }
                }
                OPTIONAL { ?resource dcterms:publisher ?publisher . }
            }
            OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem }
       } .
       OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } .
       OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator }
    }





Only thing that changes at run time in this query is this URI

Flexibility of SPARQL great for developer but terrible here for
system performance

Query engine needs to join 9 times! Flexibility costs us every
time we run this query!

This is why we hid it behind a cache
join
    count
    follow sequences (n times)
    join across databases

    All the above with a condition

    include certain properties
    include all properties



2nd thing

We only make use of minimal SPARQL

And some of these aren’t even well supported in SPARQL
(sequences + join across databases)
Materialised views, generated infrequently, read often





Remember 90:10 read:update

View specifications based on a subset of SPARQL

Views are for DESCRIBE like queries where all the data is brought
back in one hit (not tabular data)
{
        _id: "v_resource_brief",
        from: "CBD_harvest",
        type: "http://talisaspire.com/schema#Resource",
        include: ["rdf:type", "dct:subject", "dct:isVersionOf",
    "searchterms:usedAt", "dc:identifier"],
        joins: {
            "acorn:preferredMetadata": [],
            "acorn:listReferences": {
                 include: ["acorn:list"]
            },
            "acorn:bookmarkReferences": {
                 include: ["acorn:bookmark"]
            },
            "dcterms:isPartOf": [],
            "acorn:partReferences": {
                 include: ["dct:hasPart"],
                 joins: {
                     "dct:hasPart": {
                          joins: {
                              "acorn:preferredMetadata": []
                          }
                     }
                 }
            }
        }
    }





A view specification - itself a document that can be stored in
Mongo

8 keywords: type, from, include, joins, ttl, followSequence, maxJoins, counts
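
How a specification like this could drive view generation can be sketched with plain recursion over an in-memory Map. This is a simplification under stated assumptions: Tripod builds views with MapReduce inside the database, whereas `buildView`, `project` and `valuesOf` below are hypothetical helpers, the `ex:` properties are invented sample data, and only `include` and `joins` are modelled. A join with no `include` list is assumed to copy every property.

```javascript
// Assumptions: `collection` is a Map of CBD documents keyed by _id (standing in for a
// Mongo collection), and a join with no `include` list copies every property.
function valuesOf(doc, property) {
  const raw = doc[property];
  if (raw === undefined) return [];
  return Array.isArray(raw) ? raw : [raw];
}

// Keep only the listed properties (plus _id) of a document.
function project(doc, include) {
  if (!include || include.length === 0) return { ...doc };
  const out = { _id: doc._id };
  for (const property of include) {
    if (property in doc) out[property] = doc[property];
  }
  return out;
}

function buildView(collection, rootId, spec) {
  const graphs = [];
  const seen = new Set();
  function visit(id, rule) {
    const doc = collection.get(id);
    if (!doc || seen.has(id)) return; // skip dangling links and cycles
    seen.add(id);
    graphs.push(project(doc, rule.include));
    for (const [property, childRule] of Object.entries(rule.joins || {})) {
      for (const value of valuesOf(doc, property)) {
        if (value.u) visit(value.u, childRule); // only URI values can be followed
      }
    }
  }
  visit(rootId, spec);
  return { _id: { "rdf:resource": rootId, type: spec._id }, value: { graphs } };
}

const collection = new Map([
  ["ex:list1", { _id: "ex:list1", "rdf:type": { u: "ex:List" }, "ex:contains": { u: "ex:item1" } }],
  ["ex:item1", { _id: "ex:item1", "ex:title": { l: "Item one" }, "ex:note": { l: "internal" } }],
]);
const spec = {
  _id: "v_list_brief",
  include: ["rdf:type"],
  joins: { "ex:contains": { include: ["ex:title"] } },
};
const view = buildView(collection, "ex:list1", spec);
console.log(JSON.stringify(view.value.graphs));
```

The joins are paid for once, when the view is generated, rather than on every read.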
Generated by incremental MapReduce when:
1) Data is changed
2) TTL expires




Tripod can take these specifications and manage views in a
special collection within the DB.

They expire and are regenerated automatically (and
incrementally)

Incremental map reduce inside the DB

Fast, interleaves with reads
> db.views.findOne()
   {
       "_id" : {
           "rdf:resource" : "http://talisaspire.com/examples/1",
           "type" : "v_resource_full"
       },
       "value" : {
           "graphs" : [
               {
                   "_id" : "http://talisaspire.com/examples/1",
                   "rdf:type" : {
                        "type" : "uri",
                        "value" : "http://talisaspire.com/schema#Resource"
                   }
               }
           ],
           "impactIndex" : [
               { "rdf:resource" : "http://talisaspire.com/examples/1" }
           ]
       }
   }





This is what a view looks like

ID is a composite key of the view type and root resource
Graphs is a collection of CBDs

MongoGraph we displayed earlier can take this and represent it
as a unified graph to the application

Impact index - A watch list of resources. When resources are
saved the impact index is queried to find views that need
invalidating

TTL is an alternative. If a ttl is set in the view spec, a timestamp is stored in the view to
determine when it can be invalidated
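
A minimal sketch of the two invalidation routes, assuming a simplified view shape: `impactIndex` is treated as a flat array of resource URIs and `expiresAt` is a hypothetical millisecond timestamp derived from the ttl. `viewsToInvalidate` is an illustrative name, not part of the Tripod API.

```javascript
// Assumption: each view carries an impactIndex array of resource URIs it was built from,
// and an optional expiresAt timestamp (ms) derived from a ttl in its specification.
function viewsToInvalidate(views, savedResourceUris, now) {
  const saved = new Set(savedResourceUris);
  return views
    .filter((view) => {
      const touched = view.value.impactIndex.some((uri) => saved.has(uri));
      const expired = view.expiresAt !== undefined && view.expiresAt <= now;
      return touched || expired;
    })
    .map((view) => view._id);
}

const views = [
  { _id: "v1", value: { impactIndex: ["ex:a", "ex:b"] } },
  { _id: "v2", value: { impactIndex: ["ex:c"] } },
  { _id: "v3", value: { impactIndex: ["ex:d"] }, expiresAt: 1000 },
];
const stale = viewsToInvalidate(views, ["ex:b"], 2000);
console.log(stale); // [ 'v1', 'v3' ]
```

In a real collection the impact index lookup would be an indexed query rather than a scan over every view.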

Match views to data update rate
Core data format
                  Tripod API
         Dealing with complex queries
                 TripodTables
               Free text search




Tripod Tables are for larger datasets which cannot be brought
back in one hit

They can be paged or have individual columns indexed for fast
sort capability
SELECT ?listName ?listUri
     WHERE
     {
         ?resource bibo:isbn10 "$isbn"
         UNION
         {
             ?resource bibo:isbn10 "$isbnLowerCase" .
         }
         ?item resource:resource ?resource .
         UNION
         {
             ?resourcePartOf bibo:isbn10 "$isbn" .
             UNION
             {
                 ?resourcePartOf bibo:isbn10 "$isbnLowerCase" .
             }
             ?resourcePartOf dct:hasPart ?resource .
             ?item resource:resource ?resource .
         }
         ?listUri resource:contains ?item .
         ?listUri sioc:name ?listName .
         ?listUri rdf:type resource:List
     }
     LIMIT 10
     OFFSET 40





This is a select query that brings back a two col document

OFFSET

LIMIT
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
    <head>
        <variable name="label"/>
        <variable name="type"/>
    </head>
    <results>
        <result>
            <binding name="label">
                <literal>Tropical grassland</literal>
            </binding>
            <binding name="type">
                <uri>http://purl.org/ontology/wo/TerrestrialHabitat</uri>
            </binding>
        </result>
        <result>
            <binding name="label">
                <literal>Grassy field</literal>
            </binding>
            <binding name="type">
                <uri>http://purl.org/ontology/wo/TerrestrialHabitat</uri>
            </binding>
        </result>
    </results>
</sparql>





SPARQL SELECT results - tabular format - here in XML
> db.t_resource.findOne()
  {
     "_id" : "http://talisaspire.com/resources/3SplCtWGPqEyXcDiyhHQpA-2",
     "value" : {
         "type" : [
            "http://purl.org/ontology/bibo/Book",
            "http://talisaspire.com/schema#Resource"
         ],
         "isbn" : "9780393929690",
         "isbn13" : [
            "9780393929691",
            "9780393929691-2",
            "9780393929691-3"
         ],
         "impactIndex" : [
            "http://talisaspire.com/works/4d101f63c10a6"
         ]
     }
  }





This time our map reduce doesn’t create one doc as with
materialised views

We get one doc per row
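
With one document per row, paging and sorting become simple operations over a flat collection. The sketch below uses an in-memory array as a stand-in for a table collection. `pageRows`, the `listName` column and the generated sample rows are assumptions for illustration.

```javascript
// Hypothetical in-memory stand-in for a table collection: one document per row.
function pageRows(rows, { sortBy, limit, offset = 0 }) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...rows].sort((a, b) =>
    String(a.value[sortBy]).localeCompare(String(b.value[sortBy]))
  );
  return sorted.slice(offset, offset + limit);
}

const rows = Array.from({ length: 55 }, (_, i) => ({
  _id: "http://example.com/lists/" + i,
  value: { listName: "List " + String(i).padStart(2, "0") },
}));

// Equivalent in spirit to LIMIT 10 OFFSET 40 in the SELECT query.
const page = pageRows(rows, { sortBy: "listName", limit: 10, offset: 40 });
console.log(page.length, page[0].value.listName); // 10 List 40
```

Against a real collection the sort would be served by an index on the chosen column rather than done in application code.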
Core data format
                  Tripod API
         Dealing with complex queries
                 TripodTables
               Free text search




Our triple store included free text search

We wanted to stream updates into ElasticSearch or another
search solution

When documents saved, same specification language used to
build Search Document Format docs and submit them to an
endpoint

We like ElasticSearch but you could use Amazon CloudSearch
Limitations





Map Reduce as a non-blocking db.eval() and also to work around
sync PHP programming model

PHP only for now - our web apps were PHP

To get a SPARQL endpoint we are exporting data out to Fuseki -
this solves the mapping but not the currency (for SPARQL)
Future





Node JS port
Use as a server not a library
Eliminate dependency on map reduce
Specification version control
Tap into op log for stream approach into Fuseki and other
locations
Named graph support
Further optimisation of data model
Maybe open source
That’s it




Questions?


   Find us on:
   Web: talisaspire.com
   Twitter: @talisaspire
   YouTube: youtube.com/user/TalisAspire
   Facebook: facebook.com/talisaspire
   Support: support.talisaspire.com

Thursday, 21 June 12
Find us on:
   Web: talisaspire.com
   Twitter: @talisaspire
   YouTube: youtube.com/user/TalisAspire
   Facebook: facebook.com/talisaspire
   Support: support.talisaspire.com

Thursday, 21 June 12

Mais conteúdo relacionado

Mais procurados

オープンデータをLOD化するデータソン in 高槻
オープンデータをLOD化するデータソン in 高槻オープンデータをLOD化するデータソン in 高槻
オープンデータをLOD化するデータソン in 高槻Kouji Kozaki
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudRichard Cyganiak
 
Model Your Application Domain, Not Your JSON Structures
Model Your Application Domain, Not Your JSON StructuresModel Your Application Domain, Not Your JSON Structures
Model Your Application Domain, Not Your JSON StructuresMarkus Lanthaler
 
JSON: The Basics
JSON: The BasicsJSON: The Basics
JSON: The BasicsJeff Fox
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataGregg Kellogg
 
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]MongoDB
 
LODを使ってみよう!
LODを使ってみよう!LODを使ってみよう!
LODを使ってみよう!uedayou
 
SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)Thomas Francart
 
Little Big Data #1. 바닥부터 시작하는 데이터 인프라
Little Big Data #1. 바닥부터 시작하는 데이터 인프라Little Big Data #1. 바닥부터 시작하는 데이터 인프라
Little Big Data #1. 바닥부터 시작하는 데이터 인프라Seongyun Byeon
 
Ingesting streaming data into Graph Database
Ingesting streaming data into Graph DatabaseIngesting streaming data into Graph Database
Ingesting streaming data into Graph DatabaseGuido Schmutz
 
Java Deserialization Vulnerabilities - The Forgotten Bug Class
Java Deserialization Vulnerabilities - The Forgotten Bug ClassJava Deserialization Vulnerabilities - The Forgotten Bug Class
Java Deserialization Vulnerabilities - The Forgotten Bug ClassCODE WHITE GmbH
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshellFabien Gandon
 
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...joaomatosf_
 
ES2015 / ES6: Basics of modern Javascript
ES2015 / ES6: Basics of modern JavascriptES2015 / ES6: Basics of modern Javascript
ES2015 / ES6: Basics of modern JavascriptWojciech Dzikowski
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 

Mais procurados (20)

オープンデータをLOD化するデータソン in 高槻
オープンデータをLOD化するデータソン in 高槻オープンデータをLOD化するデータソン in 高槻
オープンデータをLOD化するデータソン in 高槻
 
ShEx vs SHACL
ShEx vs SHACLShEx vs SHACL
ShEx vs SHACL
 
Json
JsonJson
Json
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data Mud
 
Model Your Application Domain, Not Your JSON Structures
Model Your Application Domain, Not Your JSON StructuresModel Your Application Domain, Not Your JSON Structures
Model Your Application Domain, Not Your JSON Structures
 
Json
JsonJson
Json
 
JSON: The Basics
JSON: The BasicsJSON: The Basics
JSON: The Basics
 
JSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked DataJSON-LD: JSON for Linked Data
JSON-LD: JSON for Linked Data
 
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
 
SPARQL Tutorial
SPARQL TutorialSPARQL Tutorial
SPARQL Tutorial
 
LODを使ってみよう!
LODを使ってみよう!LODを使ってみよう!
LODを使ってみよう!
 
SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)
 
Little Big Data #1. 바닥부터 시작하는 데이터 인프라
Little Big Data #1. 바닥부터 시작하는 데이터 인프라Little Big Data #1. 바닥부터 시작하는 데이터 인프라
Little Big Data #1. 바닥부터 시작하는 데이터 인프라
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Ingesting streaming data into Graph Database
Ingesting streaming data into Graph DatabaseIngesting streaming data into Graph Database
Ingesting streaming data into Graph Database
 
Java Deserialization Vulnerabilities - The Forgotten Bug Class
Java Deserialization Vulnerabilities - The Forgotten Bug ClassJava Deserialization Vulnerabilities - The Forgotten Bug Class
Java Deserialization Vulnerabilities - The Forgotten Bug Class
 
SPARQL in a nutshell
SPARQL in a nutshellSPARQL in a nutshell
SPARQL in a nutshell
 
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
An Overview of Deserialization Vulnerabilities in the Java Virtual Machine (J...
 
ES2015 / ES6: Basics of modern Javascript
ES2015 / ES6: Basics of modern JavascriptES2015 / ES6: Basics of modern Javascript
ES2015 / ES6: Basics of modern Javascript
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 

Using MongoDB as a high performance graph database

  • 2. Using MongoDB as a high performance graph database MongoDB UK, 20th June 2012 Chris Clarke CTO, Talis Education Limited Thursday, 21 June 12 Who is talis? Using mongo about 8 months (since 2.0) 5 months in production
  • 3. What this talk is not about: a blueprint for what you should do; a pitch to encourage you to take our approach; providing or proving performance benchmarks; evangelism for the semantic web or linked data; encouraging you to contribute to, download or use an open source project; optimised for your use case. Although we can talk to you about any of the above (see me after).
  • 4. So, what is this talk about? Our journey of using MongoDB as a high performance graph database. Specifically, the software wrapper we implemented on top of Mongo to give us a leg up in terms of scalability and performance. To give you some ideas for how to work with graph data models if you'd like to use document databases.
  • 5. GRAPHS 101. Apologies. Nodes and edges, or resources and properties. Really easy to represent facts.
  • 6. John knows Jane. Diagram: John, knows, Jane. Ball and stick diagrams. This is an undirected graph. It implies that John knows Jane and Jane knows John. The property has no directional significance.
  • 7. John knows Jane. Jane knows John. Diagram: John, knows, Jane. This is an undirected graph. It implies that John knows Jane and Jane knows John. The property has no directional significance.
  • 8. John knows Jane. Jane ? John. Diagram: John, knows, Jane. This is a directed graph. The relationship is one way. To add Jane knows John we need a second property. We will only use directed graphs from here on, as they are more specific.
  • 9. John knows Jane. Jane knows John. Diagram: John and Jane joined by two knows properties, one in each direction.
  • 10. Triples + RDF 101
  • 11. Subject | Property | Object
      John | knows | Jane
      This is a triple. Property = predicate.
  • 12. Subject | Property | Object
      John | knows | Jane
      Jane | knows | John
      This is a second triple. The same resource can be a subject or an object.
  • 13. Subject | Property | Object
      http://example.com/John | http://xmlns.com/foaf/0.1/knows | http://example.com/Jane
      RDF: resources and properties as URIs. URIs can be dereferenced. Can share common property descriptions (RDF Schemas). Here using FOAF: billions if not trillions of triples defined using FOAF.
  • 14. Subject | Property | Object
      http://example.com/John | foaf:knows | http://example.com/Jane
      http://example.com/John | foaf:name | “John”
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      Namespaces for readability. In RDF subjects are always URIs, but objects can be literals, i.e. plain text. Many RDF/graph databases allow you to further type literals as dates, numbers, etc.
  • 15. Subject | Property | Object
      http://example.com/John | rdf:type | foaf:Person
      http://example.com/John | foaf:name | “John”
      http://example.com/John | foaf:knows | http://example.com/Jane
      http://example.com/Jane | rdf:type | foaf:Person
      http://example.com/Jane | foaf:name | “Jane”
      http://example.com/Jane | foaf:knows | http://example.com/John
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      Here we type John and Jane as foaf:Person using rdf:type. Note both John and Jane appear as subjects and objects. This RDF graph represents six facts.
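As a rough illustration of slide 15, the six facts can be held as plain subject/property/object arrays. This is only a sketch: the `ex:` prefix (standing in for http://example.com/) and the quoted-string convention for literals are assumptions made here, not something the slides define.

```javascript
// Prefixed names stand in for full URIs; "ex:" is an assumed prefix for http://example.com/.
const triples = [
  ["ex:John", "rdf:type", "foaf:Person"],
  ["ex:John", "foaf:name", '"John"'],
  ["ex:John", "foaf:knows", "ex:Jane"],
  ["ex:Jane", "rdf:type", "foaf:Person"],
  ["ex:Jane", "foaf:name", '"Jane"'],
  ["ex:Jane", "foaf:knows", "ex:John"],
];

// Distinct values seen in subject position and in object position.
const subjects = new Set(triples.map(([s]) => s));
const objects = new Set(triples.map(([, , o]) => o));

// Resources that play both roles somewhere in the graph.
const both = [...subjects].filter((s) => objects.has(s));

console.log(triples.length); // 6
console.log(both);           // [ 'ex:John', 'ex:Jane' ]
```

Each array element is one fact, so counting facts is just counting rows, and a resource such as John can sit in either the first or the third position.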
  • 16. Diagram: example:John and example:Jane each have rdf:type foaf:Person, are linked to each other by foaf:knows in both directions, and carry the literals “John” and “Jane”. Here it is in ball and stick.
  • 17. FFS! I can do that in two minutes in BSON
  • 18. > db.people.find()
      { _id: ObjectId('123'), name: 'John', knows: [ObjectId('456')] },
      { _id: ObjectId('456'), name: 'Jane', knows: [ObjectId('123')] }
      Yes, you can! Data only makes sense inside your db though.
  • 20. Some useful stuff, using RDF. Let's look at some reasons why we think RDF is good.
  • 21. (Image, with attribution.) This is the linked open data cloud. Linked data is a way of publishing RDF on the open web. Search for “linked data TED” to hear why Tim Berners-Lee cares about this. Each blob on this diagram represents an open, interlinked dataset. The lines between them represent the interlinking between datasets. Billions of public “facts”, and growing exponentially, from sites such as the BBC, governments, Last.fm and Wikipedia.
  • 22. Merging data from different sources is really easy. Because the format is subject, predicate, object, the shape of RDF is always the same. Because schemas are public and widely shared, the same properties are used all over the place. Really easy to use this data in your own app and remix.
  • 23. Dataset A: example:John rdf:type foaf:Person. Dataset B: example:John foaf:name “John”.
  • 24. Dataset A+B: example:John rdf:type foaf:Person; example:John foaf:name “John”. Really easy to merge graphs. “Designed in” to the data format. Lots of existing tooling to do this.
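The merge on slides 23 and 24 amounts to a set union over triples. A minimal sketch, assuming triples are stored as three-element arrays; the `mergeGraphs` helper and the duplicate triple in dataset B are illustrative additions, not part of the talk.

```javascript
// Serialise a triple so that identical facts compare equal.
const key = ([s, p, o]) => JSON.stringify([s, p, o]);

// Union any number of graphs; a fact present in several inputs is kept once.
function mergeGraphs(...graphs) {
  const seen = new Map();
  for (const graph of graphs) {
    for (const triple of graph) {
      seen.set(key(triple), triple);
    }
  }
  return [...seen.values()];
}

const datasetA = [["example:John", "rdf:type", "foaf:Person"]];
const datasetB = [
  ["example:John", "foaf:name", '"John"'],
  ["example:John", "rdf:type", "foaf:Person"], // assumed overlap with dataset A
];

const merged = mergeGraphs(datasetA, datasetB);
console.log(merged.length); // 2
```

Because every fact has the same three-part shape, no schema mapping is needed before the union, which is the point the slide makes about merging being designed in.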
  • 25. RDF query language: SPARQL
  • 26. PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?name ?email
      WHERE {
        ?person a foaf:Person.
        ?person foaf:name ?name.
        ?person foaf:mbox ?email.
      }
      ORDER BY ?name
      LIMIT 50
      SPARQL is mega flexible. Lots of functions for grouping, walking graphs, pattern matching, inference, UNIONs, geo extensions etc. etc. - all that shit. Most if not all of those datasets will have a SPARQL endpoint you can query.
  • 27. SELECT: tabular. DESCRIBE: graph. ASK: boolean. CONSTRUCT: graph. The 4 main query types.
  • 28. FFS! That looks like SQL!
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?name ?email
      WHERE {
        ?person a foaf:Person.
        ?person foaf:name ?name.
        ?person foaf:mbox ?email.
      }
      ORDER BY ?name
      LIMIT 50
      Yes it does. The WHERE clause is basically doing a shit load of joins. I'll come back to that.
  • 29. Diagram: application DB (SQL or other), offline conversion process, triple store + SPARQL. Most datasets on the LOD diagram don't exist natively as linked data and RDF. They are post-produced. The data is not held natively, so there is a conversion script, which needs to be maintained and updated every time the app schema changes. The data is not up to date (1 hour, 1 day, 1 month behind?).
  • 30. Our innovation: Native Linked Data Applications. We started working on these applications back in 2008. They are natively linked data, so they solve the conversion and currency issues. There is no other “format” or schema the data is stored in; it's native RDF. When you have no schema, and you can integrate data from elsewhere on the web, it's addictive.
  • 31. Our problem: FFS! For applications, we need humongous scale and performance. Those applications are becoming rather popular with our users... Sub 50ms query time. Modern web apps need speed and data scale. We out-grew the triple store and SPARQL. SPARQL is very flexible and expressive. It's also expensive. SPARQL is great for data sets where the questions you can ask are limitless, but our applications need a data layer where speed is measured in single digit ms. Complex caching (with Memcache) to achieve performance and scalability. 90:10 read:write.
  • 32. Tripod. It's a pod for our triples. A triple store designed for applications and scalability. Based on Mongo.
  • 33. Functional requirements: order of magnitude increase in perf/scale; graph-orientated interface. Non-functional requirements: strong community. Existing code is very graph orientated.
  • 34. Core data format; Tripod API; Dealing with complex queries; TripodTables; Free text search. Walk through Tripod looking at 5 areas.
  • 35. {
        'http://example.com/John' : {
          'http://purl.org/dc/elements/1.1/name' : [ { value: 'John', type: 'literal' } ],
          'http://purl.org/dc/elements/1.1/knows' : [ { value: 'http://example.com/Jane', type: 'uri' } ]
        },
        'http://example.com/Jane' : {
          'http://purl.org/dc/elements/1.1/name' : [ { value: 'Jane', type: 'literal' } ],
          'http://purl.org/dc/elements/1.1/knows' : [
            { value: 'http://example.com/John', type: 'uri' },
            { value: 'http://example.com/James', type: 'uri' }
          ]
        }
      }
      RDF/JSON: a serialisation of RDF in JSON. Neither disk space efficient nor readable. Full property URIs contain dots, which are not compatible with Mongo field names (dot notation). Even single values sit inside an array (problems for compound indexing).
  • 36. > db.CBD_people.find()
      {
        _id: 'http://example.com/John',
        'foaf:name': {l: 'John'},
        'foaf:knows': {u: 'http://example.com/Jane'}
      },
      {
        _id: 'http://example.com/Jane',
        'foaf:name': {l: 'Jane'},
        'foaf:knows': [ {u: 'http://example.com/John'}, {u: 'http://example.com/James'} ]
      }
      Same semantics. 2 documents here. Concise bound descriptions (CBDs): all data known about a subject, one relationship deep. One document per subject per collection, keyed (and thus enforced) by subject URI. Property names are namespaced. CBD collections are deemed read/write in Tripod.
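To make the step from slide 35 to slide 36 concrete, here is a small sketch that converts RDF/JSON into the compact per-subject documents. It is not Tripod's code: the `PREFIXES` table and the function names `toQName`, `toCompact` and `rdfJsonToCbd` are hypothetical, and the input uses the FOAF namespace so that the output matches the `foaf:` property names on slide 36.

```javascript
// Hypothetical prefix table; a real deployment would configure its own.
const PREFIXES = { foaf: "http://xmlns.com/foaf/0.1/" };

// Shorten a full property URI to prefix:local form, e.g. foaf:name.
function toQName(uri) {
  for (const [prefix, ns] of Object.entries(PREFIXES)) {
    if (uri.startsWith(ns)) return `${prefix}:${uri.slice(ns.length)}`;
  }
  throw new Error(`No prefix registered for ${uri}`);
}

// Convert one RDF/JSON value to the compact {l: ...} / {u: ...} shape.
const toCompact = ({ type, value }) => (type === "uri" ? { u: value } : { l: value });

// Build one CBD-style document per subject, keyed by subject URI.
function rdfJsonToCbd(rdfJson) {
  return Object.entries(rdfJson).map(([subject, props]) => {
    const doc = { _id: subject };
    for (const [property, values] of Object.entries(props)) {
      const compact = values.map(toCompact);
      // Single values are stored bare, multiple values as an array.
      doc[toQName(property)] = compact.length === 1 ? compact[0] : compact;
    }
    return doc;
  });
}

const docs = rdfJsonToCbd({
  "http://example.com/Jane": {
    "http://xmlns.com/foaf/0.1/name": [{ value: "Jane", type: "literal" }],
    "http://xmlns.com/foaf/0.1/knows": [
      { value: "http://example.com/John", type: "uri" },
      { value: "http://example.com/James", type: "uri" },
    ],
  },
});
console.log(JSON.stringify(docs[0], null, 2));
```

The namespaced keys avoid dots in field names, and storing a lone value outside an array addresses the indexing concern raised in the slide 35 notes.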
  • 37. class MongoGraph extends SimpleGraph {
        function add_tripod_array($tarray) { /* body not shown on the slide */ }
        function to_tripod_array($docId) { /* body not shown on the slide */ }
      }
      All of our app already uses SimpleGraph from a library called Moriarty (Google Code). A simple extension which can ingest/output the data format on the previous slide.
  • 38. Core data format; Tripod API; Dealing with complex queries; TripodTables; Free text search. Walk through Tripod looking at 5 areas.
  • 39. interface ITripod {
        public function select($query, $fields, $sortBy=null, $limit=null);
        public function describeResource($resource);
        public function describeResources(Array $resources);
        public function saveChanges($oldGraph, $newGraph);
        public function search($query);
      }
      Almost the same as our existing data access API onto a generic triple store. All of these methods return graphs; all are mega-simple queries on the CBD collections. None of these methods support joins (the WHERE clause in SPARQL).
  • 40. public function describeResource($resource) {
        $query = array("_id" => $resource);
        $bson = $this->getCollection()->findOne($query);
        $graph = new MongoGraph();
        $graph->add_tripod_data($bson);
        return $graph;
      }
      These methods are mega simple to implement, as they translate to really simple Mongo queries on the CBD collections returning single objects.
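The lookup on slide 40 can be imitated without a database to show how little work it involves. In this sketch a `Map` stands in for the CBD collection and `cbdToTriples` is a hypothetical helper that expands a stored document back into triples; neither is taken from Tripod itself.

```javascript
// In-memory stand-in for a CBD collection, keyed by subject URI (the _id).
const cbdPeople = new Map([
  ["http://example.com/John", {
    _id: "http://example.com/John",
    "foaf:name": { l: "John" },
    "foaf:knows": { u: "http://example.com/Jane" },
  }],
]);

// Expand one CBD document into [subject, property, object] triples.
function cbdToTriples(doc) {
  const triples = [];
  for (const [property, stored] of Object.entries(doc)) {
    if (property === "_id") continue;
    // A property holds either one value or an array of values.
    const values = Array.isArray(stored) ? stored : [stored];
    for (const v of values) {
      const object = "u" in v
        ? { type: "uri", value: v.u }
        : { type: "literal", value: v.l };
      triples.push([doc._id, property, object]);
    }
  }
  return triples;
}

// One keyed lookup, then graph construction; no joins involved.
function describeResource(resource) {
  const doc = cbdPeople.get(resource);
  return doc ? cbdToTriples(doc) : [];
}

console.log(describeResource("http://example.com/John").length); // 2
```

Since the subject URI is the document key, describing a resource is a single primary-key read, which is why these methods stay cheap.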
  • 41. interface ITripod {
        public function select($query, $fields, $sortBy=null, $limit=null);
        public function describeResource($resource);
        public function describeResources(Array $resources);
        public function saveChanges($oldGraph, $newGraph);
        public function search($query);
        public function getViewForResource($resource, $viewType);
        public function getViewForResources(Array $resources, $viewType);
        public function getViews(Array $filter, $viewType);
      }
      Some extra methods to deal with complex queries involving joins.
  • 42. Core data format; Tripod API; Dealing with complex queries; TripodTables; Free text search. 2 things we realised when looking at our applications.
  • 43. DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher
      WHERE {
        OPTIONAL {
          <http://example.com/foo> resource:contains ?sectionOrItem .
          OPTIONAL {
            ?sectionOrItem resource:resource ?resource .
            OPTIONAL { ?resource dcterms:isPartOf ?document . }
            OPTIONAL {
              ?resource bibo:authorList ?authorList .
              OPTIONAL { ?authorList ?p ?author . }
            }
            OPTIONAL { ?resource dcterms:publisher ?publisher . }
          }
          OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem }
        } .
        OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } .
        OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator }
      }
      Typical SPARQL query in our app. 9 “joins” in this query.
  • 44. (The same query as slide 43, with the URI <http://example.com/foo> highlighted.) The only thing that changes at run time in this query is this URI. The flexibility of SPARQL is great for the developer but terrible here for system performance. The query engine needs to join 9 times! Flexibility costs us every time we run this query! This is why we hid it behind a cache.
  • 45. join; count; follow sequences (n times); join across databases; all the above with a condition; include certain properties; include all properties. 2nd thing: we only make use of minimal SPARQL. And some of these aren't even well supported in SPARQL (sequences + join across databases).
  • 46. Materialised views, generated infrequently, read often. Remember 90:10 read:update. View specifications are based on a subset of SPARQL. Views are for DESCRIBE-like queries where all the data is brought back in one hit (not tabular data).
  • 47. {
        _id: "v_resource_brief",
        from: "CBD_harvest",
        type: "http://talisaspire.com/schema#Resource",
        include: ["rdf:type", "dct:subject", "dct:isVersionOf", "searchterms:usedAt", "dc:identifier"],
        joins: {
          "acorn:preferredMetadata": [],
          "acorn:listReferences": { include: ["acorn:list"] },
          "acorn:bookmarkReferences": { include: ["acorn:bookmark"] },
          "dcterms:isPartOf": [],
          "acorn:partReferences": {
            include: ["dct:hasPart"],
            joins: { "dct:hasPart": { joins: { "acorn:preferredMetadata": [] } } }
          }
        }
      }
      A view specification: itself a document that can be stored in Mongo. 8 keywords: type, from, include, joins, ttl, followSequence, maxJoins, counts.
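One way to read a specification like the one on slide 47 is as a recipe for walking outwards from a root resource. The sketch below does that walk in memory. It is not how Tripod generates views (the talk says that is done with incremental MapReduce), and the interpretation of `include` and of an empty `[]` join as "keep all properties" is an assumption, as are the `ex:` sample data and the function names.

```javascript
// Keep only the listed properties of a CBD document; no list means keep everything.
function project(doc, include) {
  if (!include) return doc;
  const out = { _id: doc._id };
  for (const p of include) if (p in doc) out[p] = doc[p];
  return out;
}

// URIs referenced by one property of a document.
function targets(doc, property) {
  const stored = doc[property];
  if (stored === undefined) return [];
  return (Array.isArray(stored) ? stored : [stored])
    .filter((v) => "u" in v)
    .map((v) => v.u);
}

// Follow the joins in the spec from the root and collect the visited CBDs.
function buildView(store, rootId, spec) {
  const graphs = new Map();
  const visit = (id, { include, joins = {} }) => {
    const doc = store.get(id);
    if (!doc || graphs.has(id)) return; // unknown resource or already collected
    // Joined properties are kept so the links survive in the view.
    const keep = include && [...include, ...Object.keys(joins)];
    graphs.set(id, project(doc, keep));
    for (const [property, nested] of Object.entries(joins)) {
      for (const uri of targets(doc, property)) visit(uri, nested);
    }
  };
  visit(rootId, spec);
  return {
    _id: { "rdf:resource": rootId, type: spec._id },
    value: { graphs: [...graphs.values()] },
  };
}

// Hypothetical sample data: a list containing one item that points at one book.
const store = new Map([
  ["ex:list1", { _id: "ex:list1", "rdf:type": { u: "ex:List" }, "ex:contains": [{ u: "ex:item1" }], "ex:note": { l: "internal" } }],
  ["ex:item1", { _id: "ex:item1", "ex:resource": { u: "ex:book1" } }],
  ["ex:book1", { _id: "ex:book1", "dct:title": { l: "A book" } }],
]);
const spec = {
  _id: "v_list_brief",
  include: ["rdf:type"],
  joins: { "ex:contains": { joins: { "ex:resource": [] } } },
};

const view = buildView(store, "ex:list1", spec);
console.log(view.value.graphs.map((g) => g._id)); // [ 'ex:list1', 'ex:item1', 'ex:book1' ]
```

The result bundles every document the spec reaches into one record, so a later read needs no joins at all.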
  • 48. Generated by incremental MapReduce when: 1) data is changed; 2) TTL expires. Tripod can take these specifications and manage views in a special collection within the DB. They expire and are regenerated automatically (and incrementally). Incremental map reduce runs inside the DB. Fast, and interleaves with reads.
  • 49. > db.views.findOne()
      {
        "_id" : { "rdf:resource" : "http://talisaspire.com/examples/1", "type" : "v_resource_full" },
        "value" : {
          "graphs" : [
            {
              "_id" : "http://talisaspire.com/examples/1",
              "rdf:type" : { "type" : "uri", "value" : "http://talisaspire.com/schema#Resource" }
            }
          ],
          "impactIndex" : [ { "rdf:resource" : "http://talisaspire.com/examples/1" } ]
        }
      }
      This is what a view looks like. The ID is a composite key of the view type and root resource. Graphs is a collection of CBDs. The MongoGraph we displayed earlier can take this and represent it as a unified graph to the application. Impact index: a watch list of resources. When resources are saved, the impact index is queried to find views that need invalidating. TTL is an alternative: if a TTL is set in the view spec, a timestamp is stored in the view to determine when it can be invalidated.
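The impact index described on slide 49 works like a reverse lookup from resources to the views built from them. A minimal in-memory sketch follows; the string view ids, the flat list of URIs per view and the `invalidateViews` function are all assumptions for illustration.

```javascript
// Views keyed by a string id; each carries the resources it was built from.
const views = new Map([
  ["v_resource_full|ex:1", { impactIndex: ["ex:1", "ex:author9"] }],
  ["v_resource_full|ex:2", { impactIndex: ["ex:2"] }],
]);

// Find views whose impact index mentions any saved resource, and drop them.
function invalidateViews(savedResources) {
  const saved = new Set(savedResources);
  const stale = [];
  for (const [id, view] of views) {
    if (view.impactIndex.some((r) => saved.has(r))) stale.push(id);
  }
  for (const id of stale) views.delete(id); // a later read would regenerate them
  return stale;
}

console.log(invalidateViews(["ex:author9"])); // [ 'v_resource_full|ex:1' ]
console.log(views.size);                      // 1
```

Saving the author invalidates only the view that included that author, leaving unrelated views in place for the read-heavy workload.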
  • 50. (Image with four numbered items, with attribution.) Match views to data update rate.
  • 51. Core data format; Tripod API; Dealing with complex queries; TripodTables; Free text search. Tripod Tables are for larger datasets which cannot be brought back in one hit. They can be paged or have individual columns indexed for fast sort capability.
  • 52. SELECT ?listName ?listUri
      WHERE {
        ?resource bibo:isbn10 "$isbn"
        UNION
        {
          ?resource bibo:isbn10 "$isbnLowerCase" .
        }
        ?item resource:resource ?resource .
        UNION
        {
          ?resourcePartOf bibo:isbn10 "$isbn" .
          UNION
          {
            ?resourcePartOf bibo:isbn10 "$isbnLowerCase" .
          }
          ?resourcePartOf dct:hasPart ?resource .
          ?item resource:resource ?resource .
        }
        ?listUri resource:contains ?item .
        ?listUri sioc:name ?listName .
        ?listUri rdf:type resource:List
      }
      LIMIT 10 OFFSET 40
      This is a select query that brings back a two column document. OFFSET, LIMIT.
  • 53. <?xml version="1.0"?>
      <sparql xmlns="http://www.w3.org/2005/sparql-results#">
        <head>
          <variable name="label"/>
          <variable name="type"/>
        </head>
        <results>
          <result>
            <binding name="label">
              <literal>Tropical grassland</literal>
            </binding>
            <binding name="type">
              <uri>http://purl.org/ontology/wo/TerrestrialHabitat</uri>
            </binding>
          </result>
          <result>
            <binding name="label">
              <literal>Grassy field</literal>
            </binding>
            <binding name="type">
              <uri>http://purl.org/ontology/wo/TerrestrialHabitat</uri>
            </binding>
          </result>
        </results>
      </sparql>
      SPARQL SELECT results: tabular format, here in XML.
  • 54. > db.t_resource.findOne()
      {
        "_id" : "http://talisaspire.com/resources/3SplCtWGPqEyXcDiyhHQpA-2",
        "value" : {
          "type" : [ "http://purl.org/ontology/bibo/Book", "http://talisaspire.com/schema#Resource" ],
          "isbn" : "9780393929690",
          "isbn13" : [ "9780393929691", "9780393929691-2", "9780393929691-3" ],
          "impactIndex" : [ "http://talisaspire.com/works/4d101f63c10a6" ]
        }
      }
      This time our map reduce doesn't create one doc as with materialised views. We get one doc per row.
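With one document per row, the LIMIT 10 OFFSET 40 from slide 52 becomes an ordinary sort, skip and limit over the table collection. The sketch below mimics that over an in-memory array; the 95 generated rows, the `listName` column and the `page` helper are hypothetical.

```javascript
// 95 hypothetical table rows, one document per row, with a sortable column.
const rows = Array.from({ length: 95 }, (_, i) => ({
  _id: `ex:list${i}`,
  value: { listName: `List ${String(i).padStart(2, "0")}` },
}));

// Sort on one column, then apply offset and limit, much as sort(), skip()
// and limit() would on a Mongo cursor.
function page(docs, { sortBy, offset = 0, limit = 10 }) {
  return [...docs]
    .sort((a, b) => a.value[sortBy].localeCompare(b.value[sortBy]))
    .slice(offset, offset + limit);
}

const result = page(rows, { sortBy: "listName", offset: 40, limit: 10 });
console.log(result[0].value.listName); // List 40
console.log(result.length);            // 10
```

In a real collection the sort column would be indexed, which is the "individual columns indexed for fast sort" point from slide 51.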
  • 55. Core data format; Tripod API; Dealing with complex queries; TripodTables; Free text search. Our triple store included free text search. We wanted to stream updates into ElasticSearch or A N Other search solution. When documents are saved, the same specification language is used to build Search Document Format docs and submit them to an endpoint. We like ElasticSearch but you could use Amazon CloudSearch.
  • 56. Limitations. Map Reduce is used as a non-blocking db.eval(), and also to work around PHP's synchronous programming model. PHP only for now: our web apps were PHP. To get a SPARQL endpoint we are exporting data out to Fuseki, which solved the mapping but not the currency (for SPARQL).
  • 57. Future. Node.js port. Use as a server, not a library. Eliminate the dependency on map reduce. Specification version control. Tap into the op log for a stream approach into Fuseki and other locations. Named graph support. Further optimisation of the data model. Maybe open source.
  • 59. Questions? Find us on: Web: talisaspire.com; Twitter: @talisaspire; YouTube: youtube.com/user/TalisAspire; Facebook: facebook.com/talisaspire; Support: support.talisaspire.com
  • 60. Find us on: Web: talisaspire.com; Twitter: @talisaspire; YouTube: youtube.com/user/TalisAspire; Facebook: facebook.com/talisaspire; Support: support.talisaspire.com