import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: Which countries did Nixon visit? :)
sem:sparql('
  SELECT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
  ',
  (),  (: no variable bindings :)
  (),  (: no options :)
  (: restrict the triples considered with an ordinary cts:query :)
  cts:and-query((
    cts:path-range-query("//sem:triple/@confidence", ">", 80),
    cts:path-range-query("//sem:triple/@date", "<", xs:date("1974-01-01")),
    cts:or-query((
      cts:element-value-query(xs:QName("source"), "AP Newswire"),
      cts:element-value-query(xs:QName("source"), "BBC")
    ))
  ))
)
Which countries did Nixon visit?
.. before 1974?
.. only show me answers where I have at least 80% confidence
.. and the source is AP Newswire OR BBC
I'm Stephen Buxton, Director of Product Management for Search and Semantics at MarkLogic.
This analogy breaks down later in the talk – as if we took a spoon and mixed triples and documents.
Data model is XML. Language is XQuery.
MarkLogic is a Major Player in the Search world – we expect to be a Major Player in the Semantics world.
The BBC has created what they call “Dynamic Semantic Publishing”, which combines documents and semantics to let them achieve great things in their Olympics coverage. But interest in semantics, and in the combination of documents and semantics, goes across all verticals. Triples are the most granular way to store information, which makes them very simple to manage and combine: you have a bucket of facts, and when you find more facts you just throw them in the bucket. They’re a natural choice for many kinds of metadata and real-world facts, and since there are a bunch of standards around triples it’s easy to share data. The semantic web (and Open Linked Data Web) means there are lots of triples around. We’ve created this demo using only documents from MarkMail and facts that are available out on the web.
Real-world facts from the Open Data Web: Events, foaf, Geonames
Sources:
http://www.ontotext.com/publishing
Jem's blog posts:
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_news_redesign_telling_the.html
http://www.bbc.co.uk/blogs/bbcinternet/2012/04/sports_dynamic_semantic.html
Jem's slides, including REST API and Ontologies:
http://www.slideshare.net/JemRayfield/dsp-bbcjem-rayfieldsemtech2011
Jem's slides from MarkLogic World 2012:
http://speakerdeck.com/u/jemrayfield/p/marklogic-world-2012-bbc-dsp-jem-rayfield
How have we extended our Enterprise NoSQL approach with Semantics?
At the STORAGE LAYER we are capable of scaling up and have all the capabilities you know to be “Enterprise”, including but not limited to replication, failover, backup and recovery, transactional support and more. Those same enterprise features are available for the Triple Store. As you know, this is very different from open source approaches.
The next layer is all of our INDEXES. Some of those indexes are for full text search; others are range indexes, geospatial indexes and reverse query indexes for alerting. Now we have added a TRIPLES INDEX, and a TRIPLES CACHE to store triples in memory for efficient retrieval. A few points here worth noting:
All of our indexes are designed to work together, allowing organizations to do look-ups against any combination of them.
We store and index our triples in 3 orders, which allows us to optimize queries that use the triples. (How fast is this?)
Our triples cache is a distinguishing feature that other triple stores don’t have. With MarkLogic, it’s not necessary to have all of your triples in memory. They can be swapped in and out of the cache, so users are not constrained by physical memory limits. We optimize performance with our index and cache.
At the QUERY LAYER, we have added native SPARQL support. Users can compose queries in SPARQL only, or powerful combination queries where part of the query is written in SPARQL and run against the Triple Store while another part is XQuery or XSLT and queries documents. (SQL can be used to query data.) It’s this powerful combination of results (documents, data and facts that are returned and can be used at the application layer) that’s groundbreaking.
At the SERVER LAYER we’ve added REST APIs to do SPARQL and GRAPH operations, allowing you to build SPARQL endpoints, for example. These endpoints are the addresses from which the sets of triples matched by a SPARQL query are returned, ready to be used in search applications.
But again, the combination of documents, facts and data, all presented “in context” as part of search results, is what makes this so powerful. No one has this approach in an Enterprise-grade technology. By the way, the MarkLogic Content Pump has been extended to do bulk loading of triples. Any questions?
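The point about storing and indexing triples in three orders can be illustrated with a toy in-memory index. This is only a sketch of the general technique, not MarkLogic's implementation: the notes do not say which three orders are used, so the SPO, POS and OSP permutations, the class name `TinyTripleIndex` and the sample facts are all assumptions made for illustration.

```python
from collections import defaultdict


class TinyTripleIndex:
    """Toy triple index kept in three orders (assumed: SPO, POS, OSP)."""

    def __init__(self):
        # Each index maps first key -> second key -> set of third values.
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every triple is written once per ordering.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        if s is not None:
            # Subject is bound: start from the SPO ordering.
            found = ((s, p2, o2)
                     for p2, objs in self.spo.get(s, {}).items() for o2 in objs)
        elif p is not None:
            # Predicate is bound: start from the POS ordering.
            found = ((s2, p, o2)
                     for o2, subs in self.pos.get(p, {}).items() for s2 in subs)
        elif o is not None:
            # Only the object is bound: start from the OSP ordering.
            found = ((s2, p2, o)
                     for s2, preds in self.osp.get(o, {}).items() for p2 in preds)
        else:
            # Nothing bound: enumerate everything.
            found = ((s2, p2, o2)
                     for s2, po in self.spo.items()
                     for p2, objs in po.items() for o2 in objs)
        # Apply any remaining bound terms as a filter.
        return sorted(t for t in found
                      if (p is None or t[1] == p) and (o is None or t[2] == o))


# Illustrative sample facts (not taken from any real data set).
idx = TinyTripleIndex()
idx.add("Nixon", "wentTo", "China")
idx.add("Doha", "capitalOf", "Qatar")
print(idx.match(p="wentTo"))
print(idx.match(o="Qatar"))
```

Whichever term of the pattern is bound, one of the three orderings lets the lookup start from a direct key access instead of a full scan, which is the general reason for keeping more than one order.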
Why does anyone use Semantic Technologies?
Triples are the ultimate in schemaless: each triple represents just one atomic fact.
It's very easy to combine sets of triples from different sources and query across them: you just drop all the triples into the same bucket.
Triples are a natural choice for some kinds of facts, such as metadata and real-world facts (the capital of Qatar is Doha).
Because RDF and SPARQL are standards, there are tons of tools and skills and data sets out there.
The Open Linked Data Web is the basis for the Semantic Web vision. You can find tons of data freely available on the web, mostly in RDF: everything from dbpedia (the triples version of wikipedia - http://dbpedia.org/About) to the CIA World Factbook (http://data-gov.tw.rpi.edu/wiki/CIA_World_Factbook) to data about drugs and clinical trials (http://www.w3.org/wiki/HCLSIG/LODD/Data).
It's trivial to pull in billions of facts and combine them with your own facts, metadata, and facts-derived-from-documents.
1st point: We have a rich search application; there’s a lot of information.
2nd point: But we can make a rich search application even richer. The application knows about sources of data for Hadoop. It reaches out to those sources and dynamically retrieves the information, wrapping even more context around your search.
Cf. Google Knowledge Graph
And since we have facts stored about the people, a search for people affiliated with IBM returns Sam Ruby even though his email address does not have IBM in the domain, because we have a stored fact that he works at IBM. Our semantic application was informed by facts.
Cf. Bob DuCharme's talk at SemTechBiz 2013 San Francisco
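The IBM example can be sketched in a few lines: a search over messages that consults stored facts as well as the text of the documents. This is a toy illustration rather than the demo's actual code; the email addresses, message subjects, predicate names (`worksAt`, `hasEmail`) and the domain-matching rule are all hypothetical.

```python
# Hypothetical sample messages; addresses and subjects are made up.
messages = [
    {"id": 1, "sender": "rubys@example.org", "subject": "Re: Hadoop connector"},
    {"id": 2, "sender": "alice@ibm.com", "subject": "Hadoop sizing"},
    {"id": 3, "sender": "bob@example.net", "subject": "Unrelated thread"},
]

# Facts as (subject, predicate, object) triples.
facts = {
    ("Sam Ruby", "hasEmail", "rubys@example.org"),
    ("Sam Ruby", "worksAt", "IBM"),
}


def addresses_affiliated_with(org, facts):
    """Email addresses of people that the facts say work at `org`."""
    people = {s for (s, p, o) in facts if p == "worksAt" and o == org}
    return {o for (s, p, o) in facts if p == "hasEmail" and s in people}


def search_by_affiliation(org, messages, facts):
    """Ids of messages whose sender matches by email domain OR by a stored fact."""
    known = addresses_affiliated_with(org, facts)
    domain = "@" + org.lower() + ".com"  # naive domain rule, an assumption
    return [m["id"] for m in messages
            if m["sender"].endswith(domain) or m["sender"] in known]


print(search_by_affiliation("IBM", messages, facts))
```

Message 1 is found only because of the `worksAt` fact; a search over the documents alone would have returned just message 2.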
The entity extraction vendors are getting very good – not only can they pull entities out of free-flowing text, they can pull out facts ("events").
.. if you're only interested in Triples, now you can embed them in documents
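As a rough picture of what a triple embedded in a document might look like, the sketch below parses a small XML document containing a `sem:triple` element and applies the same three conditions as the Nixon query (confidence above 80, date before 1974, source AP Newswire or BBC). It uses Python's standard library in place of MarkLogic, so it only mimics the idea. The document layout, the `article` and `body` element names, the attribute values and the object URI are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

SEM_NS = {"sem": "http://marklogic.com/semantics"}

# Hypothetical document: ordinary content plus one embedded triple.
DOC = """<article xmlns:sem="http://marklogic.com/semantics">
  <source>AP Newswire</source>
  <body>Nixon travelled to China.</body>
  <sem:triple confidence="90" date="1972-02-21">
    <sem:subject>http://example.org/news/Nixon</sem:subject>
    <sem:predicate>http://example.org/wentTo</sem:predicate>
    <sem:object>http://example.org/China</sem:object>
  </sem:triple>
</article>"""


def embedded_triples(xml_text, min_confidence=80, before=date(1974, 1, 1),
                     sources=("AP Newswire", "BBC")):
    """Return (subject, predicate, object) tuples that pass all three filters."""
    root = ET.fromstring(xml_text)
    # Document-level filter: the source element must name an accepted source.
    if root.findtext("source") not in sources:
        return []
    result = []
    for triple in root.iterfind(".//sem:triple", SEM_NS):
        confidence = triple.get("confidence")
        when = triple.get("date")
        if confidence is None or when is None:
            continue  # skip triples without the annotations we filter on
        if float(confidence) > min_confidence and date.fromisoformat(when) < before:
            result.append(tuple(
                triple.findtext("sem:" + part, namespaces=SEM_NS)
                for part in ("subject", "predicate", "object")))
    return result


print(embedded_triples(DOC))
```

Because the triple lives inside the document, the document's own content (here the `source` element) and the triple's annotations can be tested together.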