The rapidly increasing amount of semantic network data today provides a wealth of insight into how entities interact and relate with one another. In order to tap into this valuable source of information, organizations require a secure and scalable repository in which to store and explore these interactions and relationships. In this talk we will discuss Apache Rya, an Accumulo-based graph store capable of storing billions of Resource Description Framework (RDF) triples and providing a rich SPARQL query endpoint for exploring complex subgraph relationships. We will talk about two indexing strategies that Rya uses to address some of the challenges associated with storing and querying large graph datasets. In particular, we will discuss how our SPARQL (SPARQL Protocol and RDF Query Language) query caching framework allows users to greatly improve query performance by storing and incrementally maintaining query results using Apache Fluo. We will also discuss our Accumulo-based entity centric index. Inspired by Facebook’s horizontally partitioned graph index, Unicorn , Apache Rya’s entity centric index is a novel way of storing graphs in Accumulo that draws on document partitioned indexing techniques. This graph partitioning and indexing strategy limits network traffic and enables distributed join processing by utilizing a variation of Accumulo’s IntersectingIterator framework to perform joins server side.
The work presented herein was funded by the Office of Naval Research, under contract # N00014-12-C-0365, supporting this effort.
– Speaker –
Dr. Caleb Meier
Software Engineer, Parsons
Caleb Meier has been a Software Engineer at Parsons Government Services for the last two years. Since joining Parsons, he has investigated and implemented a number of features to improve the query performance of Apache Rya. Caleb earned his Ph.D. in Mathematics from the University of California, San Diego and a B.A. in Mathematics from Yale University. In his spare time he enjoys climbing, biking, playing soccer and spending time with his delightful wife Leslie.
— More Information —
For more information see http://www.accumulosummit.com/
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Networks
1. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
Rya: Accumulo Indexing
Strategies for Searching
Semantic Networks
Dr. Caleb Meier, Puja Valiyil, David Lotts, Aaron Mihalik,
Dr. Adina Crainiceanu
00.00.00
Presenter’s NameDISTRIBUTION STATEMENT A. Approved for
public release; distribution is unlimited.
ONR Case Number 43-2117-16 EXIM APPROVED Parsons #459 8 OCT 16
2. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
Acknowledgements
• The work presented herein was funded by the Office of
Naval Research (ONR) and the National Geospatial-
Intelligence Agency (NGA) under contract # N00014-12-C-
0365
• This presentation was sponsored by Parsons
• This work is the collective effort of:
Parsons’ Rya Team: Puja Valiyil, Aaron Mihalik, Caleb
Meier, David Lotts, Jennifer Brown
Rya Founders: Roshan Punnoose, Adina Crainiceanu,
and David Rapp
1
3. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 2
Agenda
• Rya Background
• Materialized Views in Rya
• Entity-Centric Index
4. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 3
Agenda
•Rya
Background
• Materialized Views in Rya
• Entity-Centric Index
5. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• Apache Rya (incubating): Resource Description
Framework (RDF) Triplestore built on top of
Accumulo or MongoDB
• RDF: W3C standard for representing
linked/graph data
Represents data as statements (assertions) about
resources
Serialized as triples in {subject, predicate, object} form
Example:
{Caleb, worksAt, Parsons}
{Caleb, livesIn, Virginia}
4
Rya and RDF
Background
Caleb
Parsons
Virginia
worksAt
livesIn
6. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 5
SPARQL
Background
SELECT ?people WHERE {
?people <worksAt> <Parsons>.
?people <livesIn> <Virginia>.
}
• RDF Queries are described using SPARQL
SPARQL Protocol and RDF Query Language
• SQL-like syntax for finding triples matching
specific patterns
Look for subgraphs that match triple statement patterns
Joins are performed when there are variables common to two or
more statement patterns
7. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• RDF4J* Interface for interacting with RDF data
stored on Accumulo
RDF4J Open Source Java
framework for storing and
querying RDF data
RDF4J provides several
interfaces/abstractions
central for interacting with
an RDF datastore
SAIL interface for
interacting with underlying persisted RDF model
SAIL: Storage And Inference Layer
6
Rya Architecture
Background
Data storage layer
Query processing in SAIL layer
SPARQL
Rya and RDF4J
Rya QueryPlanner
Accumulo
*The RDF4J.org project was previously named “Open RDF”, then “Sesame”.
8. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• 3 Tables
SPO : subject, predicate, object
POS : predicate, object, subject
OSP : object, subject, predicate
• Store triples in the Row ID of the table
• Store graph name (context) in the Column Family
• Advantages:
Native lexicographical sorting of row keys fast range queries
All patterns can be translated into a scan of one of these tables
7
Storage: Triple Table Index
Accumulo Composite Index and Table Design
Key for SPO Table
Row ID
Column Timestam
pFamily Qualifier Visibility
subject, predicate, object,
type
graph name (not used) visibility timestamp
9. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
Subject, Predicate,
Object
…
Bob, livesIn, California
…
Greta, livesIn, Virginia
…
John, livesIn, Virginia
…
Anatomy of a RYA SPARQL Query
Query Planning
Step 2: For each ?x, SPO
Table lookup
Subject, Predicate, Object
…
Greta, commuteMethod,
bike
…
John, commuteMethod, Bus
…
Step 3: For each ?x, SPO
Table lookup
Step 1: POS Table – scan
range for worksAt
?x worksAt ?y ?x livesIn Virginia
?x commuteMethod bike
SELECT ?x, ?y WHERE {
?x <worksAt> ?y.
?x <livesIn> Virginia.
?x <commuteMethod> bike.
}
Predicate, Object,
Subject
…
studiesAt, Joe,
Georgetown
talksTo, Joe, Bob
worksAt, Netflix, Bob
worksAt, Parsons, Greta
worksAt, PlayStation,
John
10. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• Out of the box, Rya SPARQL query evaluation requires
joining Accumulo scan results client side
• Joins are performed using a hybrid of nested loop and
hash joins
Range prefixes for scans are determined by results of previous
scan
Sorted nested loop
Results of scan are joined with any values not used to form range
prefix
Hash joins
• BatchScanner boosts performance
Evaluates results in batches
Client side join evaluation is still a bottle neck
Especially for queries with:
Large intermediate join results
Large number of joins
9
Costly Joins of Accumulo Scans
Query Evaluation
11. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• Materialized Views in Rya: Index Frequently Issued Queries or Frequent
Subgraphs
1. Pre-compute a portion of the query
2. Store SPARQL describing the query along with pre-computed values in
Accumulo
3. Normalize query variables to match stored SPARQL variables during query
execution
• Entity-Centric Index: Add new indices to eliminate need for hash joins
1. Apply Document Partitioned Indexing to graph data
Design tables so that all properties for each entity appear on single tablet
2. Find entities with intersecting properties
Use a variation of an Intersecting Iterator
Perform merge joins on the server
• Additional indexing strategies could eliminate the need for client side
joins
Clear trade off between query performance and memory footprint
Mitigate “data plume”
10
Indexing Strategies for Dealing With Costly Joins
Query Evaluation
12. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 11
Agenda
• Rya Background
•Materialized
Views in Rya
• Entity-Centric Index
13. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• Determine which query results to cache given that
capacity is limited
• Caching Strategies
Preference: manually submit queries to cache
Recent use
Complexity: number of joins, estimates of intermediate results
Relevance
Relevance to data (frequent subgraphs)
Relevance to users (query logs)
• A comprehensive strategy should use all of the
criteria above
• Terms: Materialized Views and Precomputed Joins
A Materialized View is a cache of query results
Precomputed Joins in Rya are Materialized Views for SPARQL
queries consisting of joins and filters
12
Determining which Queries to Cache
Materialized Views in Rya
14. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• During query planning, statement patterns in a query are grouped
according to common variables and common constants
• Groups can be precomputed and cached, trading faster response times
for increased storage
13
Query Planning: Anatomy of a Rya SPARQL Query using a Precomputed Join
Materialized Views in Rya
?x :livesIn :Arlington
?y :talksTo ?x
?y :livesIn :DC.
?x :commutesBy :Bike
Pre-Computed Join Index
table
Select ?a, ?b where {
?a :livesIn :DC . ?a :talksTo
?b .}
?a=Joe, ?b=Caleb
?a=Mike, ?b=Dave
…
?a=Rob, ?b=Aaron
Pre-
Computed
Join Index
Node
Join
Join
Join
select ?x ?y where { ?y :livesIn :DC. ?y :talksTo ?x.
?x :livesIn :Arlington. ?x :commutesBy :Bike.}
15. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• Batch Update
Re-compute precomputed join tables periodically using MapReduce
Benefit:
Minimizes data plume
Drawback:
Large possibility of stale data
Query plans that use precomputed joins may contain inaccurate results
• Incrementally update tables as triples are ingested
Use some sort of observer framework to update intermediate results as
new triples are ingested
Benefit:
No staleness in query results
Query plans that use precomputed joins are more accurate
Significantly less latency for updates
Drawbacks:
Data plume
Intermediate query results have to be stored to incrementally update
results
Observer framework increases the complexity of the system
14
Batch Updates
Strategies for Maintaining Materialized Views
16. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 15
Incremental Transactions Using Apache Fluo (incubating)
Fluo Background
• Address maintenance problem by incrementally
updating cached results using Fluo
Fluo was created for Accumulo based on the Percolator paper1
by Google Inc.
Fluo provides additional features that Accumulo does not:
Multi-row transactions prevent write-write conflicts
Observer framework (next slide)
• Use cases
Maintain large scale computation using series of small
transaction updates
Join existing large data cache with new data
Formerly done by periodic batch processing jobs recreating the data
cache
1. Daniel Peng, Frank Dabek. USENIX. 2010. Large-scale Incremental Processing Using Distributed Transactions and
Notifications. http://research.google.com/pubs/pub36726.html
17. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 16
Creating an Application using the Observer Framework
Fluo Background
• Overview of the Fluo Observer Framework
Observers monitor a given Fluo table column
When the observed column is updated, a notification is
triggered which tells the observer to perform a
transaction
The transaction is specified by the implementation of
the observer’s process method
Takes in a transaction object, row and column
Uses the data to perform an action such as writing to
another column in the table
• Perform complex incremental updates by
Chaining Observers
Decompose problem into a collection of interacting observers
Observers can write notifications that trigger other observers
18. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 17
Join Observer:
Output: {?patron, ?employee,
?business}
Statement Pattern
Observer:
?patron <http://talksTo> ?employee
Output: {?patron, ?employee}
Statement Pattern Observer:
?employee <http://worksAt> ?business
Output: {?employee, ?business}
People who talk to employees and
where that employee works.
Pairs of people who talk to each
other
Where people
work
Streamed
Statements
Streamed
Statements
SPARQL Query
SELECT ?patron ?employee ?business
WHERE {
?patron <http://talksTo> ?employee.
?employee <http://worksAt> ?business
}
Formulating a SPARQL Query as a Chain of Observers
Maintaining Precomputed Joins
23. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
PCJ
Client
PCJ
Index
Core Rya
Tables
Fluo App
Registering a New Query
Maintaining Precomputed Joins
1. Register Query
2. Scan for historic
Statement Pattern matches
4. Compute Results
5. Export Results to PCJ Index
22
3. Insert Statement
Pattern matches
24. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
PCJ
Client
Statement
Stream
Core Rya
Tables
Fluo App
Streaming While Registering Query
Maintaining Precomputed Joins
2. Scan for historic
Statement Pattern matches
3. Insert Statement Pattern
matches
4. Compute Results
1. Register Query
A. Write new Statement
B. Write new Statement
23
25. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• The results on this slide and the next were obtained using proprietary queries and data
Q1 is the most complex and consists of 14 joins, 6 of which are left joins
Q2 obtained from Q1 by removing two left joins and two joins
Q3-Q5 decrease in complexity as well and are obtained using a similar process
Q5 is similar to Q1, with all left joins replaced by joins
• 4192 results were obtained by querying a Rya Instance with 500,000 triples installed
on Parsons’ internal cluster
8 worker nodes, each with 2 x 6 Core Xeon E5-2440 (2.4GHz) Processors and 48 GB RAM
• Table below presents results with average query time over 10 iterations with standard
deviation:
24
Benchmark results for Rya with No Precomputed Joins and No Optimizations
Materialized Views in Rya
Q Rya with one exact PCJ (s) Rya with no PCJ (s)
Q1 1.284 ± 0.047 516.774 ± 6.265
Q2 0.851 ± 0.042 345.606 ± 5.991
Q3 0.598 ± 0.026 180.663 ± 3.354
Q4 0.368 ± 0.026 63.588 ± 1.527
Q5 1.334 ± 0.074 97.101 ± 1.765
26. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 25
Agenda
• Rya Background
• Materialized Views in Rya
•Entity-Centric
Index
27. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 26
Facebook’s Unicorn Paper: Motivating Example
Entity-Centric Index
The problem is reduced to finding the intersection of lists:
How do I find all documents containing “dog” and “bark”?
1. View docs and terms as a graph, with edges drawn from docs to the
terms they contain
2. Efficiently represent graph as a collection of adjacency lists
bark doc4 doc5dog doc6
doc1
doc4
doc2
doc5
doc7
dog
bark
doc3 doc8
doc6
dog
bark
doc1 doc2 doc3
doc4 doc5 doc6
doc4 doc5 doc6
doc7 doc8
Adjacency lists of dog and bark
28. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
What if the adjacency lists are really large? The terms
“dog” and “barks” could appear in lots of documents!
• Distribute the problem by partitioning adjacency lists of
documents across servers
Involves some type of sharding
• Each server finds intersections of smaller lists:
27
Facebook’s Unicorn Paper: Distributing the Problem
Entity-Centric Index
dog
bark
dog
bark
1 2 3
4 5 6
4 5 …
7 8 …
Server 1
ShardID = 0
Server 2
ShardID = 1
ShardID = (doc num)%3
Server 3
ShardID = 2
3 6 ...
6 ...
1 4 ...
4 7 ...
2 5 ...
5 8 ...
dog
dog
bark
bark
29. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 28
Unicorn Applied to Accumulo
Entity-Centric Index
Accumulo Key
Row: doc shard Column
CF: term CQ: document id
• Unicorn Framework outlines the basis for a distributed
document partitioned index
• Accumulo has a framework1 in place for creating this index
Uses IndexedDocIterator which is an extension of an
IntersectingIterator
• Uses the following key design:
1. Accumulo: Application Development, Table Design, and Best Practices, Cordova A., Rinaldi B.,
Wall M., O’Reilly 2015
30. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 29
RowID ColF ColQ
0 bark 6
0 dog 3
0 dog 6
RowID ColF ColQ
1 bark 1
1 bark 4
1 dog 4
1 dog 7
RowID ColF ColQ
2 bark 2
2 bark 5
2 dog 5
2 dog 8
Server 1
Server 2 Server 3
Documents that
contain dog and bark
Iter1
Iter2
Iter1
Iter2
Iter1
Iter2
Q
Q Q
R:6 R:4
R:5
Elements in adjacency lists of “bark” and
“dog” stored in Accumulo in a Document
Partitioned Index
• RowID = shardID (doc num % 3)
• Column Family = term (bark or dog)
• Column Qualifier = adjacency
element (document number)
Using this index, can evaluate “entity-
centric queries” entirely on server
• On each server,
• iter1 scans “bark”
• iter2 scans “dog”
• Iterators intersect when colQ1 =
colQ2, then return result
Unicorn Implemented in Accumulo using Intersecting Iterators
Entity-Centric Index
31. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 30
Unicorn Applied to RDF
Entity-Centric Index
• Adjacency Lists capture the edge label as well as the
connection
pers1 pers2 pers3
dog SUV
pers4
hasPet
obj/dog
employs
subj/USGovt
pers1 pers2 pers4
pers2 pers4
Adjacency lists of SUV and USGovt and do
• This SPARQL query asks for all people who own a dog,
drive a SUV, and work for the U.S. Government:
SELECT ?person WHERE { ?person <hasPet> <dog> .
?person <drives> <SUV> .
<USGovt> <employs> ?person . }
drives
hasPet
employs
USGovt
employs
drives
drives
drives
obj/SUV
pers2 pers3 pers4
32. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• For each triple (subj, pred, obj, graph-name), insert the following two
Accumulo keys in the same Entity-Centric Index table:
31
Key Design in Accumulo
Entity-Centric Index
Row: uri:John, CF: uri:worksAt, CQ: parsonsEmployeesx00objectx00uri:Parsons
Row: uri:Parsons, CF: uri:worksAt, CQ: parsonsEmployeesx00subjectx00uri:John
Accumulo Key
Row:<subject
>
Column
CF:<predicate
>
CQ:<graphName>x00objectx00<object>
Accumulo Key
Row:<object> Column
CF:<predicate>
CQ:<graphName>x00subjectx00<subject
>
• The triple (uri:John, uri:worksAt, uri:Parsons, graph context:
parsonEmployees) is added as the following two rows:
33. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 32
Query Evaluation: Merge Joins and the Reduction of Network Traffic
Entity-Centric Index
SELECT ?person WHERE {
?person <hasPet> <dog>.
?person <drives> <SUV>.
<USGovt> <employs> ?person.}
Iter2
Q Q
R:pers2 R:pers4
Iter1
Iter3
Iter1
Iter3
Iter2
Using this index, can evaluate “entity-centric queries” entirely on server
• iter1 scans col: employs colQ: subject USGovt,
• iter2 scans col: drives colQ: object SUV,
• iter3 scans col: hasPet colQ: object dog
• Iterators intersect when rowID 1 = rowID 2 = rowID 3
RowI
D
ColF ColQ
dog hasPet S ….
pers1 hasPet O dog
pers2 employ
s
S USGovt
pers2 drives O bicycle
pers2 drives O SUV
pers2 hasPet O dog
RowI
D
ColF ColQ
pers3 drives 0 SUV
pers4 employ
s
S USGovt
pers4 drives O SUV
pers4 hasPet O dog
pers5 drives O SUV
SUV drives S ….
USGov
t
employ
s
O pers2
USGov
t
employ
s
O pers4
Server 1
Server 2
34. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 33
Which Queries Can We Evaluate?
Entity-Centric Index
• Generalize Document Partitioned Index to accommodate a broad
range of SPARQL queries
• Solve as many Entity-Centric queries server side as possible
Entity-Centric means all statement patterns share a common variable or constant
33
select ?x ?y ?z
where{
A aa ?x
A bb ?y
A cc ?z
}
select ?x
where{
?x aa C
?x bb B
?x cc D
}
select ?x ?y ?z
where{
B aa ?x
?x bb ?y
?x cc ?z
}
B C
D
?x
Entity with Properties
?x ?y
?z
A
Properties for an Entity
B ?y
?z
?x
“Friends of Friends”
aa
bb
cc
aa
bb
cc
aa
bb
cc
35. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• During query planning, group statement patterns according to
common variables and common constants
• Groups which have the highest “priority” are consolidated into an
Entity-Centric Index node
34
Query Planning: Anatomy of a Rya SPARQL Query using the Entity-Centric Index
Entity-Centric Index
?x livesIn Arlington
?y talksTo ?x
?y livesIn D.C.
?x commutesBy Bike
Entity-Centric Index
…
Joe, livesIn, D.C.
Joe, talksTo, Rob
…
Rob, commutesBy, Bike
Rob, livesIn, Arlington
…
Entity-
Centric
Index
Node
Entity-Centric Index Node
1
2
Join
Join
Join
36. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 35
Entity Centric Benchmarking Results
Entity-Centric Index
• Results were obtained by running 13 queries (see Appendix) against the Lehigh
University Benchmark (LUBM) data set consisting of 33.34 million triples
• The Entity Centric Index table was split into 19 tablets distributed across 8
servers
All predicates found in LUBM data were set as locality groups for the Entity Centric Index table
• Queries were issued using a BatchScanner with 15 threads
Query Entity Total Time (s) Rya Total Time (s) Results Ret.
LUBMStar Q1 23.6 624.702 1024789
LUBMStar Q2 0.3724 0.732 7
LUBMStar Q3 0.545 1.221 499
LUBMStar Q4 4.37 379.239 180002
LUBMStar Q5 1.475 6.072 40665
LUBMStar Q6 0.222 6.613 5003
LUBMStar Q7 11 0.3258 3
LUBMStar Q8 7.2 0.267 1
LUBMStar Q9 12.763 0.748 8
LUBMStar Q10 34.934 1929.984 1,259,374
LUBMStar Q11 0.0412 0.284 3
LUBMStar Q12 0.0358 0.311 2
LUBMStar Q13 0.0291 0.137 30
37. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• Next Steps:
Implement strategies to determine PCJs dynamically
Calculate frequent subgraphs
Develop query logging framework
Perform semantic analysis of queries to determine common
components to cache
Streamline query planning with respect to PCJs
Query planning time increases as number of PCJs increases
Explore strategies for pruning PCJ query plan search space
to quickly determine efficient PCJ combinations for query
plans
Index PCJs using underlying query components so that PCJs
can be efficiently discovered using the matching subquery
36
Next Steps for Precomputed Joins
Future Research
38. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
• Next Steps for Entity Centric Index
Explore ways to improve join performance of Entity Centric Query
Nodes
Add capability to explicitly define Entity Types for Index
Entities implicitly defined as nodes containing specified combination of
properties
Explicitly register entities with index and allow users to query by type
Specify entities using OWL (Web Ontology Language) class and
property combinations
Leverage additional structure using more targeted queries involving
identifying features for the give entity type
• Future Research in Server Side Join Evaluation
Utilize Spark GraphX or Spark DataFrames to create a distributed
query evaluation framework for Rya
Joins performed on Rya Resilient Distributed Datasets (RDDs) in
remote SparkContext on Server
37
Next Steps for Entity Centric Index and Server Side Join Evaluation
Future Research
40. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 39
• Useful Links
• Entity Centric LUBM Star Queries
Appendix
41. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
Useful Links
SPARQL
Standard -- http://www.w3.org/TR/rdf-sparql-query/
Tutorial -- http://jena.apache.org/tutorials/sparql.html
RDF
Primer -- http://www.w3.org/TR/rdf11-primer/
Unicorn
Paper -- Michael Curtiss, et al., Unicorn: A System for Searching the Social Graph, Facebook
Inc. https://research.facebook.com/publications/unicorn-a-system-for-searching-the-social-
graph/
Apache Rya (Incubating)
Home -- http://rya.apache.org/ Home page for Apache Rya (Incubating)
Rya Office Hours -- Biweekly phone conference. Updates, issues, upcoming features.
Up-coming announcements with dial-in numbers are sent on the dev mailing list
Mailing List -- dev@rya.incubator.apache.org is for usage questions, help, and
people who want to contribute code to Rya. subscribe, unsubscribe, archives
Javadoc OpenRDF=Sesame=RDF4J -- http://archive.rdf4j.org/javadoc/sesame-2.7.16/
Tutorial for RDF4J -- http://rdf4j.org/doc/programming-with-rdf4j/
Paper -- Punnoose R., Crainiceanu A., Rapp D. 2012. Rya: a scalable RDF triple
store for the clouds. Proceedings of the 1st International Workshop on Cloud
Intelligence. http://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
Paper -- Roshan Punnoose, Adina Crainiceanu, David Rapp. SPARQL in the Clouds
Using Rya. Information Systems Journal (2013).
http://www.usna.edu/Users/cs/adina/research/Rya_ISjournal2013.pdf 40
42. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
Entity Centric LUBM Star Queries (1 of 5)
The Entity Centric index was tested by issuing the following queries against the LUBM data set.
LUBM Star Q1 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4
WHERE
{
?X ub:doctoralDegreeFrom ?Y2
?X ub:undergraduateDegreeFrom ?Y4
?Y1 ub:advisor ?X
?X ub:emailAddress ?Y3
}
LUBM Star Q2 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1
WHERE
{
?X ub:doctoralDegreeFrom <http://www.University104.edu>
?X ub:headOf ?Y1
}
LUBM Star Q3 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1
WHERE
{
?X ub:doctoralDegreeFrom <http://www.University104.edu>
?X ub:teacherOf ?Y1
}
41
46. Rya: Accumulo Indexing Strategies for Searching Semantic Networks
LUBM Star Q13 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?Y1 ?Y2
WHERE"
{
?Y1 ub:takesCourse <http://www.Department0.University0.edu/Course0>
?Y2 ub:teacherOf <http://www.Department0.University0.edu/Course0>
}
45
Entity Centric LUBM Star Queries (5 of 5)
Notas do Editor
During query planning, statement patterns in a query are grouped according to common variables and common constants
Those groups that can match the statement patterns of a cached precomputed join index table can then be replaced by the results stored in that table, trading faster response time for increased storage
Proof that we can stream while constructing new queries:
You can stream new results into the Core Rya Tables and Fluo App while you are registering a new query without missing any Statement Pattern matches so long as you finish writing the query to fluo before you start scanning for historic matches AND you write new statements to Rya before you write them to Fluo.
There are 4 atoms of work related to this process:
Write Statement to Fluo (WF)
Write Statement to Rya (WR)
Commit Query to Fluo (Q)
Scan Rya for historic results (S)
Orderings that will result in one of the two ways a statement can be found missing the statement:
WF … Q = The Triples observer will miss the Statement for the SP.
S … WR = The historic scan will miss the Statement when scanning for the SP.
We can avoid the state where both of these cases are true by following these ordering rules:
Q must come before S
WR must come before WF
These rules leave us with the following operation orderings, all of which do not miss any statements:
Q, S, WR, WF
Q, WR, S, WF
Q, WR, WF, S
WR, WF, Q, S
WR, Q, WF, S
WR, Q , S, WF
WF, Q, S, WR = missed statement
Using the Entity-Centric Index, lists of “bark” and “dog” are stored in Accumulo in a Document Partitioned Index