dipLODocus[RDF] is a new system for RDF data processing supporting both simple transactional queries and complex analytics efficiently. dipLODocus[RDF] is based on a novel hybrid storage model considering RDF data both from a graph perspective (by storing RDF subgraphs or RDF molecules) and from a "vertical" analytics perspective (by storing compact lists of literal values for a given attribute).
http://diuf.unifr.ch/main/xi/diplodocus/
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
1. Short and Long-Tail RDF Analytics for
Massive Webs of Data
Marcin Wylot, Jigé Pont, Mariusz Wiśniewski,
and Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg
Switzerland
International Semantic Web Conference
26th October 2011, Bonn, Germany
2. Motivation
● increasingly large semantic/LoD data sets
● increasingly complex queries
○ real time analytic queries
■ e.g. “return the professor who supervises the most students”
urgent need for a more efficient and scalable
solution for RDF data management
20. Basic operations - queries
aggregates and analytics
?x type Student.
?x age ?y.
filter (?y < 21)
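The filter above maps naturally onto a sorted list of literal values. A minimal sketch in Python, assuming a hypothetical layout in which each entry pairs an age value with the key of the student it belongs to; `age_list` and `students_younger_than` are illustrative names, not part of dipLODocus[RDF]:

```python
from bisect import bisect_left

# Hypothetical list of literals for the "age" attribute of Student:
# (value, subject key) pairs kept sorted by value.
age_list = sorted([
    (19, "Student3"),
    (20, "Student0"),
    (21, "Student7"),
    (23, "Student1"),
    (25, "Student4"),
])

def students_younger_than(entries, limit):
    """Return the subject keys whose value is strictly below limit."""
    # Binary search for the first entry whose value is >= limit;
    # the 1-tuple (limit,) sorts before every (limit, subject) pair.
    cut = bisect_left(entries, (limit,))
    return [subject for _, subject in entries[:cut]]

young = students_younger_than(age_list, 21)
print(young, len(young))
```

Because the list is sorted, the range filter only needs one binary search and a scan of the matching prefix; an aggregate such as a count falls out of the cut position directly.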
21. Performance Evaluation
We used the Lehigh University Benchmark.
We generated two datasets, for 10 and 100 universities.
● 10 universities: 1 272 814 distinct triples and 315 003 distinct strings
● 100 universities: 13 876 209 distinct triples and 3 301 868 distinct strings
We compared execution times for the 14 LUBM queries
and 3 analytic queries inspired by BowlognaBench.
● return the professor who supervises the most students
● return a big molecule containing everything around
Student0 within scope 2
● return the names of all graduate students
27. Conclusions
● advanced data collocation
○ molecules, RDF sub-graphs
○ lists of literals, compact sorted list of values
○ hash table indexed by templates
● slower inserts and updates
○ compact ordered structures
○ data redundancy
● 30 times faster on LUBM queries
● 350 times faster on analytic queries
30. Transitivity
● Inheritance Manager
○ typeX subClassOf typeY
● Query
○ ?z type typeY
■ ?z type typeY
■ ?z type typeX
● subClassOf
● subPropertyOf
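The query expansion on this slide can be sketched as follows. This is an illustration only, assuming the class hierarchy is available as a simple child-to-parent map; `SUBCLASS_OF`, `expand_type`, `rewrite_type_pattern` and the extra type `typeW` are hypothetical and do not describe the actual Inheritance Manager:

```python
# Hypothetical class hierarchy: child type -> direct parent type.
# typeW is added only to show a second level of transitivity.
SUBCLASS_OF = {
    "typeX": "typeY",
    "typeW": "typeX",
}

def expand_type(target, subclass_of):
    """Return target plus every type that is transitively a subclass of it."""
    expanded = [target]
    frontier = [target]
    while frontier:
        parent = frontier.pop()
        for child, direct_parent in subclass_of.items():
            # The membership check also guards against cycles.
            if direct_parent == parent and child not in expanded:
                expanded.append(child)
                frontier.append(child)
    return expanded

def rewrite_type_pattern(variable, target, subclass_of):
    """Rewrite '?z type T' into one pattern per type subsumed by T."""
    return [f"{variable} type {t}" for t in expand_type(target, subclass_of)]

print(rewrite_type_pattern("?z", "typeY", SUBCLASS_OF))
```

The same expansion would apply to subPropertyOf by swapping the hierarchy map for one over properties.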
31. Serialising Molecules
#TEMPLATES * TEMPLATE_SIZE + #TRIPLES * KEY_SIZE
#TEMPLATES - the number of templates in the molecule
TEMPLATE_SIZE - the size of a template key in bytes
#TRIPLES - the number of triples in the molecule
KEY_SIZE - the size of a key in bytes, for example 8 in our case (Intel 64, Linux)
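As a worked example of the formula, a small Python helper. The key size of 8 bytes comes from the slide; the template size of 8 bytes and the counts of 3 templates and 10 triples are assumptions chosen purely for illustration:

```python
def molecule_size(num_templates, num_triples, template_size, key_size=8):
    """Serialised molecule size in bytes:
    #TEMPLATES * TEMPLATE_SIZE + #TRIPLES * KEY_SIZE."""
    return num_templates * template_size + num_triples * key_size

# Illustrative numbers only: 3 templates, 10 triples,
# and an assumed template size of 8 bytes.
size = molecule_size(num_templates=3, num_triples=10, template_size=8)
print(size)  # 3 * 8 + 10 * 8 = 104
```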