Implementation details of the Sparksee graph database: learn how bitmaps store graph information and how this results in a lightweight, high-performance solution.
Sparsity Technologies — Powering Extreme Data sparsity-technologies.com
Sparksee Graph Database: Graph Databases
4. Graphs are everywhere!
— Increasing number of huge networks such as the Web, social networks, biological systems, GPS…
— Very large graphs
— Interest in analyzing the interrelations between the entities in these networks
5. Classical graph representation
— Adjacency matrix
  Very large N×N sparse matrix, no labels, no multigraph, no attributes
— Adjacency list
  No labels, no attributes, still space-consuming
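As a rough illustration of the two classical representations, the sketch below builds both for a small hypothetical directed graph (the vertex count and the edges are made up for this example). It suggests why the matrix grows as N×N regardless of the number of edges, and that neither structure has a slot for labels or attributes.

```python
# Hypothetical 4-vertex directed graph, used only for illustration.
n = 4
edges = [(0, 2), (1, 2), (3, 2), (3, 2)]  # note the duplicated edge 3 -> 2

# Adjacency matrix: n * n cells, most of them zero for a sparse graph.
matrix = [[0] * n for _ in range(n)]
for tail, head in edges:
    matrix[tail][head] = 1  # a 0/1 cell cannot record the second 3 -> 2 edge

# Adjacency list: one neighbor list per vertex.
adj = {v: [] for v in range(n)}
for tail, head in edges:
    adj[tail].append(head)

cells = n * n
nonzero = sum(sum(row) for row in matrix)
list_entries = sum(len(neighbors) for neighbors in adj.values())
print(f"matrix cells: {cells}, non-zero: {nonzero}")
print(f"adjacency list entries: {list_entries}")
```

With 4 vertices the matrix already reserves 16 cells for 3 distinct connections; the gap widens quadratically as the graph grows.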
6. Classical graph storage
— Relational database
  Fixed, predefined schema or a very large table for nodes and edges; not suitable for path traversals and graph exploration
— XML
  XML data is stored in the form of trees
  Much work done on finding exact or approximate patterns (subtrees)
  Not designed for complex graph queries
— RDF
  Widely adopted standard for manipulating graph-like data
  Broad support from large vendors
  SPARQL has become a de facto standard
7. New approaches to graph analysis
— Complex analysis computations on very large distributed graphs
  Map-reduce (Pegasus)
  Vertex-centric computation model (Pregel)
— Graph databases: database functionality to store and query graph-like data
  Graph storage in the file system of a single computer node with a buffer pool (Neo4j, Hypergraph, OrientDB, Infinitegraph)
  Multiple servers accessible through a load balancer (Neo4j HA)
8. Requirements for graph databases
— Data and schema represented as a graph
— Data operations based on graph operations
— Graph-based integrity restrictions
— Multigraphs
— Attributes attached to both vertices and edges
— Graph queries combining edge traversals with attribute accesses
— Diversity of workloads
— Efficient secondary memory management
9. Index
Introduction to Sparksee
10. Sparksee
IS a high-performance, out-of-core graph database management system
FOR large-scale labeled and attributed multigraphs
BASED ON vertical partitioning and collections of object identifiers stored as bitmaps
11. Sparksee — Characteristics
— Graph split into small structures
  Move only the significant parts to main memory (caching)
— Object identifiers (oids) instead of complex objects
  Reduce memory requirements
— Specific structures to improve traversals
  Index the edges and the neighbors of each node
— Attribute indices
  Improve queries based on value filters
— Implemented in C++
  Different APIs (Java, .NET, etc.) through wrappers
12. Sparksee — Capabilities
Efficiency
  Very compact representation using bitmaps. Highly compressible data structures.
Capacity
  More than 100 billion vertices and edges in a single multicore computer.
Performance
  Subsecond response in recommendation queries.
Scalability
  High throughput for concurrent queries.
Consistency
  Partial transactional support with recovery.
Multiplatform
  Linux, Windows, MacOSX, Mobile
13. Logical graph model
Labeled
  a label (type) for each vertex and edge
Directed
  edges can have a fixed direction, from tail to head
Attributed
  variable list of attributes for each vertex and edge
Multigraph
  multiple edges between two vertices
16. Graph representation
We define a graph G = (V, E, L, T, H, A1, …, Ap) as:
LABELS      L = {(o, l) | o ∈ (V ∪ E) ∧ l ∈ string}
TAILS       T = {(e, t) | e ∈ E ∧ t ∈ V}
HEADS       H = {(e, h) | e ∈ E ∧ h ∈ V}
ATTRIBUTES  Ai = {(o, c) | o ∈ (V ∪ E) ∧ c ∈ {int, string, ...}}
With this representation:
— the graph is split into multiple lists of pairs
— the first element of each pair is always a vertex or an edge
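A minimal sketch of this representation, using Python sets of tuples. The vertex and edge names are a subset of the example graph used on the following slides; the variable names (`labels`, `tails`, `heads`, `attr_title`) and the helper `value_of` are chosen here for illustration and are not part of Sparksee.

```python
# Each component of G is a set of (object, value) pairs.
labels = {("v3", "ARTICLE"), ("v4", "ARTICLE"), ("v6", "IMAGE"),
          ("e3", "REF"), ("e7", "CONTAINS")}
tails = {("e3", "v4"), ("e7", "v4")}   # edge -> tail vertex
heads = {("e3", "v3"), ("e7", "v6")}   # edge -> head vertex
attr_title = {("v3", "Europe"), ("v4", "Barcelona")}

def value_of(pairs, obj):
    """Return the value paired with obj, or None if obj has no pair."""
    for o, value in pairs:
        if o == obj:
            return value
    return None

# Reassemble edge e7 from the separate lists.
e7 = (value_of(labels, "e7"), value_of(tails, "e7"), value_of(heads, "e7"))
print(e7)  # ('CONTAINS', 'v4', 'v6')
```

The linear scan in `value_of` is only there to keep the sketch short; it shows that an object's full description is recovered by consulting each list with the same identifier.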
17. Graph representation
L           (v1, ARTICLE), (v2, ARTICLE), (v3, ARTICLE), (v4, ARTICLE), (v5, IMAGE), (v6, IMAGE), (e1, BABEL), (e2, BABEL), (e3, REF), (e4, REF), (e5, CONTAINS), (e6, CONTAINS), (e7, CONTAINS)
T           (e1, v1), (e2, v2), (e3, v4), (e4, v4), (e5, v3), (e6, v3), (e7, v4)
H           (e1, v3), (e2, v3), (e3, v3), (e4, v3), (e5, v5), (e6, v6), (e7, v6)
A_id        (v1, 1), (v2, 2), (v3, 3), (v4, 4), (v5, 1), (v6, 2)
A_title     (v1, Europa), (v2, Europe), (v3, Europe), (v4, Barcelona)
A_nlc       (v1, ca), (v2, fr), (v3, en), (v4, en), (e1, en), (e2, en)
A_filename  (v5, europe.png), (v6, bcn.jpg)
A_tag       (e4, continent)
18. Value sets
A value set groups all pairs of the original set that share the same value into a single pair made of that value and the set of objects holding it.
L
  pairs:      (v1, ARTICLE), (v2, ARTICLE), (v3, ARTICLE), (v4, ARTICLE), (v5, IMAGE), (v6, IMAGE), (e1, BABEL), (e2, BABEL), (e3, REF), (e4, REF), (e5, CONTAINS), (e6, CONTAINS), (e7, CONTAINS)
  value set:  (ARTICLE, {v1, v2, v3, v4}), (BABEL, {e1, e2}), (CONTAINS, {e5, e6, e7}), (IMAGE, {v5, v6}), (REF, {e3, e4})
T
  pairs:      (e1, v1), (e2, v2), (e3, v4), (e4, v4), (e5, v3), (e6, v3), (e7, v4)
  value set:  (v1, {e1}), (v2, {e2}), (v3, {e5, e6}), (v4, {e3, e4, e7})
H
  pairs:      (e1, v3), (e2, v3), (e3, v3), (e4, v3), (e5, v5), (e6, v6), (e7, v6)
  value set:  (v3, {e1, e2, e3, e4}), (v5, {e5}), (v6, {e6, e7})
A_id
  pairs:      (v1, 1), (v2, 2), (v3, 3), (v4, 4), (v5, 1), (v6, 2)
  value set:  (1, {v1, v5}), (2, {v2, v6}), (3, {v3}), (4, {v4})
A_title
  pairs:      (v1, Europa), (v2, Europe), (v3, Europe), (v4, Barcelona)
  value set:  (Barcelona, {v4}), (Europa, {v1}), (Europe, {v2, v3})
A_nlc
  pairs:      (v1, ca), (v2, fr), (v3, en), (v4, en), (e1, en), (e2, en)
  value set:  (ca, {v1}), (en, {v3, v4, e1, e2}), (fr, {v2})
A_filename
  pairs:      (v5, europe.png), (v6, bcn.jpg)
  value set:  (bcn.jpg, {v6}), (europe.png, {v5})
A_tag
  pairs:      (e4, continent)
  value set:  (continent, {e4})
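The grouping from pairs to value sets can be sketched in a few lines of Python. The function name `to_value_set` is hypothetical; the pair lists are the T and A_nlc rows of the example.

```python
from collections import defaultdict

def to_value_set(pairs):
    """Group (object, value) pairs into {value: set of objects}."""
    grouped = defaultdict(set)
    for obj, value in pairs:
        grouped[value].add(obj)
    return dict(grouped)

# Pair lists taken from the example (T and A_nlc).
tails = [("e1", "v1"), ("e2", "v2"), ("e3", "v4"), ("e4", "v4"),
         ("e5", "v3"), ("e6", "v3"), ("e7", "v4")]
nlc = [("v1", "ca"), ("v2", "fr"), ("v3", "en"), ("v4", "en"),
       ("e1", "en"), ("e2", "en")]

tails_vs = to_value_set(tails)
nlc_vs = to_value_set(nlc)
print(sorted(tails_vs["v4"]))  # ['e3', 'e4', 'e7']
print(sorted(nlc_vs["en"]))    # ['e1', 'e2', 'v3', 'v4']
```

Note that for T the "value" is the tail vertex, so the value set of T directly gives the outgoing edges of each vertex.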
19. Bitmap representation
— Each vertex and edge is identified by a unique and immutable oid (object identifier)
— Each vertex or edge set is stored in a bitmap structure:
  Each position in a bitmap corresponds to the oid of an object
  Reduced amount of space (compression techniques)
  Very efficient binary logic operations
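To make the idea concrete, here is a small sketch that uses a Python integer as an uncompressed bitmap. The oid numbering (vertices v1..v6 mapped to 1..6) and the helper names are assumptions for this example; Sparksee's actual bitmap layout is compressed and is not reproduced here.

```python
def to_bitmap(oids):
    """Set bit i for every oid i in the collection."""
    bits = 0
    for oid in oids:
        bits |= 1 << oid
    return bits

def to_oids(bits):
    """List the positions of the set bits, in increasing order."""
    oids = []
    position = 0
    while bits:
        if bits & 1:
            oids.append(position)
        bits >>= 1
        position += 1
    return oids

# Hypothetical oids: vertices v1..v6 are numbered 1..6.
articles = to_bitmap([1, 2, 3, 4])   # objects labeled ARTICLE
english = to_bitmap([3, 4])          # vertices with nlc = 'en'
europe = to_bitmap([2, 3])           # vertices titled 'Europe'

both = articles & english & europe   # set intersection is a bitwise AND
print(to_oids(both))                 # [3]

all_vertices = articles | to_bitmap([5, 6])  # set union is a bitwise OR
print(to_oids(all_vertices))
```

Set intersection, union, and difference each reduce to one pass of word-level logic operations, which is where the efficiency of the approach comes from.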
20. Value set representation
— A value set is represented as two maps
  One maps each different value to a vertex or edge set
  The other maps each vertex or edge to a value oid
21. Example of a bitmap-based representation (figure)
23. Value set operations
domain   returns the set of distinct values
objects  returns the set of vertices or edges associated with a value
lookup   returns the set of values associated with a set of objects
insert   adds a vertex or edge to the collection of objects of a value
remove   removes a vertex or edge from the collection of objects of a value
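The five operations can be sketched on top of the two-map layout described earlier. This is a toy model, not Sparksee's API: the class name, the method signatures, and the use of plain dictionaries and sets (instead of B+ trees and bitmaps) are all assumptions made for illustration.

```python
class ValueSet:
    """Toy value set backed by two maps (hypothetical design)."""

    def __init__(self):
        self._objects_of = {}  # value -> set of objects
        self._value_of = {}    # object -> value

    def insert(self, obj, value):
        self.remove(obj)  # in this sketch an object holds at most one value
        self._objects_of.setdefault(value, set()).add(obj)
        self._value_of[obj] = value

    def remove(self, obj):
        if obj not in self._value_of:
            return
        value = self._value_of.pop(obj)
        members = self._objects_of[value]
        members.discard(obj)
        if not members:
            del self._objects_of[value]  # keep domain() free of empty values

    def domain(self):
        return set(self._objects_of)

    def objects(self, value):
        return set(self._objects_of.get(value, ()))

    def lookup(self, objs):
        return {self._value_of[o] for o in objs if o in self._value_of}


# The A_nlc attribute from the example.
nlc = ValueSet()
for obj, value in [("v1", "ca"), ("v2", "fr"), ("v3", "en"),
                   ("v4", "en"), ("e1", "en"), ("e2", "en")]:
    nlc.insert(obj, value)

print(sorted(nlc.domain()))                # ['ca', 'en', 'fr']
print(sorted(nlc.objects("en")))           # ['e1', 'e2', 'v3', 'v4']
print(sorted(nlc.lookup({"v1", "v2"})))    # ['ca', 'fr']
nlc.remove("v1")
print(sorted(nlc.domain()))                # ['en', 'fr']
```

Keeping both directions makes `objects` and `lookup` cheap at the cost of updating two structures on every `insert` and `remove`.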
24. Graph query examples
— Number of articles
  |objects(LABELS, 'ARTICLE')|
— Out-degree of English article 'Europe'
  |objects(TAILS, objects(TITLE, 'Europe') ∩ objects(NLC, 'en') ∩ objects(LABELS, 'ARTICLE'))|
— Articles with references to the image with filename 'bcn.jpg'
  {lookup(TAILS, x) | x ∈ objects(HEADS, objects(FILENAME, 'bcn.jpg') ∩ objects(LABELS, 'IMAGE'))}
— Count the articles of each language
  {(x, y) | x ∈ domain(NLC) ∧ y = |objects(NLC, x) ∩ objects(LABELS, 'ARTICLE')|}
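The four queries can be traced on the example graph with a short script. The value sets below are copied from the example; the Python functions `objects`, `lookup`, and `domain` are simplified stand-ins written for this sketch (for instance, `objects` is allowed to take either one value or a set of values, which is an assumption about how the nested call in the second query is meant to work).

```python
# Value sets from the example, as value -> set of objects.
LABELS = {"ARTICLE": {"v1", "v2", "v3", "v4"}, "IMAGE": {"v5", "v6"},
          "BABEL": {"e1", "e2"}, "REF": {"e3", "e4"},
          "CONTAINS": {"e5", "e6", "e7"}}
TAILS = {"v1": {"e1"}, "v2": {"e2"}, "v3": {"e5", "e6"},
         "v4": {"e3", "e4", "e7"}}
HEADS = {"v3": {"e1", "e2", "e3", "e4"}, "v5": {"e5"}, "v6": {"e6", "e7"}}
TITLE = {"Barcelona": {"v4"}, "Europa": {"v1"}, "Europe": {"v2", "v3"}}
NLC = {"ca": {"v1"}, "en": {"v3", "v4", "e1", "e2"}, "fr": {"v2"}}
FILENAME = {"bcn.jpg": {"v6"}, "europe.png": {"v5"}}

def objects(vs, values):
    """Objects associated with one value, or with any value in a set."""
    if not isinstance(values, (set, frozenset)):
        values = {values}
    found = set()
    for value in values:
        found |= vs.get(value, set())
    return found

def lookup(vs, objs):
    """Values associated with a set of objects (scans the value set)."""
    return {value for value, members in vs.items() if members & objs}

def domain(vs):
    """Distinct values of a value set."""
    return set(vs)

articles = objects(LABELS, "ARTICLE")

# Number of articles
n_articles = len(articles)

# Out-degree of the English article 'Europe'
europe_en = objects(TITLE, "Europe") & objects(NLC, "en") & articles
out_degree = len(objects(TAILS, europe_en))

# Articles with references to the image with filename 'bcn.jpg'
image = objects(FILENAME, "bcn.jpg") & objects(LABELS, "IMAGE")
referencing = lookup(TAILS, objects(HEADS, image))

# Count the articles of each language
per_language = {x: len(objects(NLC, x) & articles) for x in domain(NLC)}

print(n_articles)            # 4
print(out_degree)            # 2
print(sorted(referencing))   # ['v3', 'v4']
print(sorted(per_language.items()))  # [('ca', 1), ('en', 2), ('fr', 1)]
```

Every query reduces to a handful of value-set accesses combined with set intersections, which is the pattern the bitmap representation is built to serve.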
25. Implementation details
— Bitmaps are compressed by grouping the bits into clusters of 32 consecutive bits (up to 137 billion objects per graph)
— Locality is improved by generating consecutive oids for each distinct vertex or edge label
— A sorted tree structure of bitmap clusters speeds up the insert, remove, and binary logic operations
— Maps are implemented using B+ trees
— The tail, head, and attribute value sets are split into specific value sets for each label
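The clustering idea can be sketched as a sparse bitmap that stores only non-empty 32-bit clusters. This is an illustration under assumptions, not Sparksee's implementation: a Python dictionary stands in for the sorted tree of clusters, and the 32-bit cluster index used in the capacity calculation is an inference from the "137 billion" figure rather than something the slides state.

```python
CLUSTER_BITS = 32

class ClusteredBitmap:
    """Sparse bitmap: only non-empty 32-bit clusters are stored."""

    def __init__(self):
        self.clusters = {}  # cluster index -> 32-bit word

    def add(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        self.clusters[index] = self.clusters.get(index, 0) | (1 << offset)

    def discard(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(index, 0) & ~(1 << offset)
        if word:
            self.clusters[index] = word
        else:
            self.clusters.pop(index, None)  # drop clusters that become empty

    def __contains__(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        return bool((self.clusters.get(index, 0) >> offset) & 1)

    def intersection(self, other):
        result = ClusteredBitmap()
        # Only clusters present on both sides can contribute bits.
        for index in sorted(self.clusters.keys() & other.clusters.keys()):
            word = self.clusters[index] & other.clusters[index]
            if word:
                result.clusters[index] = word
        return result

# Assuming cluster indices fit in 32 bits, the oid space is
# 2**32 clusters * 32 bits per cluster = 2**37 positions.
capacity = 2**32 * CLUSTER_BITS
print(capacity)  # 137438953472, about 137 billion

a, b = ClusteredBitmap(), ClusteredBitmap()
for oid in (3, 40, 1_000_000):
    a.add(oid)
for oid in (40, 1_000_000, 5_000_000):
    b.add(oid)
common = a.intersection(b)
print(len(a.clusters), 40 in common, 3 in common)  # 3 True False
```

Three widely spread oids cost three stored words rather than a million-bit array, and consecutive oids per label (the locality point above) would make neighboring objects share clusters.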
27. Queries
Q1: Find the article with the largest outdegree and traverse its shortest path tree
Q2: Recommend articles related to the most popular one
Q3: Find new images for articles from translations in other languages
Q4: Find, for each different language, the number of articles and images referenced
Q5: For each article with images, materialize the count of images
Q6: Remove all articles without images
                                    Q1  Q2  Q3  Q4  Q5  Q6
k-hops and path traversals          +   +
graph pattern matching                      +
aggregations and edge connectivity              +
graph transformation                                +   +
28. Performance out-of-core
Wikipedia benchmark, out-of-core, 1 GB buffer pool.
                  MonetDB     MySQL       Neo4j (*)   SPARKSEE
Graph size (GB)   12.00       15.72       42.00       16.98
Load (h)          Error       1.36        8.99        2.89
Q1 (s)            4,801.6     > 12 h      > 12 h      120.5
Q2 (s)            3,788.4     13,841.6    > 12 h      205.4
Q3 (s)            458.9       33.0        481.0       10.8
Q4 (s)            279.3       45.0        > 12 h      144.9
Q5 (s)            267.4       930.3       > 12 h      140.9
Q6 (s)            Error       10,707.0    > 12 h      25,791.6
(*) Java VM with 45 GB