The document describes DEX, a high-performance graph database management system. DEX uses an internal representation based on bitmaps to store and query very large graphs. It was developed by researchers at the DAMA-UPC research group. Experimental results show DEX can load graphs with billions of edges in hours and perform common graph queries on large real-world datasets within seconds to minutes. Future work aims to further optimize DEX for querying trillions of objects and develop distributed and transactional capabilities.
1. DEX: a High-Performance Graph Database Management System
A High-Performance Graph Database Management System
Authors:
Norbert Martínez-Bazan
Sergio Gómez-Villamor
Francesc Escalé-Claveras
2. DEX: a High-Performance Graph Database Management System
Outline
Introduction
DEX
Logical graph model
Internal representation
Software architecture
Experimental results
Conclusions
Future work
3. DEX: a High-Performance Graph Database Management System
Introduction
[2006] DEX started by DAMA-UPC
[2010] Sparsity Technologies is a spin-out from DAMA-UPC
Sparsity commercializes DEX and provides services
DAMA-UPC does development and research
DEX Versions
V2.0 March/2009
V3.0 October/2009
V4.0 November/2010
4. DEX: a High-Performance Graph Database Management System
DEX
DEX is a graph database:
Both data and schema are represented as a graph
Data operations are based on graph operations
Graph-based integrity constraints
Renzo Angles and Claudio Gutierrez. 2008. Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008)
Focus:
Management of very large graphs
High performance on query operations
5. DEX: a High-Performance Graph Database Management System
Logical graph model
Labeled: nodes and edges are “typed”
Directed: edges can have a fixed direction
Attributed: nodes and edges can have multiple single-valued attributes
Multigraph: two nodes can be connected by multiple edges
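The four properties above can be illustrated with a small in-memory sketch. This is not the DEX API: the class `Graph`, its methods, and the labels `ACTOR`, `MOVIE` and `CAST` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Graph:
    """Toy labeled, directed, attributed multigraph (illustrative only)."""
    node_type: dict = field(default_factory=dict)  # node OID -> type label
    edge_type: dict = field(default_factory=dict)  # edge OID -> type label
    endpoints: dict = field(default_factory=dict)  # edge OID -> (tail OID, head OID)
    attrs: dict = field(default_factory=dict)      # (OID, attribute name) -> single value
    _next_oid: count = field(default_factory=lambda: count(1))

    def add_node(self, label, **attributes):
        oid = next(self._next_oid)
        self.node_type[oid] = label                # labeled: every node has a type
        for name, value in attributes.items():
            self.attrs[(oid, name)] = value        # attributed: one value per attribute
        return oid

    def add_edge(self, label, tail, head, **attributes):
        oid = next(self._next_oid)
        self.edge_type[oid] = label
        self.endpoints[oid] = (tail, head)         # directed: tail -> head
        for name, value in attributes.items():
            self.attrs[(oid, name)] = value
        return oid


g = Graph()
actor = g.add_node("ACTOR", name="A")
movie = g.add_node("MOVIE", title="M")
# Multigraph: two distinct edges of the same type between the same pair of nodes.
e1 = g.add_edge("CAST", actor, movie, role="lead")
e2 = g.add_edge("CAST", actor, movie, role="narrator")
```

Because edges have their own OIDs, parallel edges stay distinguishable and can carry different attribute values.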
6. DEX: a High-Performance Graph Database Management System
Internal representation
Requirements
Split the graph into smaller structures
• Favour caching
• Move only the significant parts to main memory
OIDs instead of objects
• Reduce memory requirements
Specific structures to improve traversals
• Index edges of a node
Attributes fully indexed
• Improve queries based on value filters
7. DEX: a High-Performance Graph Database Management System
Internal representation
Our approach:
Link = Map + Bitmaps
Link: bidirectional association between values and OIDs
Two functionalities:
• Given a value, return a set of OIDs (a bitmap)
• Given an OID, return the value
[Figure: a Link drawn as a Map from each OID to its value, plus one Bitmap per value (a, b, c) marking the OIDs that hold that value]
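A minimal sketch of the Link idea, assuming Python integers as uncompressed bitmaps and a dictionary as the map. The class and method names are hypothetical, and the five OID/value pairs are illustrative data, not a transcription of the slide figure.

```python
class Link:
    """Bidirectional value <-> OID association: a map plus one bitmap per value."""

    def __init__(self):
        self.value_of = {}   # map: OID -> value
        self.bitmap_of = {}  # value -> int used as a bitmap (bit i set: OID i has the value)

    def set(self, oid, value):
        if oid in self.value_of:
            # Single-valued: remove the OID from the bitmap of its previous value.
            self.bitmap_of[self.value_of[oid]] &= ~(1 << oid)
        self.value_of[oid] = value
        self.bitmap_of[value] = self.bitmap_of.get(value, 0) | (1 << oid)

    def oids(self, value):
        """Given a value, return the OIDs that hold it."""
        bits = self.bitmap_of.get(value, 0)
        return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

    def value(self, oid):
        """Given an OID, return its value (None if unset)."""
        return self.value_of.get(oid)


link = Link()
for oid, val in [(1, "a"), (2, "b"), (3, "b"), (4, "c"), (5, "a")]:
    link.set(oid, val)
```

Both directions are direct lookups: `link.oids("a")` reads one bitmap and `link.value(4)` reads one map entry, which is what makes value filters and attribute reads cheap in this layout.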
8. DEX: a High-Performance Graph Database Management System
Internal representation
A Graph as a combination of Bitmaps:
1 Bitmap for each node or edge type
1 Link for each attribute
2 Links for each edge type: outgoing and incoming edges
N. Martínez-Bazán, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M. A. Sánchez-Martínez, and J. Larriba-Pey. Dex: high-performance exploration on large graphs for information retrieval. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)
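The combination described on this slide can be sketched as follows: one bitmap per type, and for each edge type two associations (edge to tail node, edge to head node) that can be read in either direction. The layout and all names (`EdgeType`, `type_bitmap`, `CAST`) are assumptions for illustration, not DEX internals.

```python
def bits(bitmap):
    """Yield the positions of the set bits of an int used as a bitmap."""
    oid = 0
    while bitmap:
        if bitmap & 1:
            yield oid
        bitmap >>= 1
        oid += 1


class EdgeType:
    """Two value <-> OID associations for one edge type (hypothetical layout)."""

    def __init__(self):
        self.tail_of, self.head_of = {}, {}     # edge OID -> node OID
        self.out_edges, self.in_edges = {}, {}  # node OID -> bitmap of edge OIDs

    def add(self, edge, tail, head):
        self.tail_of[edge], self.head_of[edge] = tail, head
        self.out_edges[tail] = self.out_edges.get(tail, 0) | (1 << edge)
        self.in_edges[head] = self.in_edges.get(head, 0) | (1 << edge)


type_bitmap = {"ACTOR": 0, "MOVIE": 0, "CAST": 0}  # one bitmap per node or edge type


def add_object(label, oid):
    type_bitmap[label] |= 1 << oid


cast = EdgeType()
for oid, label in [(1, "ACTOR"), (2, "ACTOR"), (3, "MOVIE")]:
    add_object(label, oid)
for edge, tail, head in [(4, 1, 3), (5, 2, 3)]:
    add_object("CAST", edge)
    cast.add(edge, tail, head)

# 1-hop traversal: actors connected to movie 3 through its incoming CAST edges.
actors_of_movie = sorted(cast.tail_of[e] for e in bits(cast.in_edges.get(3, 0)))
```

A traversal is then a bitmap lookup (the edges of a node) followed by map lookups (the other endpoint of each edge).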
9. DEX: a High-Performance Graph Database Management System
Software architecture
DEXCORE
Complete C++ library
• Storage and query
Linux / Windows / MacOSX
32 / 64 bits
JDEX
DEXCORE functionality provided as a Java library
10. DEX: a High-Performance Graph Database Management System
Software architecture
DEXCORE:
IO
Segment: Logical space of pages
Pool: Groups of segments
Storage: I/O device
Cache: I/O management
• Replacement policy
Data:
Paged out-of-core structures
Bitmaps, Maps, Links, …
Graph:
A combination of structures
DbGraph and RGraphs
DEX:
Database and Session management
11. DEX: a High-Performance Graph Database Management System
Software architecture
Implementation details:
37-bit unsigned integer OIDs
• More than 137 billion objects per graph (2^37)
Bitmaps are compressed
• Clusters of 32 consecutive bits
• Only existing clusters are stored
Groups of OIDs for each type
• Higher density of consecutive bits in the bitmaps
Maps are B+ trees
A compressed UTF-8 storage for UNICODE strings
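The bitmap compression described above (clusters of 32 consecutive bits, with absent clusters not stored) can be sketched as below. The class name and the use of a Python dictionary keyed by cluster index are assumptions; the slides say maps are B+ trees and do not describe the on-disk layout.

```python
CLUSTER_BITS = 32
MAX_OID = (1 << 37) - 1  # 37-bit OIDs: 2**37 = 137,438,953,472 possible objects


class ClusteredBitmap:
    """Sparse bitmap storing only the 32-bit clusters that contain a set bit."""

    def __init__(self):
        self.clusters = {}  # cluster index -> 32-bit word; all-zero clusters are absent

    def set(self, oid):
        if not 0 <= oid <= MAX_OID:
            raise ValueError("OID does not fit in 37 bits")
        index, offset = divmod(oid, CLUSTER_BITS)
        self.clusters[index] = self.clusters.get(index, 0) | (1 << offset)

    def clear(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(index, 0) & ~(1 << offset)
        if word:
            self.clusters[index] = word
        else:
            self.clusters.pop(index, None)  # drop clusters that became empty

    def __contains__(self, oid):
        index, offset = divmod(oid, CLUSTER_BITS)
        return bool((self.clusters.get(index, 0) >> offset) & 1)

    def intersect(self, other):
        """Bitwise AND, visiting only clusters present in both bitmaps."""
        result = ClusteredBitmap()
        for index in self.clusters.keys() & other.clusters.keys():
            word = self.clusters[index] & other.clusters[index]
            if word:
                result.clusters[index] = word
        return result


bm = ClusteredBitmap()
for oid in (5, 6, 10**11):
    bm.set(oid)
```

OIDs 5 and 6 share one cluster, so the three bits occupy two stored clusters. This also shows why grouping the OIDs of a type helps: consecutive OIDs fall into the same clusters, so fewer clusters need to be stored.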
12. DEX: a High-Performance Graph Database Management System
Experimental results
Load tests
                     IMDB      Wikipedia    RMAT (sf=28)
DB size              2.4 GB    7.6 GB       83 GB
Physical memory      9 GB      9 GB         60 GB
Load time            21 min    2 h 6 min    15 h
Nodes                13 M      19 M         230 M
Edges                22 M      180 M        2147 M
Values               48 M      283 M        230 M
Insertions per sec   65 K      62 K         48 K
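As a rough cross-check of the load figures, the sketch below assumes that one insertion is counted per node, edge and attribute value. That counting rule is an assumption, not something the slides state; under it, the computed rates come within a few percent of the reported ones.

```python
# Figures copied from the load-test results; load times converted to seconds.
datasets = {
    "IMDB": {"nodes": 13e6, "edges": 22e6, "values": 48e6,
             "seconds": 21 * 60, "reported": 65e3},
    "Wikipedia": {"nodes": 19e6, "edges": 180e6, "values": 283e6,
                  "seconds": 2 * 3600 + 6 * 60, "reported": 62e3},
    "RMAT (sf=28)": {"nodes": 230e6, "edges": 2147e6, "values": 230e6,
                     "seconds": 15 * 3600, "reported": 48e3},
}


def insertions_per_second(d):
    # Assumption: one insertion per node, edge and attribute value.
    return (d["nodes"] + d["edges"] + d["values"]) / d["seconds"]


rates = {name: insertions_per_second(d) for name, d in datasets.items()}
for name, rate in rates.items():
    reported = datasets[name]["reported"]
    print(f"{name}: {rate / 1e3:.1f} K computed, {reported / 1e3:.0f} K reported")
```

The remaining gap is consistent with the rounding of the published node, edge and value counts.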
13. DEX: a High-Performance Graph Database Management System
Experimental results
Query tests
IMDB database (2.4 GB)
Queries:
• A: full extraction of a movie, multiple 1-hop traversal [4K edges]
• B: distance between two actors [8K edges]
• C: extract all movies that match a given pattern [315K edges]
Query   In-memory   128 MB
A       0.13 sec    0.13 sec
B       1.52 sec    1.79 sec
C       384 sec     385 sec
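Query B (distance between two actors) is a shortest-path problem on an unweighted graph. The slides do not say how DEX evaluates it; the sketch below uses a plain breadth-first search over an adjacency dictionary, with hypothetical node names, only to show what the query computes.

```python
from collections import deque


def distance(adjacency, source, target):
    """Return the number of hops from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


# Actors are linked through the movies they share (undirected for this example).
adjacency = {
    "actor1": ["movie1"],
    "movie1": ["actor1", "actor2"],
    "actor2": ["movie1", "movie2"],
    "movie2": ["actor2", "actor3"],
    "actor3": ["movie2"],
    "actor4": [],
}
hops = distance(adjacency, "actor1", "actor3")
```

Here the path is actor1, movie1, actor2, movie2, actor3, so the distance is four hops, or two shared-movie steps between actors.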
14. DEX: a High-Performance Graph Database Management System
Conclusions
We propose DEX, a high-performance graph database querying system for labeled and directed attributed multigraphs
We propose a graph representation based on the intensive use of bitmaps
We perform an experimental performance analysis to show the ability of DEX to store and query very large graphs
15. DEX: a High-Performance Graph Database Management System
Future work
Trillions of objects
Transactional system
Distributed system
Query language
Query optimization
High-level graph operations
Pattern matching
16. DEX: a High-Performance Graph Database Management System
Questions?
sgomez@sparsity-technologies.com
Sparsity Technologies http://www.sparsity-technologies.com
DAMA-UPC http://www.dama.upc.edu