Benchmarking Graph Databases
- Apurva Kulkarni
- Saurabh Saxena
What is a Graph Database System?
Unlike an RDBMS, it uses graphs to represent data.
Data is stored as a collection of nodes and edges.
Graph databases use graph theory to solve graph problems.
Examples: Horton, Neo4j, GraphBase, Titan, OrientDB, VertexDB, etc.
Graph databases are very fast at executing complex pattern-matching queries.
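As a rough illustration of the node-and-edge model and of pattern matching, the sketch below holds a tiny graph in plain Python structures. The node names, edge types, and the `match_path` helper are invented for this example and do not come from any of the databases listed above.

```python
# Nodes and edges held as plain Python structures (illustrative only).
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "matrix": {"label": "Movie"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "LIKES", "matrix"),
]


def match_path(edges, first_type, second_type):
    """Find (a, b, c) where a -[first_type]-> b -[second_type]-> c."""
    return [
        (a, b, c)
        for a, t1, b in edges if t1 == first_type
        for b2, t2, c in edges if t2 == second_type and b2 == b
    ]


# Which movies are liked by people that someone knows?
matches = match_path(edges, "KNOWS", "LIKES")
```

A real graph database answers this kind of two-hop pattern by following stored edges rather than scanning the whole edge list, which is where the speed advantage on pattern-matching queries comes from.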
Neo4j
World's first graph database system.
Uses Cypher, a declarative graph query language.
Highly scalable (can store 32 billion nodes and relationships, i.e. edges).
Runs on the Java Virtual Machine.
Architecture: (diagram not included in the transcript)
OrientDB
Multi-model open source NoSQL DBMS.
Uses SQL with some extensions as its query language.
As a document database, it stores information in documents such as JSON or XML.
Uses an HTTP REST API to access and edit the database.
Runs on the Java Virtual Machine.
BENCHMARKING
Metrics
Clustering Workload
Convergence time for modularity optimization using the Louvain method
Massive Insertion Workload
Time to create the whole graph when it is populated with massive data
Single Insertion Workload
Time taken to upload a block, which consists of one thousand edges and vertices
Query Workload
Time taken to find the neighbours of all the nodes
Dataset
Movies dataset
Amazon dataset
YouTube dataset
LiveJournal dataset
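The insertion and query workloads above can be sketched as a small timing harness. This is a minimal sketch under stated assumptions: `InMemoryGraph` is a hypothetical stand-in for a database client (it is not the Neo4j or OrientDB API), and the synthetic ring graph replaces the real datasets listed on the slide.

```python
import time
from collections import defaultdict


class InMemoryGraph:
    """Hypothetical stand-in for a graph database client."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def neighbours(self, u):
        return self.adj[u]

    def nodes(self):
        return list(self.adj)


def massive_insertion(graph, edges):
    """Time to create the whole graph from an edge list, in seconds."""
    start = time.perf_counter()
    for u, v in edges:
        graph.add_edge(u, v)
    return time.perf_counter() - start


def single_insertion(graph, edges, block_size=1000):
    """Time per block of `block_size` edges, in seconds."""
    timings = []
    for i in range(0, len(edges), block_size):
        start = time.perf_counter()
        for u, v in edges[i:i + block_size]:
            graph.add_edge(u, v)
        timings.append(time.perf_counter() - start)
    return timings


def query_workload(graph):
    """Time to look up the neighbours of every node, in seconds."""
    start = time.perf_counter()
    visited = 0
    for u in graph.nodes():
        visited += len(graph.neighbours(u))
    return time.perf_counter() - start, visited


# Small synthetic ring graph; the slides use real datasets instead.
edges = [(i, (i + 1) % 2500) for i in range(2500)]
g = InMemoryGraph()
block_times = single_insertion(g, edges)
elapsed, visited = query_workload(g)
```

Swapping `InMemoryGraph` for a thin wrapper around a real database driver would leave the timing functions unchanged, which keeps the measurements comparable across systems.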
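Modularity in a Graph Using the Louvain Method
● Modularity
○ Measures how well a graph divides into clusters (communities)
○ Fraction of edges that fall inside clusters, minus the fraction expected if edges were placed at random
○ Ranges over [-½, 1]
● Louvain Method
○ Greedy optimization method for modularity
○ Performed in two steps, repeated until modularity stops improving: move each node to the neighbouring cluster that gives the largest modularity gain, then merge each cluster into a single node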
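To make the modularity score concrete, the sketch below computes it for an undirected, unweighted graph using the standard formula Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside cluster c, and d_c the total degree of its nodes. The function name and the toy graph are assumptions made for this example; it evaluates a given clustering and does not implement the Louvain optimization itself.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a clustering of an undirected, unweighted graph.

    edges: list of (u, v) pairs without self-loops
    community: dict mapping each node to its cluster id
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both endpoints in c
    degree = defaultdict(int)    # d_c: sum of degrees of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {n: "a" for n in range(6)}

q_split = modularity(edges, split)    # 5/14, about 0.357
q_single = modularity(edges, single)  # 0.0
```

Putting each triangle in its own cluster scores higher than lumping all nodes together, which is the kind of improvement the greedy node moves in the Louvain method look for.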
MODULARITY
Conclusion
Neo4j does not support multi-master replication or native HTTP REST/JSON
Both support ACID transactions
OrientDB supports server-side functions
OrientDB provides a better graph editing tool

CCI DAY PRESENTATION


Editor's Notes

  1. Combines the power of graph and document databases into one scalable, high-performance database