Performance of Neo4j versus MongoDB for Social Actions
May 7, 2014
Santosh S Ravi, University of Southern California, sathyavi@usc.edu
Kalyanaraman Santhanam, University of Southern California, ksanthan@usc.edu
Abstract
Data collected today are highly connected, owing to the social nature of the
way in which they are accumulated by social networks and other internet
companies. Social network analysis (SNA) is the analysis of such data; it
views social relationships in terms of network theory, consisting of users
and the relationships between them. Graph databases such as Neo4j have
emerged to handle these requirements by allowing efficient index-free
lookups. We compare the performance of Neo4j with that of other NoSQL data
stores, in particular MongoDB, using three social metrics: distance, network
closure and assortativity.
1 Introduction
Relational databases have been popular for storing large amounts of structured
data for the past few decades because of their ACID capabilities. The recent
growth in the volume of data from social networks and cloud services has led
to the development of non-traditional NoSQL data stores such as MongoDB, Neo4j
and HBase. Since we need to model highly interconnected social networking
data, graph databases are particularly interesting because such data fits
directly into their model.
Modeling social networking data in a relational database requires many-to-many
relations, and even a simple path traversal between two actors requires many
join operations. Graph databases are designed to store data in a way that
makes such traversals easy. Popular benchmarks such as the Yahoo! Cloud
Serving Benchmark (YCSB) help evaluate the performance of emerging cloud
serving systems under different workloads. However, YCSB is not suited to
evaluating popular social networking actions such as View Profile and List
Friends. BG overcomes this limitation: it is a benchmark designed to evaluate
the performance of data stores for interactive social networking actions and
sessions. BG computes either a Social Action Rating (SoAR) or a Socialites
rating of a data store. These ratings measure the number of concurrent actions
performed by a system for a fixed percentage of requests.
We leveraged BG to assess the performance of the social metrics Distance,
Network Closure and Assortativity in both the Neo4j and MongoDB data stores.
Neo4j is a popular Java-based graph database that offers high performance,
availability and ACID transactions. Neo4j supports a query language, Cypher,
for accessing data in the database. We compared the performance of the social
metrics for Neo4j Embedded, Neo4j Cypher over REST and MongoDB.
2 Description
2.1 Data stores
• Neo4J Community 2.0.0
Run mode: RESTful and embedded
Query mode: Java API and Cypher 2.0
• MongoDB 2.6
2.2 Test setup
All benchmarks are performed on a single machine with specifications as follows:
2.6 GHz Intel Core i5 with 8GB 1600 MHz DDR3 RAM, 256GB SSD, OS X
10.9.2.
2.3 BGBenchmark
We used BGBenchmark (http://bgbenchmark.org) v0.1.4776 to analyze the social
networking actions Assortativity, Network Closure and Distance. We also
leveraged the viewprofile action in BGBenchmark to test these social actions.
2.4 Data Model and workload
Figure 1 shows BGBenchmark’s data model. The benchmarking workload consists
of 10,000 users with 4 friends per user and 10 resources per user. The friend
relationships were created such that the data forms a torus model, i.e. every
user is connected to every other user via Friends-of-Friends relationships.
The BGWorkload generator assigns each user a unique userid between 0 and 9999.
2.5 Social Metrics
The following are the social metrics identified for the scope of this project.
Network Closure: A measure of the completeness of relational triads.
An individual’s assumption of network closure (i.e. that their friends are also
friends) is called transitivity. Transitivity is an outcome of the individual or
situational trait of Need for Cognitive Closure.
Assortativity: The extent to which actors form ties with similar versus
dissimilar others. Similarity can be defined by gender, race, age, occupation,
educational achievement, status, values or any other salient characteristic.
Distance: The minimum number of ties required to connect two particular
actors, as popularized by the idea of ‘six degrees of separation’.
Figure 1: Data model used for benchmarking
2.6 Implementation
The code developed falls into three parts: Embedded Neo4j, RESTful Neo4j and
MongoDB. The goal was to find the best implementation of the social metrics
identified in Section 2.5. To remain fair in our comparison, we used the same
algorithm in all three parts. The Distance metric is computed using the
Breadth-First Search (BFS) algorithm; the Assortativity metric iterates
through the properties/attributes and finds their intersection; and Network
Closure retrieves all the nodes/actors within one hop of a given node/actor
in a single query.
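The paper does not list its code, so the following is only a minimal
in-memory sketch of how the Network Closure and Assortativity computations
could look, independent of any data store. The function names, the `friends`
and `profiles` dictionaries and the sample values are hypothetical, and
treating network closure as the set of friends two users have in common is an
assumption based on the description in the Findings section.

```python
def network_closure(friends, u, v):
    """Return the friends that users u and v have in common.

    `friends` maps a userid to the set of userids one hop away
    (hypothetical in-memory stand-in for a single one-hop query).
    """
    return friends[u] & friends[v]


def assortativity(profiles, u, v):
    """Return the attribute names on which u and v hold the same value.

    `profiles` maps a userid to a dict of properties/attributes.
    """
    pu, pv = profiles[u], profiles[v]
    return {key for key in pu.keys() & pv.keys() if pu[key] == pv[key]}


# Hypothetical sample data, not taken from the benchmark workload.
friends = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
profiles = {
    0: {"country name": "US", "organization": "USC"},
    1: {"country name": "US", "organization": "UCLA"},
}

common_friends = network_closure(friends, 0, 2)
shared_attributes = assortativity(profiles, 0, 1)
```

In a real data store, each dictionary access above would correspond to a
query (a Cypher statement, a Java API call or a MongoDB find), which is where
the three implementations differ.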
2.7 Test Suite
The implementations were tested for accuracy using a JUnit test suite. Network
closure and Assortativity are straightforward to test compared to the Distance
action. Since we already know that the graph topology forms a torus model,
Network closure results can be validated by adding or subtracting the outgoing
and incoming friend counts to or from the userid. To test the Assortativity
results, we inserted ’country name’ and ’organization’ properties with
synthetic data for every 100 users. For the distance metric we used a formula,
since it proved less tedious than performing an actual BFS test:
$$\text{distance} = \min\left(\frac{|d - s|}{f},\ \frac{|N - d + s|}{f}\right),$$
where s and d are the source and destination userids and f is the number of
outgoing friend relationships.
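As a rough check of the formula above, the sketch below builds a small ring of
users and compares a BFS hop count with the closed form. Several points are
assumptions rather than statements from the paper: N is taken to be the total
number of users, friendships are treated as symmetric, user i is assumed to
befriend the next f userids (modulo N), the result is rounded up because hop
counts are integers, and the wrap-around term is written as N - |d - s| so
that it also holds when d < s. All function names are hypothetical.

```python
from collections import deque


def build_ring(n, f):
    """Adjacency sets for n users; user u befriends u+1..u+f (mod n), symmetrically."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for k in range(1, f + 1):
            v = (u + k) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distance(adj, s, d):
    """Minimum number of ties between s and d, found by breadth-first search."""
    if s == d:
        return 0
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == d:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # d is unreachable from s


def closed_form_distance(n, f, s, d):
    """Closed-form hop count on the ring, rounded up to a whole number of hops."""
    gap = abs(d - s)
    shortest_gap = min(gap, n - gap)
    return (shortest_gap + f - 1) // f  # integer ceiling of shortest_gap / f


# With 10,000 users and an assumed f = 2, users 0 and 5000 are 2500 hops apart.
print(closed_form_distance(10000, 2, 0, 5000))  # 2500
```

On a ring of 20 users with f = 2, the two functions agree for every pair of
userids, which is the kind of equivalence the test suite relies on when it
substitutes the formula for a BFS.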
Figure 2: Throughput (actions/sec) comparison
3 Findings
3.1 Observation
Comparing the social metrics, we find that network closure and assortativity
achieve much higher throughput than the distance metric. This is because the
distance metric performs graph traversals, whereas the other two metrics only
perform lookups on userid, an indexed attribute in Neo4j. Between network
closure and assortativity, network closure performs worse, as expected, since
it iterates over all neighbours of the given nodes to find the intersection of
friends between the user nodes.
As expected, the Embedded Java API significantly outperforms the others,
because it eliminates both network overhead and object
marshalling/unmarshalling overhead. We believe the main reasons for the poor
performance of Neo4j Cypher are network overhead and the query parsing and
optimization performed by the Cypher engine.
Neo4j’s index-free lookup takes center stage for the distance metric. MongoDB
is slower by at least a factor of 10 compared to Neo4j RESTful and by at least
a factor of 100 compared to Neo4j Embedded. The workload examined for the
distance metric requires on average 2000 to 3000 index lookups for the BFS
traversal to reach the goal node. Since Neo4j uses a Relationship Expander for
path traversals, it avoids the user index lookups that MongoDB must perform.
For the other metrics, MongoDB performs more or less on par with Neo4j REST.
However, we believe the MongoDB wire protocol plays a significant part in
MongoDB’s higher throughput for assortativity: it runs directly over TCP/IP
using the BSON format, whereas Neo4j REST, in addition to TCP/IP, uses RESTful
HTTP protocol headers with JSON encoding/decoding.
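To illustrate why the number of index lookups matters for the distance metric,
the sketch below runs a BFS against a toy store in which a user's friend list
can only be obtained through a lookup by userid, and counts those lookups. The
`IndexedStore` class, its ring-shaped data and the function names are
hypothetical; this is not the MongoDB driver API and not the code used in the
benchmark.

```python
from collections import deque


class IndexedStore:
    """Toy document store: friends are reachable only via a lookup by userid."""

    def __init__(self, n, f):
        # Each user is linked to the f userids on either side (mod n), an assumed layout.
        self.docs = {
            u: sorted({(u + k) % n for k in range(1, f + 1)}
                      | {(u - k) % n for k in range(1, f + 1)})
            for u in range(n)
        }
        self.lookups = 0

    def friends_of(self, userid):
        """Simulate one index lookup that fetches a user's friend list."""
        self.lookups += 1
        return self.docs[userid]


def bfs_hops(store, s, d):
    """BFS from s to d; every expanded user costs one lookup in the store."""
    if s == d:
        return 0
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:
        user, hops = frontier.popleft()
        for friend in store.friends_of(user):
            if friend == d:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, hops + 1))
    return None  # d is unreachable from s


store = IndexedStore(20, 2)
hops = bfs_hops(store, 0, 5)
print(hops, store.lookups)
```

Because BFS expands the whole frontier at each level, the lookup count grows
faster than the hop count. A store that follows relationships directly from a
node, without going back through an index, avoids paying this cost per
expanded user.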
Figure 3: Throughput (actions/sec) comparison for the distance metric
4 Future Work
We would like to include these social metrics in the BGBenchmark framework to
extend its current set of actions. We believe that implementing new social
metrics would give a better understanding of how data stores handle complex
graph operations, compared to the existing simple operations.
References
[1] Neo4j Documentation, http://docs.neo4j.org/
[2] MongoDB Documentation, http://docs.mongodb.org/manual/
[3] Florian Holzschuher and René Peinl, Performance of graph query languages:
comparison of Cypher, Gremlin and native access in Neo4j.
[4] Social Network Analysis, http://en.wikipedia.org/wiki/Social_network_analysis
[5] BG Benchmark, http://bgbenchmark.org/BG/overview.html