SPARJA: a Distributed Social Graph Partitioning and
Replication Middleware
Stylianou Maria
KTH Royal Institute of Technology
Stockholm, Sweden
mariasty@kth.se
Girdzijauskas Šar¯unas
KTH Royal Institute of Technology
Stockholm, Sweden
sarunas@sics.se
ABSTRACT
The rapid growth of Online Social Networks (OSNs) has
led to the necessity of effective and low-cost scalability.
Approaches like vertical and horizontal scaling have proven
inefficient due to the strong community structure of OSNs.
We propose SPARJA, a distributed graph partitioning and
replication middleware for scaling OSNs. SPARJA is an
extension of SPAR [8] with an improved partitioning
algorithm which functions in a distributed manner and
eliminates the requirement of a global view. We compare
and evaluate SPARJA against a variant of SPAR on
synthesized datasets and datasets from Facebook. Our
results show that our proposed system is on par with, and
even outperforms, SPAR depending on the nature and
clusterization of the graphs.
Categories and Subject Descriptors
C.4 [Performance of Systems]: Miscellaneous; D.4.8
[Performance]: Metrics—performance measures
General Terms
Online Social Networks, Scalability, Partitioning, Replication
Keywords
Online Social Networks, scalability, partitioning, replication,
SPAR, JA-BE-JA
1. INTRODUCTION
Recently, there has been an abrupt transition of interest
from traditional web applications to social applications and
especially to Online Social Networks (OSNs), e.g. Facebook
(http://www.facebook.com) and Twitter (http://www.twitter.com).
Both Facebook and Twitter have millions of active users who
post, comment and update their status at very high rates.
This trend makes OSNs a popular object of analysis and
research.
Recent research showed that OSNs produce a non-traditional
form of workloads [1, 12], mostly because of the different
nature of their data. Data are highly personalized and
interconnected due to the strong community structure [5–7].
These new characteristics impose new challenges in terms of
OSN maintenance and scalability.
To address scalability, two approaches have been followed
so far: vertical and horizontal scaling. The former, which
implies replacing existing hardware with high-performance
servers, tends to be very expensive and is sometimes even
infeasible because of the very large size of OSNs. The
latter proposes partitioning the load among several cheap
commodity servers or virtual machines (VMs), the second of
which derives from the emergence of cloud computing systems,
e.g. Amazon EC2 (http://aws.amazon.com/ec2/) and Google
AppEngine (http://cloud.google.com/appengine). With this
approach, data are partitioned into disjoint components,
offering horizontal scaling at low cost. However, problems
arise when it comes to OSNs.
In OSNs, users can be members of several social communities
[5–7], which makes partitioning difficult. Most of the
operations in OSNs concern a user and her friends, i.e. her
neighbors. Thus, if a user belongs to many communities,
clean partitioning is practically impossible, and queries
are consequently resolved with high inter-server traffic.
One attempt to eliminate this traffic is to replicate all
users' data on multiple or all servers. However, this leads
to an increased replication overhead which hinders
consistency among replicas.
SPAR [8], a social partitioning and replication middleware,
addresses the problem of partitioning - and consequently
scaling - OSNs. However, it requires a global view of the
entire network and, therefore, its use for extremely
large-scale OSNs may be impractical. In this paper, we
present a variant of the SPAR algorithm which uses the
partitioning technique proposed in [10]. The new system has
an improved - and distributed - partitioning phase which
does not require the global view. We evaluate and compare
our heuristic with the initial SPAR algorithm, showing that
scalability can be improved with a distributed approach to
partitioning.
In the next section, we discuss related work and the
background of our research. In Section 3, we present our
contribution and describe the system deployed. Section 4
consists of the evaluation with the experiments conducted,
and in Section 5 our conclusions are listed.
2. BACKGROUND AND RELATED WORK
Due to the recent emergence of OSNs, scaling and main-
taining such networks constitute a new area of research with
limited work so far. In this section, we describe approaches
followed in the past and associate them with SPAR and our
work.
Scaling out web applications is achievable with the use of
cloud providers, like Amazon EC2 and Google AppEngine.
Developers have the ability to dynamically add or remove
computing resources depending on the workload of their
applications. This facility requires the applications to be
stateless and the data to be independent and easily sharded
into clean partitions. OSNs deal with highly interconnected
and dependent data, and therefore scaling out alone is not a
viable solution for them.
Nowadays, Key-Value stores have become the scaling solution
for several popular OSNs. Key-Value stores are designed to
scale at the cost of partitioning data randomly across the
servers, a tradeoff which limits the performance of OSNs.
Pujol et al. [8] have shown that SPAR performs better than
Key-Value stores: thanks to its principle of preserving data
locality, SPAR minimizes the inter-server traffic and,
therefore, improves performance.
Another approach for scaling and maintaining applications
is the use of Distributed File Systems. Such systems [4, 11]
distribute and replicate data to achieve high availability.
In the case of OSNs, most queries concern data from several
users, which would imply fetching data from multiple
servers. SPAR does not follow the same approach as
Distributed File Systems; instead, it replicates data in
such a manner that all necessary data can be found locally
and thus more efficiently.
SPAR is the initial work and motivation for our research.
It is a partitioning and replication algorithm designed for
social applications. SPAR offers transparent scalability [9]
by preserving local semantics, i.e. storing all relevant
data for a user on one server. Moreover, it aims at
minimizing the replication overhead to keep the overall
performance and system efficiency high. SPAR achieves load
balancing by acquiring the global view of the network.
However, having access to all data at all times can be very
costly and impractical, especially for large-scale systems.
Additionally, SPAR has a central partition manager which
introduces a single point of failure. Both drawbacks are
addressed in our implementation, which is described in the
next section. Furthermore, we tackle the possibility of
SPAR's partition manager falling into a local optimum while
trying to preserve load balancing. This may result in an
increased replication overhead, which we also try to reduce
in the proposed system.
3. OUR CONTRIBUTION - SPARJA
Our main contribution is the implementation of SPARJA,
a variant of SPAR which is based on JA-BE-JA [10].
JA-BE-JA is a distributed graph partitioning algorithm that
does not require a global view of the system. SPARJA
eliminates the single point of failure of the initial SPAR
by replacing its main algorithm with JA-BE-JA. It also aims
to minimize the replication overhead through a simple,
straightforward technique.
3.1 System Architecture
Figure 1 depicts a three-tier web architecture with SPARJA,
based on the architecture of SPAR. The application interacts
with SPARJA through the Middleware (MW). When the
application requests a read or write operation on a user's
data, it calls the MW, which locates the back-end server
that contains the data of the specific user. The MW sends
the address of that server back to the application, which
then initiates a data-store interface, like MySQL, Cassandra
or others.
Figure 1: SPARJA Architecture
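The lookup role of the MW can be sketched in a few lines. This is an illustrative sketch only; the class and method names (`Middleware`, `add_user`, `locate`) are our assumptions, not an interface described in the paper:

```python
# Illustrative sketch: the MW maps each user id to the back-end server
# holding that user's master node, so the application can open a
# data-store connection (MySQL, Cassandra, ...) directly to that server.
class Middleware:
    def __init__(self):
        self.master_location = {}  # user id -> back-end server address

    def add_user(self, user_id, server):
        self.master_location[user_id] = server

    def locate(self, user_id):
        # Returns the address of the server that stores the user's data.
        return self.master_location[user_id]

mw = Middleware()
mw.add_user("alice", "db-server-2")
server = mw.locate("alice")  # the application then connects to this server
```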
3.2 Description
SPARJA is a dynamic, gossip-based local search algorithm.
Its goal is to group and store connected users, i.e. friends,
on the same server. In this way, SPARJA aims to reduce the
replication overhead as well as the inter-server traffic.
Initially, the system takes as input a partial graph and
partitions it into k equal-size components. All components
have the same number of users, thus achieving load
balancing. Afterwards, each node behaves as an independent
processing unit which periodically executes the algorithm
based on local information about the graph topology. These
periodic executions are essential for repartitioning the
graph and minimizing the number of replica nodes. Nodes can
work in parallel; nevertheless, SPARJA can also run as a
centralized system.
3.3 System Operations
SPARJA is responsible for preserving the scalability and
transparency of the application by distributing users,
partitioning the network and creating replicas under certain
conditions. Below, we describe the operations that SPARJA
executes to achieve these goals.
3.3.1 Data Distribution and Partitioning
SPARJA guarantees that users are equally distributed among
all servers. When a new user joins the network, a node -
called the master node - is created and stored on the server
with the minimum number of master nodes. Hence, the data
distribution is fair and the load balanced. Recall that in
SPAR, users may move from one server to another. In
contrast, users in SPARJA may exchange positions, i.e. user
A can move to the server of user B, and user B can move to
the server of user A, in order to be co-located with their
friends.
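The two placement rules above can be sketched as follows. This is a minimal illustration under assumed data structures (a dict of master sets per server and a user-to-server map), not SPARJA's actual code:

```python
# Sketch: place each new user's master node on the server with the fewest
# masters, and let two users exchange servers to co-locate with friends.
def place_master(masters_per_server, user):
    # masters_per_server: dict mapping server -> set of user ids
    target = min(masters_per_server, key=lambda s: len(masters_per_server[s]))
    masters_per_server[target].add(user)
    return target

def swap(location, a, b):
    # Users exchange positions: A moves to B's server and vice versa,
    # so the number of masters per server stays unchanged.
    location[a], location[b] = location[b], location[a]

servers = {"s1": {"u1", "u2"}, "s2": {"u3"}}
chosen = place_master(servers, "u4")  # s2 holds fewer masters

loc = {"A": "s1", "B": "s2"}
swap(loc, "A", "B")
```

Because a swap moves exactly one master in each direction, load balancing is preserved without any central coordination.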
3.3.2 Data Replication
Data replication is an important function of SPARJA. By
replicating master nodes, two requirements are satisfied:
local semantics and fault tolerance. When a new user joins
the network, along with the master node, replica nodes are
created and stored on servers. The number of replicas is a
custom value, set before the execution of the algorithm, and
serves to preserve fault tolerance. When a new friendship is
established, new replicas may be created, if needed, for
data locality. SPARJA attempts to keep the number of
replicas to the minimum by solving the set-cover problem
[2]. In particular, when additional replicas are created for
data locality, some of the fault tolerance replicas may be
removed. Listing 1 presents the replication algorithm in
pseudocode, which is executed periodically by each node. The
first part of the algorithm guarantees locality by creating
replica nodes of a user - if they do not exist - on the
servers of her friends. The second part guarantees fault
tolerance by creating additional replica nodes for a user on
servers that do not already hold the user.
for user in graph:
    # Locality: ensure a replica of the user on every friend's server
    for friend in get_friends(user):
        friend_server = server_of(friend)
        if friend_server != server_of(user):
            if not replica_exists(user, friend_server):
                create_replica(user, friend_server)

for user in graph:
    # Fault tolerance: ensure at least k_replicas replicas of the user
    missing = k_replicas - len(get_replicas(user))
    for server in all_servers:
        if missing <= 0:
            break
        if server != master_server(user) and not replica_exists(user, server):
            create_replica(user, server)
            missing -= 1

Listing 1: Algorithm for Data Replication in SPARJA
3.3.3 Sampling and Swapping Policies
Each node runs the SPARJA algorithm as a processing unit.
It periodically samples a node and applies the swapping
policy, which measures the benefit of swapping its server
with the server of the sampled node. The benefit of swapping
is measured in terms of energy, as introduced in
JA-BE-JA [10]. Each node has an energy and, therefore, the
system has a global energy. The energy becomes low when
nodes are placed close to their neighbors. SPARJA uses the
same energy function to measure the swapping benefit. If the
energy decreases for both nodes - which is the desired
behavior - then the swap is performed; otherwise it is
aborted. A hybrid node selection policy is followed [10],
which consists of two parts. First, the node selects - at
random - one direct neighbor and calculates the benefit. If
the energy function does not improve, the node performs a
random walk and selects another node from its walk [3].
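A simplified sketch of the swapping policy follows. The real energy function is JA-BE-JA's [10]; here we assume, purely for illustration, that a node's energy is the number of its friends stored on other servers:

```python
def energy(node, neighbors, location):
    # Simplified energy: the number of the node's friends stored on a
    # different server; low energy means the node is close to its neighbors.
    return sum(1 for n in neighbors[node] if location[n] != location[node])

def try_swap(a, b, neighbors, location):
    # Tentatively swap the servers of a and b; keep the swap only if the
    # energy decreases for both nodes, as SPARJA requires.
    before = (energy(a, neighbors, location), energy(b, neighbors, location))
    location[a], location[b] = location[b], location[a]
    after = (energy(a, neighbors, location), energy(b, neighbors, location))
    if after[0] < before[0] and after[1] < before[1]:
        return True
    location[a], location[b] = location[b], location[a]  # undo the swap
    return False

# All of a's friends live on s2 and all of b's friends on s1, so
# exchanging their servers lowers the energy of both nodes.
neighbors = {"a": ["x", "y"], "b": ["p", "q"],
             "x": ["a"], "y": ["a"], "p": ["b"], "q": ["b"]}
location = {"a": "s1", "x": "s2", "y": "s2", "b": "s2", "p": "s1", "q": "s1"}
swapped = try_swap("a", "b", neighbors, location)
```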
3.3.4 Simulated Annealing
Local search algorithms tend to get stuck in local optima.
Similarly, SPARJA is vulnerable to this hazard, which would
lead to a higher replication overhead. To address this
possibility, we employ the Simulated Annealing technique as
described in [13]. Initially, noise is introduced to the
system, which is analogous to temperature, causing the
system to deviate from the current best value. After a
number of iterations, the system starts to stabilize and
eventually converges to the optimal solution rather than a
local optimum.
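A generic sketch of this annealing schedule follows, using the parameter values later fixed in Table 2. The acceptance rule below is a common simplified variant assumed for illustration, not necessarily the exact rule of [13]:

```python
def cooling_schedule(t0=2.0, t_final=1.0, delta=0.003):
    # Temperature starts at T0 and drops by the cooling rate delta each
    # round until it reaches the final temperature (values from Table 2).
    t = t0
    while t > t_final:
        yield t
        t = max(t_final, t - delta)

def accept(old_energy, new_energy, temperature):
    # While the temperature is above 1, slightly worse configurations may
    # still be accepted, letting the system escape local optima; once the
    # system has cooled (temperature = 1), only strict improvements pass.
    return new_energy < old_energy * temperature

rounds = sum(1 for _ in cooling_schedule())  # about (T0 - T) / delta rounds
```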
4. EVALUATION
For evaluating SPARJA, we implemented both the SPARJA and
SPAR algorithms in Python, using Cassandra as the data
store.
4.1 Metrics
The principal evaluation metric we use is the replication
overhead, which consists of the number of replicas created
for local semantics and the number of replicas created for
fault tolerance.
4.2 Datasets
We use six datasets for evaluating the replication overhead:
three synthesized datasets and three Facebook datasets.
Synthesized Datasets
We generated three synthesized datasets with different
clusterization levels in order to study the impact of
clusterization on the replication overhead. All three
datasets contain 1000 nodes, each with node degree equal
to 10. The Randomized graph (Synth-R) has no clusterization
policy, while the Clustered (Synth-C) and Highly Clustered
(Synth-HC) graphs have 75% and 95% clusterization
respectively.
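A plausible generator for such graphs can be sketched as follows. This is our assumption of how the datasets could be produced, not necessarily the generator actually used:

```python
import random

def synth_graph(n, degree, clusters, p_intra, seed=1):
    # Sketch: each node draws edges until it has at least `degree`
    # neighbors; with probability p_intra the endpoint comes from the
    # node's own cluster, otherwise from the whole graph. p_intra = 0
    # corresponds to the randomized graph, 0.75 to the clustered one
    # and 0.95 to the highly clustered one.
    rng = random.Random(seed)
    members = {c: [v for v in range(n) if v % clusters == c]
               for c in range(clusters)}
    adj = {v: set() for v in range(n)}
    for v in range(n):
        while len(adj[v]) < degree:
            if rng.random() < p_intra:
                u = rng.choice(members[v % clusters])
            else:
                u = rng.randrange(n)
            if u != v:
                adj[v].add(u)
                adj[u].add(v)  # undirected edge, added symmetrically
    return adj

g = synth_graph(n=100, degree=6, clusters=4, p_intra=0.75)
```

Because edges are added symmetrically, some nodes may end up slightly above the target degree; the procedure only guarantees a minimum.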
Facebook Datasets
The three Facebook datasets have been acquired from the
Stanford Large Network Dataset Collection
(http://snap.stanford.edu/), with approximately 3000, 6000
and 60,000 edges respectively. All details - including
nodes, edges and clusterization levels - can be found in
Table 1.
Dataset Nodes Edges Clusterization (%)
Synth-R 1000 10,000 0%
Synth-C 1000 10,000 75%
Synth-HC 1000 10,000 95%
Facebook-1 150 3,386 n/a
Facebook-2 224 6,384 n/a
Facebook-3 786 60,050 n/a
Table 1: Description of Datasets
4.3 Environment Preparation
Before conducting any experiments, we set up the cooling
rate (δ), i.e. the rate of change of temperature, used in
the simulated annealing technique. As stated in [13], the
number of iterations equals the temperature difference
divided by the cooling rate:

number of iterations = (To − T) / δ

where T is the final temperature (see Table 2).
To is the Initial Temperature of the network, which declines
according to the cooling rate. Assuming a network with
four servers and a fixed value of the Initial Temperature
(To) equal to 2, we run the algorithm for different numbers
of iterations in order to adjust the cooling rate. The datasets
used are the synthesized graphs with 0%, 75% and 95% of
clusterization. Figure 2 shows how the number of non-local
nodes changes as the iterations increase. From the number of
non-local nodes we can deduce how much replication overhead
is incurred and how good the clusterization is. As
illustrated, the number of non-local nodes decreases as
iterations increase, eventually stabilizing at 200
iterations for the randomized graph and at 300 iterations
for both the clustered and the highly clustered graphs.
Figure 2: Number of Iterations vs Number of Non
Local Nodes
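As a quick arithmetic check of the iteration formula, using the initial temperature To = 2, final temperature T = 1 and cooling rate δ = 0.003 that Table 2 fixes:

```python
# number of iterations = (To - T) / delta, with the Table 2 values
t0, t_final, delta = 2.0, 1.0, 0.003
iterations = (t0 - t_final) / delta
```

This gives roughly 333 cooling iterations, consistent with the 200-300 iterations after which the curves in Figure 2 stabilize.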
In Table 2, we accumulate all the parameters with fixed
values set before running the experiments.
4.4 Experiments
In our experiments we compare SPARJA and SPAR in terms
of the replication overhead. We designed three scenarios for
testing the impact of different datasets, fault-tolerance
replication and the number of servers.

Parameter Value
Initial Temperature (To) 2
Final Temperature (T) 1
Cooling Factor (δ) 0.003
Energy Function Parameter (n) 2
Table 2: Parameters for SPARJA
4.4.1 Replication Overhead on Different Datasets
In the first experiment, we study how different datasets and
topologies affect the replication overhead of the system.
Figures 3 and 4 show the replication overhead of both SPAR
and SPARJA on the synthesized graphs and the Facebook graphs
respectively. The number of servers is set to four (S=4) and
the replication factor for fault tolerance is set to zero
(K=0). As Figure 3 reveals, higher clusterization leads to
lower replication overhead in SPARJA. As expected, SPARJA
takes advantage of the existing graph clusterization and
continues redistributing nodes based on this divided
topology. As a result, SPARJA outperforms SPAR on the
clustered graphs, while giving worse results than SPAR on
the random graph. Similarly, Figure 4 shows how SPARJA and
SPAR perform on the Facebook graphs. Again, SPARJA gives
better results compared to SPAR.
Figure 3: Replication Overhead on Synthesized
Datasets
Figure 4: Replication Overhead on Facebook
Datasets
4.4.2 Replication Overhead vs Replication Factor
Next, we turn our attention to the replication factor and
whether it affects the replication overhead. Figure 5 shows
the replication overhead of SPARJA on all datasets with the
replication factor for fault tolerance set to zero (K=0) and
to two (K=2). The number of servers is set to four (S=4).
As can be seen, the fault-tolerance replication factor can
dramatically decrease the replication overhead. As expected,
fault-tolerance replica nodes are also used for preserving
data locality.
Figure 5: Replication Overhead for Different Number of Fault
Tolerance Replicas
4.4.3 Replication Overhead with Different Number
of Servers
In our final experiment we measure the replication overhead
of both SPARJA and SPAR for different numbers of servers
(S=4, 8, 16). The fault-tolerance replication factor is set
to two (K=2) and all datasets are used.
In Figure 6 we plot the results collected from both
algorithms, divided into six graphs, one per dataset. As
expected, for all datasets the replication overhead
increases with the number of servers.
5. CONCLUSIONS
Online Social Networks have seen steep growth over the last
decade. This popularity has led companies to study the
nature of OSNs and offer scalability and maintenance
services. However, none of the scalability approaches
proposed so far has solved all the scalability issues. The
strong community structure of such systems makes Key-Value
stores and relational databases inefficient.
We proposed SPARJA, a distributed graph partitioning and
replication middleware for scaling OSNs. SPARJA parti-
tions the graph into k balanced components and maintains
them without obtaining the global view of the system. It
relies on data replication for preserving fault tolerance and
locality semantics, while aiming to keep the replication over-
head as low as possible.
The evaluation of SPARJA was accomplished using synthesized
graphs as well as real datasets from Facebook. Our
comparisons with SPAR showed that SPARJA offers significant
gains in replication overhead, especially when the graph is
clusterized. Moreover, with its low replication overhead, it
covers both goals of locality semantics and fault tolerance.
We implemented and tested an initial version of SPARJA. We
leave the integration of SPARJA into a real system with a
three-tier architecture as future work.
6. ACKNOWLEDGEMENTS
We would like to thank our colleague Muhammad Anis ud-
din Nasir for his valuable contribution and help in the project.
We also thank Fatemeh Rahimian for providing sources for
datasets used for the evaluation part of the project.
7. REFERENCES
[1] F. Benevenuto, T. Rodrigues, M. Cha, and
V. Almeida. Characterizing user behavior in online
social networks. In Proceedings of the 9th ACM
SIGCOMM conference on Internet measurement
conference, pages 49–62. ACM, 2009.
[2] R. Carr, S. Doddi, G. Konjevod, and M. Marathe. On
the red-blue set cover problem. In Proceedings of the
eleventh annual ACM-SIAM symposium on Discrete
algorithms, pages 345–353. Society for Industrial and
Applied Mathematics, 2000.
[3] M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou.
Walking in Facebook: A case study of unbiased
sampling of OSNs. In INFOCOM, 2010 Proceedings
IEEE, pages 1–9. IEEE, 2010.
[4] R. Guy, J. Heidemann, W. Mak, T. Page Jr.,
G. Popek, D. Rothmeier, et al. Implementation of the
Ficus replicated file system. In USENIX Conference
Proceedings, volume 74, pages 63–71. Citeseer, 1990.
[5] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney.
Community structure in large networks: Natural
cluster sizes and the absence of large well-defined
clusters. Internet Mathematics, 6(1):29–123, 2009.
[6] M. Newman. Modularity and community structure in
networks. Proceedings of the National Academy of
Sciences, 103(23):8577–8582, 2006.
[7] M. Newman and J. Park. Why social networks are
different from other types of networks. Physical
Review E, 68(3):036122, 2003.
[8] J. Pujol, V. Erramilli, G. Siganos, X. Yang,
N. Laoutaris, P. Chhabra, and P. Rodriguez. The little
engine(s) that could: scaling online social networks.
In ACM SIGCOMM Computer Communication
Review, volume 40, pages 375–386. ACM, 2010.
[9] J. Pujol, G. Siganos, V. Erramilli, and P. Rodriguez.
Scaling online social networks without pains. In Proc
of NETDB. Citeseer, 2009.
[10] F. Rahimian, A. H. Payberah, S. Girdzijauskas,
M. Jelasity, and S. Haridi. JA-BE-JA: a distributed
algorithm for balanced graph partitioning.
forthcoming.
[11] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki,
E. Siegel, and D. Steere. Coda: A highly available file
system for a distributed workstation environment.
Computers, IEEE Transactions on, 39(4):447–459,
1990.
Figure 6: Replication Overhead with Different Number of Servers

[12] F. Schneider, A. Feldmann, B. Krishnamurthy, and
W. Willinger. Understanding online social network
usage from a network perspective. In Proceedings of
the 9th ACM SIGCOMM conference on Internet
measurement conference, pages 35–48. ACM, 2009.
[13] E. Talbi. Metaheuristics: From Design to
Implementation. Wiley, 2009.
B036407011theijes
 

Similar to SPARJA: a Distributed Social Graph Partitioning and Replication Middleware (20)

Load Balancing in Cloud Nodes
 Load Balancing in Cloud Nodes Load Balancing in Cloud Nodes
Load Balancing in Cloud Nodes
 
Load Balancing in Cloud Nodes
Load Balancing in Cloud NodesLoad Balancing in Cloud Nodes
Load Balancing in Cloud Nodes
 
An Algorithm to synchronize the local database with cloud Database
An Algorithm to synchronize the local database with cloud DatabaseAn Algorithm to synchronize the local database with cloud Database
An Algorithm to synchronize the local database with cloud Database
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with Cloud
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
p27
p27p27
p27
 
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
 
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...
 
B036407011
B036407011B036407011
B036407011
 
Ax34298305
Ax34298305Ax34298305
Ax34298305
 

More from Maria Stylianou

Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksMaria Stylianou
 
Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Maria Stylianou
 
Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Maria Stylianou
 
Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Maria Stylianou
 
A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...
A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...
A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...Maria Stylianou
 
Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Maria Stylianou
 
Automatic Energy-based Scheduling
Automatic Energy-based SchedulingAutomatic Energy-based Scheduling
Automatic Energy-based SchedulingMaria Stylianou
 
Intelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesIntelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesMaria Stylianou
 
Instrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkInstrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkMaria Stylianou
 
Low-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersLow-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersMaria Stylianou
 
How Companies Learn Your Secrets
How Companies Learn Your SecretsHow Companies Learn Your Secrets
How Companies Learn Your SecretsMaria Stylianou
 
EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services Maria Stylianou
 
EEDC - Distributed Systems
EEDC - Distributed SystemsEEDC - Distributed Systems
EEDC - Distributed SystemsMaria Stylianou
 

More from Maria Stylianou (15)

Quantum Cryptography and Possible Attacks
Quantum Cryptography and Possible AttacksQuantum Cryptography and Possible Attacks
Quantum Cryptography and Possible Attacks
 
Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)Scaling Online Social Networks (OSNs)
Scaling Online Social Networks (OSNs)
 
Erlang in 10 minutes
Erlang in 10 minutesErlang in 10 minutes
Erlang in 10 minutes
 
Google's Dremel
Google's DremelGoogle's Dremel
Google's Dremel
 
Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee Green Optical Networks with Signal Quality Guarantee
Green Optical Networks with Signal Quality Guarantee
 
Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee Cano projectGreen Optical Networks with Signal Quality Guarantee
Cano projectGreen Optical Networks with Signal Quality Guarantee
 
A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...
A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...
A Survey on Large-Scale Decentralized Storage Systems to be used by Volunteer...
 
Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...Performance Analysis of multithreaded applications based on Hardware Simulati...
Performance Analysis of multithreaded applications based on Hardware Simulati...
 
Automatic Energy-based Scheduling
Automatic Energy-based SchedulingAutomatic Energy-based Scheduling
Automatic Energy-based Scheduling
 
Intelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet ServicesIntelligent Placement of Datacenters for Internet Services
Intelligent Placement of Datacenters for Internet Services
 
Instrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel BenchmarkInstrumenting the MG applicaiton of NAS Parallel Benchmark
Instrumenting the MG applicaiton of NAS Parallel Benchmark
 
Low-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic RegistersLow-Latency Multi-Writer Atomic Registers
Low-Latency Multi-Writer Atomic Registers
 
How Companies Learn Your Secrets
How Companies Learn Your SecretsHow Companies Learn Your Secrets
How Companies Learn Your Secrets
 
EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services EEDC - Why use of REST for Web Services
EEDC - Why use of REST for Web Services
 
EEDC - Distributed Systems
EEDC - Distributed SystemsEEDC - Distributed Systems
EEDC - Distributed Systems
 

Recently uploaded

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

SPARJA: a Distributed Social Graph Partitioning and Replication Middleware

These new characteristics impose new challenges for the maintenance and scalability of OSNs. To address scalability, two approaches have been followed so far: vertical and horizontal scaling. The former, which implies replacing existing hardware with high-performance servers, tends to be very expensive and is sometimes even infeasible because of the very large size of OSNs. The second solution partitions the load among several cheap commodity servers or virtual machines (VMs); VMs derive from the emergence of cloud computing systems such as Amazon EC2^3 and Google AppEngine^4. With this approach, data are partitioned into disjoint components, offering horizontal scaling at low cost. However, problems arise when it comes to OSNs.

In OSNs, users can be members of several social communities [5–7], which makes clean partitioning hard. Most operations in OSNs concern a user and her friends, i.e. her neighbors in the social graph. Thus, if a user belongs to many communities, a clean partition around her is practically impossible, and queries end up being resolved with high inter-server traffic. One attempt to eliminate this traffic is to replicate every user's data on several or all of the servers. However, this leads to an increased replication overhead, which makes it harder to keep replicas consistent.

SPAR [8], a social partitioning and replication middleware, addresses the problem of partitioning - and consequently scaling - OSNs. However, it requires a global view of the entire network, and therefore its use for extremely large-scale OSNs may be impractical. In this paper, we present a variant of the SPAR algorithm which uses the partitioning technique proposed in [10]. The new system has an improved - and distributed - partitioning phase which does not require the global view. We evaluate and compare our heuristic with the initial SPAR algorithm, showing that scalability can be

3 http://aws.amazon.com/ec2/
4 http://cloud.google.com/appengine
improved with a distributed approach for partitioning.

In the next section, we discuss related work and the background of our research. In Section 3, we present our contribution and describe the deployed system. Section 4 presents the evaluation and the experiments conducted, and Section 5 lists our conclusions.

2. BACKGROUND AND RELATED WORK

Due to the recent emergence of OSNs, scaling and maintaining such networks constitute a new area of research with limited work so far. In this section, we describe approaches followed in the past and relate them to SPAR and to our work.

Scaling out web applications is achievable with the use of cloud providers, like Amazon EC2 and Google AppEngine. Developers can dynamically add or remove computing resources depending on the workload of their applications. This facility requires the applications to be stateless and the data to be independent and easily sharded into clean partitions. OSNs deal with highly interconnected and dependent data, and therefore scaling out alone does not solve their scaling problem.

Nowadays, key-value stores have become the scaling solution for several popular OSNs. Key-value stores are designed to scale, with the tradeoff that data are partitioned randomly across the servers. This requirement limits the performance of OSNs. Pujol et al. [8] have shown that SPAR performs better than key-value stores: by preserving data locality, SPAR minimizes the inter-server traffic and therefore improves performance.

Another approach for scaling and maintaining applications is the use of distributed file systems. Such systems [4, 11] distribute and replicate data to achieve high availability. In the case of OSNs, however, most queries concern data from several users, which would imply fetching data from multiple servers.
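To make the cost of locality-blind sharding concrete, the following sketch (ours, not from the paper) randomly places the users of a small community-structured graph on servers, as a key-value store would, and measures the fraction of friendships that end up spanning two servers; every such edge turns a simple neighborhood query into inter-server traffic. Graph and helper names are illustrative.

```python
import random

def cross_partition_fraction(edges, num_servers, seed=0):
    """Place each user on a uniformly random server and return the
    fraction of friendship edges whose endpoints land on different
    servers; each such edge forces inter-server traffic."""
    rng = random.Random(seed)
    placement = {}
    crossing = 0
    for u, v in edges:
        placement.setdefault(u, rng.randrange(num_servers))
        placement.setdefault(v, rng.randrange(num_servers))
        if placement[u] != placement[v]:
            crossing += 1
    return crossing / len(edges)

# A ring of 20 five-user cliques: strong communities, few bridges.
edges = [(c * 5 + i, c * 5 + j)
         for c in range(20) for i in range(5) for j in range(i + 1, 5)]
edges += [(c * 5, ((c + 1) % 20) * 5) for c in range(20)]

# Random sharding cuts roughly a (1 - 1/k) fraction of all edges,
# although a community-aware split would cut only the 20 bridge edges.
print(cross_partition_fraction(edges, 4))
```

With four servers roughly three quarters of all friendships cross a server boundary, which is why random partitioning is a poor fit for the community structure of OSNs.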
SPAR does not follow the same approach as distributed file systems; instead, it replicates data in such a manner that all necessary data can be found locally and served more efficiently.

SPAR is the initial work and motivation for our research. It is a partitioning and replication algorithm designed for social applications. SPAR offers transparent scalability [9] by preserving local semantics, i.e. storing all data relevant to a user on one server. Moreover, it aims at minimizing the replication overhead in order to keep the overall performance and system efficiency high. SPAR achieves load balancing by acquiring the global view of the network. However, having access to all data at all times can be very costly and impractical, especially for large-scale systems. Additionally, SPAR has a central partition manager, which constitutes a single point of failure. Both drawbacks are addressed in our implementation, described in the next section. Furthermore, we tackle the possibility of SPAR's partition manager falling into a local optimum while trying to preserve load balancing; this may result in an increased replication overhead, which we also try to reduce in the proposed system.

3. OUR CONTRIBUTION - SPARJA

Our main contribution is the implementation of SPARJA, a variant of SPAR which is based on JA-BE-JA [10]. JA-BE-JA is a distributed graph partitioning algorithm that does not require a global view of the system. SPARJA eliminates the single point of failure of the initial SPAR by replacing its main algorithm with JA-BE-JA. It also aims at minimizing the replication overhead through a simple, straightforward technique.

3.1 System Architecture

Figure 1 depicts a three-tier web architecture with SPARJA; it is based on the architecture of SPAR. The application interacts with SPARJA through the Middleware (MW).
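The MW's role amounts to a thin directory service in front of the back-end data stores. The sketch below is our own illustration of that lookup step; the class, helper names and addresses are made up, not part of SPARJA's actual interface.

```python
class Middleware:
    """Minimal sketch of the MW lookup step: keep a user -> server
    directory and hand the application the address of the back-end
    server holding that user's master data."""

    def __init__(self, servers):
        self.servers = servers   # server id -> back-end address
        self.directory = {}      # user id -> server id

    def register(self, user, server_id):
        self.directory[user] = server_id

    def locate(self, user):
        # The application opens its data-store connection (MySQL,
        # Cassandra, ...) directly against the returned address.
        return self.servers[self.directory[user]]

mw = Middleware({0: "10.0.0.1:9160", 1: "10.0.0.2:9160"})
mw.register("alice", 0)
print(mw.locate("alice"))  # 10.0.0.1:9160
```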
When the application requests a read or write operation on a user's data, it calls the MW, which locates the back-end server holding the data of that specific user. The MW sends the address of this server back to the application, which then initiates a data-store interface, such as MySQL or Cassandra.

Figure 1: SPARJA Architecture

3.2 Description

SPARJA is a dynamic, gossip-based local search algorithm. Its goal is to group and store connected users, i.e. friends, on the same server. In this way, SPARJA reduces both the replication overhead and the inter-server traffic. Initially, the system takes a partial graph as input and partitions it into k components of equal size. All components hold the same number of users, thus achieving load balancing. Afterwards, each node behaves as an independent processing unit which periodically executes the algorithm based on local information about the graph topology. These periodic executions are essential for repartitioning the graph and minimizing the number of replica nodes. Nodes can work in parallel; nevertheless, SPARJA can also run as a centralized system.

3.3 System Operations

SPARJA is responsible for preserving the scalability and transparency of the application by distributing users, partitioning
the network and creating replicas under certain conditions. Below, we describe the operations SPARJA executes to achieve these goals.

3.3.1 Data Distribution and Partitioning

SPARJA guarantees that users are distributed equally among all servers. When a new user joins the network, a node - called the master node - is created and stored on the server with the minimum number of master nodes. Hence, the data distribution is fair and the load balanced. Recall that in SPAR, users may move from one server to another. In contrast, users in SPARJA may exchange positions, i.e. user A can move to the server of user B and user B to the server of user A, in order to be co-located with their friends.

3.3.2 Data Replication

Data replication is an important function of SPARJA. By replicating master nodes, two requirements are satisfied: local semantics and fault tolerance. When a new user joins the network, replica nodes are created and stored on servers along with the master node. The number of replicas is a configurable value, set before the execution of the algorithm, and serves to preserve fault tolerance. When a new friendship is established, new replicas may be created, if needed, for data locality. SPARJA attempts to keep the number of replicas to a minimum by solving the set-cover problem [2]. In particular, when additional replicas are created for data locality, some of the fault-tolerance replicas may be removed. Listing 1 presents the replication algorithm in pseudocode, which each node executes periodically. The first part of the algorithm guarantees locality by creating replica nodes of a user - if they do not already exist - on the servers of her friends. The second part guarantees fault tolerance by creating additional replica nodes for a user on servers that do not already hold the user.
for user in graph:
    friends = get_friends(user)
    for friend in friends:
        if server(friend) != server(user):
            if not replica_exists(user, server(friend)):
                create_replica(user, server(friend))

for user in graph:
    replicas = get_replicas(user)
    missing = k_replicas - len(replicas)
    for j in range(missing):
        for k in range(total_servers):
            if k != master_server(user) and not replica_exists(user, k):
                create_replica(user, k)
                break

Listing 1: Algorithm for Data Replication in SPARJA

3.3.3 Sampling and Swapping Policies

Each node runs the SPARJA algorithm as a processing unit. It periodically samples another node and applies the swapping policy, which measures the benefit of exchanging its server with the server of the sampled node. The benefit of a swap is measured in terms of energy, as introduced in JA-BE-JA [10]. Each node has an energy and, therefore, the system has a global energy; the energy becomes low when nodes are placed close to their neighbors. SPARJA uses the same energy function to measure the swapping benefit. If the energy decreases for both nodes - which is the desired behavior - the swap is performed; otherwise it is abandoned. A hybrid node selection policy is followed [10], which consists of two parts. First, the node selects one direct neighbor at random and calculates the benefit. If the energy function does not improve, the node performs a random walk and selects another node from its walk [3].

3.3.4 Simulated Annealing

Local search algorithms tend to get stuck in local optima. SPARJA is likewise vulnerable to this hazard, which would lead to a higher replication overhead. To address this possibility, we employ the simulated annealing technique described in [13].
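As a concrete illustration of the swapping policy of Section 3.3.3, the sketch below (our simplification; the energy function of [10] additionally carries the exponent parameter n) takes a node's energy to be the number of its friends stored on other servers, and performs a swap only when it lowers the combined energy of the two nodes:

```python
def local_energy(node, placement, adjacency):
    """Number of the node's friends stored on other servers; lower
    means better co-location with the neighborhood."""
    return sum(1 for nb in adjacency[node] if placement[nb] != placement[node])

def try_swap(a, b, placement, adjacency):
    """Exchange the servers of a and b only if the move lowers their
    combined energy; otherwise undo it (simplified from JA-BE-JA)."""
    before = local_energy(a, placement, adjacency) + local_energy(b, placement, adjacency)
    placement[a], placement[b] = placement[b], placement[a]
    after = local_energy(a, placement, adjacency) + local_energy(b, placement, adjacency)
    if after >= before:
        placement[a], placement[b] = placement[b], placement[a]  # not beneficial: undo
        return False
    return True

# Triangle {1, 2, 3} plus pair {4, 5}; nodes 3 and 5 sit on the wrong servers.
adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4]}
placement = {1: 0, 2: 0, 3: 1, 4: 1, 5: 0}

print(try_swap(3, 5, placement, adjacency))  # True: the swap reunites both groups
print(placement[3], placement[5])            # 0 1
```

Because the exchange is one-for-one, every accepted swap keeps the number of masters per server unchanged, which is how load balancing survives the repartitioning.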
Initially, noise analogous to temperature is introduced into the system, causing it to deviate from the current optimum. After a number of iterations, the system starts to stabilize and eventually converges towards the optimal solution rather than a local optimum.

4. EVALUATION
To evaluate SPARJA, we implemented both the SPARJA and SPAR algorithms in Python, using Cassandra as the data store.

4.1 Metrics
The principal evaluation metric we use is the replication overhead, which consists of the number of replicas created for local semantics plus the number of replicas created for fault tolerance.

4.2 Datasets
We use six datasets for evaluating the replication overhead: three synthesized datasets and three Facebook datasets.

Synthesized Datasets
We generated three synthesized datasets with different clusterization levels in order to study the impact of clusterization on the replication overhead. All three datasets contain 1000 nodes, each with node degree equal to 10. The Randomized graph (Synth-R) has no clusterization, while the Clustered (Synth-C) and Highly Clustered (Synth-HC) graphs have 75% and 95% clusterization respectively.

Facebook Datasets
The three Facebook datasets were acquired from the Stanford Large Network Dataset Collection (http://snap.stanford.edu/), with approximately 3000, 6000 and 60,000 edges respectively. All details - including nodes, edges and clusterization levels - can be found in Table 1.
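A graph with a tunable clusterization level, such as Synth-C, can be produced with a simple biased-wiring generator. The sketch below is hypothetical - it is not the generator actually used for the datasets - and the wiring rule and names are our own: each node draws edges whose other endpoint comes from its own cluster with probability p_intra, and from the whole graph otherwise.

```python
import random

def synth_graph(n, half_degree, clusters, p_intra, seed=0):
    """Illustrative clustered-graph generator: p_intra controls the
    fraction of edges that stay inside a node's own cluster."""
    rnd = random.Random(seed)
    cluster_of = [v * clusters // n for v in range(n)]
    members = [[v for v in range(n) if cluster_of[v] == c]
               for c in range(clusters)]
    edges = set()
    for v in range(n):
        for _ in range(half_degree):
            # Biased endpoint choice: own cluster vs. whole graph.
            pool = (members[cluster_of[v]] if rnd.random() < p_intra
                    else range(n))
            u = rnd.choice(list(pool))
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return edges, cluster_of
```

With n=1000, half_degree=5 and p_intra=0.75, this roughly approximates Synth-C's 1000 nodes of average degree 10 at 75% clusterization.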
Dataset      Nodes  Edges   Clusterization (%)
Synth-R      1000   10,000  0
Synth-C      1000   10,000  75
Synth-HC     1000   10,000  95
Facebook-1   150    3,386   n/a
Facebook-2   224    6,384   n/a
Facebook-3   786    60,050  n/a

Table 1: Description of Datasets

4.3 Environment Preparation
Before conducting any experiments, we set the cooling rate (δ), i.e. the rate of change of the temperature used in the simulated annealing technique. As stated in [13], the number of iterations equals the temperature difference divided by the cooling rate:

    number of iterations = (To − 1) / δ

where To is the initial temperature of the network, which declines according to the cooling rate until it reaches the final temperature T = 1. Assuming a network with four servers and a fixed initial temperature To = 2, we run the algorithm for different numbers of iterations in order to adjust the cooling rate. The datasets used are the synthesized graphs with 0%, 75% and 95% clusterization. Figure 2 shows how the number of non-local nodes changes as the number of iterations increases. From the number of non-local nodes we can deduce how much replication overhead is incurred and how well the clusterization is done. As illustrated, the number of non-local nodes decreases as the number of iterations increases, and eventually stabilizes at 200 iterations for the randomized graph and at 300 iterations for both the clustered and highly clustered graphs.

Figure 2: Number of Iterations vs Number of Non Local Nodes

In Table 2, we list all the parameters whose fixed values were set before running the experiments.

Parameter                      Value
Initial Temperature (To)       2
Final Temperature (T)          1
Cooling Factor (δ)             0.003
Energy Function Parameter (n)  2

Table 2: Parameters for SPARJA

4.4 Experiments
In our experiments we compare SPARJA and SPAR in terms of the replication overhead. We designed three scenarios for testing the impact of different datasets, fault tolerance replication and the number of servers.
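Before turning to the results, the annealing schedule fixed in §4.3 can be sanity-checked with a few lines of Python. This is a sketch of the iteration-count formula under a linear cooling loop, which is our reading of [13]; the function name is our own.

```python
def annealing_iterations(T0, T_final, delta):
    """Count linear cooling steps: the temperature drops by `delta`
    each iteration until it reaches T_final, so the count is
    approximately (T0 - T_final) / delta."""
    steps, T = 0, T0
    while T > T_final:
        T -= delta
        steps += 1
    return steps
```

With To = 2, T = 1 and δ = 0.003 from Table 2, this gives roughly (2 − 1)/0.003 ≈ 333 iterations, consistent with the stabilization between 200 and 300 iterations observed in Figure 2.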
4.4.1 Replication Overhead on Different Datasets
In the first experiment, we study how different datasets and topologies affect the replication overhead of the system. Figures 3 and 4 show the replication overhead of both SPAR and SPARJA on the synthesized graphs and the Facebook graphs respectively. The number of servers is set to four (S=4) and the replication factor for fault tolerance is set to zero (K=0). As revealed in Figure 3, higher clusterization leads to lower replication overhead in SPARJA. As expected, SPARJA takes advantage of the existing graph clusterization and continues redistributing nodes based on this divided topology. As a result, SPARJA outperforms SPAR on the clustered graphs while giving worse results than SPAR on the random graph. Similarly, Figure 4 shows how SPARJA and SPAR perform on the Facebook graphs. Again, SPARJA gives better results compared to SPAR.

Figure 3: Replication Overhead on Synthesized Datasets

Figure 4: Replication Overhead on Facebook Datasets

4.4.2 Replication Overhead vs Replication Factor
Next, we turn our attention to the replication factor and whether it affects the replication overhead. Figure 5 shows
the replication overhead of SPARJA on all datasets with the replication factor for fault tolerance set to zero (K=0) and to two (K=2). The number of servers is set to four (S=4). As can be seen, the fault tolerance replication factor can dramatically decrease the replication overhead. As expected, fault tolerance replica nodes are also used for preserving data locality.

Figure 5: Replication Overhead for Different Numbers of Fault Tolerance Replicas

4.4.3 Replication Overhead with Different Numbers of Servers
In our final experiment we measure the replication overhead of both SPARJA and SPAR for different numbers of servers, S=4, 8, 16. The fault tolerance replication factor is set to two (K=2) and all datasets are used. In Figure 6 we plot the results collected from the algorithms, divided into six graphs, one per dataset. As expected, in all datasets the replication overhead increases with the number of servers.

5. CONCLUSIONS
Online Social Networks have seen steep growth over the last decade. This popularity has led companies to study the nature of OSNs and to offer scalability and maintenance services. However, none of the scalability approaches proposed so far has solved all the scalability issues. The strong community structure of such systems makes key-value stores and relational databases inefficient.

We proposed SPARJA, a distributed graph partitioning and replication middleware for scaling OSNs. SPARJA partitions the graph into k balanced components and maintains them without obtaining a global view of the system. It relies on data replication for preserving fault tolerance and locality semantics, while aiming to keep the replication overhead as low as possible.

The evaluation of SPARJA was carried out using synthesized graphs as well as real datasets from Facebook.
Our comparisons with SPAR showed that SPARJA offers significant gains in replication overhead, especially when the graph is clustered. Moreover, with its low replication overhead, it covers both goals of locality semantics and fault tolerance.

We implemented and tested an initial version of SPARJA. We leave the integration of SPARJA into a real system with a three-tier architecture as future work.

6. ACKNOWLEDGEMENTS
We would like to thank our colleague Muhammad Anis uddin Nasir for his valuable contribution and help in the project. We also thank Fatemeh Rahimian for providing sources for the datasets used in the evaluation part of the project.

7. REFERENCES
[1] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida. Characterizing user behavior in online social networks. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pages 49–62. ACM, 2009.
[2] R. Carr, S. Doddi, G. Konjevod, and M. Marathe. On the red-blue set cover problem. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 345–353. Society for Industrial and Applied Mathematics, 2000.
[3] M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou. Walking in Facebook: A case study of unbiased sampling of OSNs. In INFOCOM, 2010 Proceedings IEEE, pages 1–9. IEEE, 2010.
[4] R. Guy, J. Heidemann, W. Mak, T. Page Jr, G. Popek, D. Rothmeier, et al. Implementation of the Ficus replicated file system. In USENIX Conference Proceedings, volume 74, pages 63–71. Citeseer, 1990.
[5] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.
[6] M. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
[7] M. Newman and J. Park. Why social networks are different from other types of networks. Physical Review E, 68(3):036122, 2003.
[8] J. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. In ACM SIGCOMM Computer Communication Review, volume 40, pages 375–386. ACM, 2010.
[9] J. Pujol, G. Siganos, V. Erramilli, and P. Rodriguez. Scaling online social networks without pains. In Proceedings of NETDB. Citeseer, 2009.
[10] F. Rahimian, A. H. Payberah, S. Girdzijauskas, M. Jelasity, and S. Haridi. JA-BE-JA: A distributed algorithm for balanced graph partitioning. Forthcoming.
[11] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki, E. Siegel, and D. Steere. Coda: A highly available file system for a distributed workstation environment. IEEE Transactions on Computers, 39(4):447–459, 1990.
[12] F. Schneider, A. Feldmann, B. Krishnamurthy, and W. Willinger. Understanding online social network usage from a network perspective. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pages 35–48. ACM, 2009.
[13] E. Talbi. Metaheuristics: From Design to Implementation. Wiley, 2009.

Figure 6: Replication Overhead with Different Number of Servers