This document presents a new greedy incremental approach for community detection in social networks. It begins by calculating the degree of each node and sorting the nodes in descending order of degree. Initial communities are formed from the highest degree nodes. Nodes are then incrementally added to a community if doing so increases the community density. The approach is tested on standard datasets and detects communities reasonably well in less dense graphs. However, there is scope to improve its performance on very dense graphs, and a parallel implementation is suggested as future work.
Greedy Incremental approach for unfolding of communities in massive networks
Mr. Kamal Sutaria, Research Scholar, Computer Engineering Dept, C.U. Shah University, Wadhwan, Gujarat (Kamal.sutaria@gmail.com)
Dr. K.H. Wandra, Professor, Computer Science Engineering, BITS Education Campus, Vadodara, Gujarat (Khwandra@rediffmail.com)
Dr. C.K. Bhensdadia, Professor, Computer Engineering Dept., D.D. University, Nadiad, Gujarat (Ckbhensdadia@ddu.ac.in)
Ms Kruti Khalpada, Assistant Professor, Computer Engineering Dept, Atmiya Institute of Science & Technology, Rajkot, Gujarat (Krutikhalpada@yahoo.com)
Mr. Dipesh Joshi, Assistant Professor, Computer Engineering Dept, V.V.P. Engineering College, Rajkot, Gujarat (ddipesh4@gmail.com)
Abstract: Social network mining has become an active area of research because billions of people use social media.
Community detection is one of the major problems in social network analysis. Here, a new approach to community
detection is presented that is both greedy and incremental in nature. The approach is tested on standard
datasets, and the results are presented and analyzed.
I. INTRODUCTION
Social media covers a huge part of the internet. It ranges from blogs and forums to media sharing and social
networking, and the list continues to grow. With the increase in social media usage, there is also a huge increase
in social network usage, which is why it has become an interesting topic for researchers around the world. A social
network is a group of entities in social media, and it is made up of different communities. A community is a set of
nodes that have links with each other. For example, the Facebook website can be called a social network, and a group
on Facebook can be called a community. The data related to a social network are dynamic and huge, so it becomes
important to analyze those data and find useful information. Social network analysis is the process of
demonstrating, exploring and mining meaningful patterns from social media data [1]. It has many interesting
applications, such as detecting trends in news, community detection in a social network, brand monitoring for
marketing, and election result prediction in politics. These applications make social network analysis a very
interesting area of research.
A. Organization of paper
The next section describes the research work done on community detection in social networks. Some of the
issues related to community detection are also discussed in that section. Section III presents the new approach
for community detection. Section IV presents a small example to illustrate the algorithm, and Section V
presents the results obtained from the implementation of the proposed system. Section VI covers the
conclusion and the future work. Section VII lists the references used for the literature survey.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 15, No. 9, September 2017
257 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
II. RELATED WORK
This section discusses the research work in the area of social network mining and, in particular, community
detection. In [2], the fundamental concepts are explained well, such as basic graph properties, the elements of
a graph, how a graph can be used to represent a social network and how partitioning can be performed. If
social networks can be represented using graphs, all the traditional graph algorithms can be applied to them. For
simplicity of operation, the social network is represented in the form of an adjacency matrix. The authors of [3]
discuss the components of community detection algorithms. According to [3], there are two basic components of any
community detection algorithm: the first is the algorithm for detecting communities, and the second is the dynamic
programming algorithm which selects small communities to be combined into a larger one. A social network is highly
dynamic, even in its connections, which may be updated over time. Because of this, the community detection
algorithm has to be reapplied at regular intervals. Swarm intelligence algorithms, which are based on the
real-world behavior of insects or animals, have also been used by researchers for community detection. In [4], the
ACO (Ant Colony Optimization) algorithm is used for community detection. ACO is designed around the behavior of
ants: while searching for food, an ant lays a chemical named pheromone on its path so that other ants can detect it
and follow it in the future to find the nearest food source. In [5], communities are detected using the local
neighborhood; this approach does not work for detecting overlapping communities. As social networks can be
represented using graphs, traditional graph algorithms can be used for social networks as well. In [6], a spanning
tree based community detection method using min-max modularity is presented.
III. PROPOSED APPROACH FOR COMMUNITY DETECTION
Here, the GICD (Greedy Incremental approach for Community Detection) approach for detecting communities
in a social network is discussed.
The input to the algorithm is the graph adjacency list. The output is the set of generated communities.
As the algorithm makes the locally best choice when putting a particular node into a community, it can be termed
greedy. The communities are formed one by one. Initially, there are n communities, where n is the number
of nodes with the highest degree. Nodes are then added to the suitable community, so the approach is also incremental.
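As an illustration of the input format, the following minimal Python sketch builds an adjacency list from an undirected edge list. The function name `build_adjacency` and the toy edge list are hypothetical and are not taken from the authors' implementation; the choice to ignore self-loops is likewise an assumption.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency list (node -> set of neighbours)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are ignored; the paper does not mention them
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
toy_adjacency = build_adjacency(toy_edges)
```

Storing neighbours as sets keeps the membership tests used during community growth cheap, which is one reason an adjacency list suits this kind of procedure.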
The GICD procedure is as follows.
1. Calculate the degree of all the nodes and find the average degree Davg
2. Arrange the nodes in descending order of their degree
3. Unmark all the nodes
4. Consider the node (or all the nodes) with the highest degree
5. x = 1
6. IC = {initial communities, one for each of the n nodes with the highest degree}; mark those
   nodes (IC = Initial Community)
7. Do
   a. i = xth node from IC (node number of IC)
      i. For j = 1 to m (m = number of adjacent nodes of i)
         If degree(jth node) >= Davg && degree(jth node) < degree(i) then
            a. Add this node to the community Ci
            b. Mark the node
         Else if degree(jth node) == degree(i)
            a. Check whether the density of the temporary community is increased by
               adding the jth node or not
            b. If density increases, add the node to community Ci
            c. Mark the node
         Else
            a. Put it in the IC
      ii. Remove the node i from the IC
      iii. x = x + 1
   While IC is not empty
In the first step, we calculate the degree of all the nodes in the given graph, along with the average degree
Davg. The degree of a node is the number of edges connected to that node, so a node connected to three other nodes
has degree three. In the second step, the nodes are arranged in descending order of their degrees. As the strategy
is greedy, community formation starts from the highest degree node. The third step unmarks all the nodes so that we
can track which nodes have already been considered. In the fourth step, the highest degree node (or nodes, in case
of a tie) is chosen, and each one forms an initial community containing only that node. In the fifth step, the
variable x is initialized to 1; it works as a counter that keeps track of the node currently being processed. In
the sixth step, the list IC (Initial Community) is initialized with the highest degree nodes, and those nodes are
marked. In the seventh step, each neighbor of the current node is examined. A neighbor whose degree is at least
Davg but lower than that of the current node is added to the community directly. A neighbor whose degree equals
that of the current node is added only if doing so increases the density [2] of the community; otherwise, it is
not. Any other neighbor is placed in IC. The current node is then removed from IC. Step seven is
repeated until all the nodes are processed and the list IC is empty.
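The procedure can be sketched in Python as follows. This is one possible reading of the steps, not the authors' implementation, and it rests on several assumptions that the paper does not settle: density is taken to be the number of internal edges divided by the number of members (`internal_density` is a hypothetical helper), a node is marked as soon as it is queued in IC, an equal-degree neighbor that fails the density test is queued in IC like any other rejected neighbor, ties are broken by node name, and when IC runs empty while unmarked nodes remain, the highest degree unmarked node is used as a new seed so that every node ends up in some community.

```python
from collections import deque


def internal_density(members, adj):
    """Assumed density measure: internal edges divided by number of members."""
    internal_edges = sum(len(adj[u] & members) for u in members) / 2
    return internal_edges / len(members)


def gicd(adj):
    """Sketch of GICD; adj maps each node to the set of its neighbours."""
    if not adj:
        return []
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    d_avg = sum(degree.values()) / len(degree)               # step 1
    order = sorted(adj, key=lambda u: (-degree[u], str(u)))  # step 2
    top = degree[order[0]]
    seeds = [u for u in order if degree[u] == top]           # step 4
    marked = set(seeds)                                      # steps 3 and 6
    ic = deque(seeds)                                        # steps 5 and 6
    communities = []
    while True:
        while ic:                                            # step 7
            i = ic.popleft()
            community = {i}
            for j in sorted(adj[i], key=str):
                if j in marked:
                    continue
                if d_avg <= degree[j] < degree[i]:
                    join = True
                elif degree[j] == degree[i]:
                    grown = community | {j}
                    join = internal_density(grown, adj) > internal_density(community, adj)
                else:
                    join = False
                marked.add(j)
                if join:
                    community.add(j)
                else:
                    ic.append(j)  # assumption: j seeds a later community
            communities.append(community)
        leftover = [u for u in order if u not in marked]
        if not leftover:
            return communities
        # Assumption: reseed from the highest degree node not reached so far.
        marked.add(leftover[0])
        ic.append(leftover[0])
```

Because neighbors that are added to a community are not themselves expanded, some nodes can only be reached through the reseeding fallback; whether the original implementation handles such nodes differently is not stated in the paper.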
IV. ALGORITHM EXAMPLE
Here, an example shows the working of the GICD algorithm. For simplicity, a small graph is
selected.
Figure 1. Example
The example graph in Figure 1 has thirteen nodes. The edges show the interconnections between
the nodes. The following table shows the degree and the adjacent nodes of every node in this graph.
TABLE I
SAMPLE DATASET
Sr. No. Node Degree Adjacent Nodes
1 A 4 {B,C,D,E}
2 B 3 {A,C,D}
3 C 4 {A,B,D,E}
4 D 5 {A,B,C,E,F}
5 E 4 {A,B,C,D}
6 F 2 {D,G}
7 G 2 {F,H}
8 H 4 {G,I,J,K}
9 I 5 {H,J,K,L,M}
10 J 4 {H,I,L,M}
11 K 4 {H,I,L,M}
12 L 4 {I,J,K,M}
13 M 4 {I,J,K,L}
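The first four steps of the procedure can be checked against Table I with a short Python sketch. The adjacency lists are copied exactly as printed in the table (note that E lists B as a neighbor while B does not list E, so the listed degrees sum to 49); the variable names are illustrative only.

```python
# Adjacency lists copied from Table I exactly as printed.
SAMPLE = {
    "A": "BCDE", "B": "ACD", "C": "ABDE", "D": "ABCEF", "E": "ABCD",
    "F": "DG", "G": "FH", "H": "GIJK", "I": "HJKLM", "J": "HILM",
    "K": "HILM", "L": "IJKM", "M": "IJKL",
}

adjacency = {node: set(nbrs) for node, nbrs in SAMPLE.items()}

# Step 1: degree of every node and the average degree Davg.
degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
d_avg = sum(degree.values()) / len(degree)

# Step 2: nodes in descending order of degree (ties broken by name, an assumption).
ranked = sorted(degree, key=lambda n: (-degree[n], n))

# Step 4: all nodes that share the highest degree become the initial seeds.
seeds = [n for n in ranked if degree[n] == degree[ranked[0]]]

# Nodes whose degree is below Davg cannot join a community through the first test.
below_average = [n for n in ranked if degree[n] < d_avg]

print(f"Davg = {d_avg:.2f}, seeds = {seeds}, below average = {below_average}")
```

With these lists, D and I are the two seeds, and B, F and G are the only nodes whose degree falls below the average.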
The outcome generated by the algorithm is shown below. It detects three communities for the given example, as
one would naturally expect.
Figure 2. Output of Sample Dataset
V. RESULTS
TABLE II
IMPLEMENTATION PLATFORM DETAILS
Sr. No Resource Name Specification
1 Language Python (3.5.2)
2 CPU Intel Core i3-4005U CPU @ 1.70 GHz
3 RAM 4 GB
4 OS Ubuntu 16.04
The results are reported for the 13 datasets listed in the following table. For each dataset, the number
of nodes and the number of edges are specified. The last two columns show the time taken by the proposed approach
and the number of communities detected by the algorithm.
TABLE III
DATASET INFORMATION
Sr. No Dataset Name No of nodes No of edges Time No of Communities
1 SAWMILLE 36 62 0.04 1
2 KARATE 34 78 0.16 5
3 MEXICAN DATA 35 117 0.17 2
4 DOLPHINS 62 159 0.18 6
5 POLLBOOK 105 441 0.23 5
6 FOOTBALL 115 1232 0.47 14
7 CELEGANS METABOLIC 453 4596 118 14
8 JAZZ 198 5484 8 3
9 EMAIL 1133 10903 6.8 2
10 EMAIL-EU-CORE 1005 25571 399 20
11 P2PGNUTELLA04 10876 39994 232 29
12 CA-HEPTH 9877 51971 387 515
13 CA-CONDMAT 23133 186936 4899 672
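To relate these figures to how dense each graph is, the standard edge density 2m / (n(n - 1)) of an undirected graph can be computed from the node and edge counts in Table III. The Python sketch below does this for a few rows; it assumes every edge is counted once in the table, which the paper does not state, and `graph_density` is a hypothetical helper name.

```python
def graph_density(nodes, edges):
    """Fraction of possible undirected edges that are present.

    Assumes each edge is counted once in the reported edge total.
    """
    return 2 * edges / (nodes * (nodes - 1))


# (nodes, edges) pairs copied from Table III.
REPORTED = {
    "KARATE": (34, 78),
    "DOLPHINS": (62, 159),
    "JAZZ": (198, 5484),
    "EMAIL-EU-CORE": (1005, 25571),
    "CA-CONDMAT": (23133, 186936),
}

# List the selected datasets from densest to sparsest.
by_density = sorted(REPORTED, key=lambda name: -graph_density(*REPORTED[name]))
for name in by_density:
    n, m = REPORTED[name]
    print(f"{name:15s} n={n:6d} m={m:7d} density={graph_density(n, m):.4f}")
```

Under this assumption JAZZ is by far the densest of the selected graphs and CA-CONDMAT the sparsest.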
VI. CONCLUSION & FUTURE SCOPE
A new greedy incremental approach for community detection has been presented, implemented and
tested on various standard datasets and other random example sets. The outcome produced by the approach is the
number of communities, which is compared with the well-known results available. For graphs that are not
too dense, the number of communities detected by the algorithm is close to the standard number. From this, we can
conclude that GICD does not work well on very dense graphs, so there is scope for improvement there. Apart from
this, as social network data is tremendously large, a parallel implementation of the approach is a natural next step.
ACKNOWLEDGMENT
The authors very much appreciate the financial and infrastructure support provided by the V.V.P. Engineering
College, Rajkot.
REFERENCES
[1] Yiannis Kompatsiaris, (2013), “Social Networks Mining for Innovative Applications and Users Well-Being”.
[2] Fortunato, Santo "Community detection in graphs." Physics Reports 486.3 (2010): 75-174.
[3] Nguyen, Nam P., et al. "Adaptive algorithms for detecting community structure in dynamic social networks." INFOCOM, 2011 Proceedings
IEEE. IEEE, 2011.
[4] Sadi, Sercan, Şima Etaner-Uyar, and Şule Gündüz-Öğüdücü. (2009) "Community detection using ant colony optimization techniques."
[5] Justine Eustace, Xingyuan Wang and Yaozu Cui “Community detection using local neighborhood in complex networks” in Faculty of
Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China
[6] Ranjan Kumar Behera, S. K. Rath and Monalisa Jena “Spanning Tree Based Community Detection using Min-Max Modularity” in 6th
International Conference On Advances In Computing & Communications, ICACC 2016, 6-8 September 2016, Cochin, India.
[7] Lancichinetti, Andrea, and Santo Fortunato. "Community detection algorithms: a comparative analysis." Physical review E 80.5 (2009):
056117.
[8] Ke Xu, Xinfang Zhang, "Mining Community in Mobile Social Network," Procedia Engineering 29 (2012) 3080 – 3084
[9] Sadi, Sercan, Şima Etaner-Uyar, and Şule Gündüz-Öğüdücü. (2009) "Community detection using ant colony optimization techniques."
Proc. Int. Conf. Soft Computing (MENDEL’09).
[10] Michalis Vazirgiannis, Christos Giatsidis and Fragkiskos D. Malliaros “Graph Mining Tools for Community Detection and Evaluation in
Social Networks and the Web” in 22nd International World Wide Web Conference (WWW), May 13-17, 2013 | Rio de Janeiro, Brazil
[11] Guy, I., Avraham, U., Carmel, D., Ur, S., Jacovi, M., & Ronen, I. (2013, May). Mining expertise and interests from social media.
In Proceedings of the 22nd international conference on World Wide Web (pp. 515-526). International World Wide Web Conferences
Steering Committee.