N. A. Pansare et al. Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 4, Issue 2( Version 1), February 2014, pp.403-406
RESEARCH ARTICLE
www.ijera.com
OPEN ACCESS
Association Rule Mining in Distributed Environment
Mrs. V. C. Kulloli (Guide)¹, O. A. Omble², V. G. Gadle³, Y. G. Potdar⁴, N. A. Pansare⁵
¹Asst. Prof. in IT, PCCOE
²,³,⁴,⁵B.E. IT, PCCOE
Abstract
Association rule mining is an important technique in data mining. It generates important rules from the data. These rules are called frequent rules, and the whole concept is known as frequent rule mining. Earlier, this technique was implemented on local machines to generate rules. But as the data size grows with the number of transactions, local machines take a long time to compute the frequent rules. To reduce this time, local machines were upgraded to higher configurations, for example by expanding RAM or hard-disk capacity. In this paper we propose a technique in which the data is divided among machines and each machine computes the frequent rules on the portion of data assigned to it. This technique belongs to distributed data mining. Existing frameworks such as IDMA and EMADS suffer from communication overhead; the proposed framework attempts to reduce this overhead. To provide more security against unauthorized clients, we use the RC4 algorithm to encrypt and decrypt the messages passed between clients.
Index Terms— FI-Mining, Association Rule Mining, Distributed Data Mining, Intelligent Agent Based
Mining
I. INTRODUCTION
Data mining is a long-established research area, and as computer science has developed, data mining has become more and more deeply rooted in it. Many tools are available for extracting information from data. One such technique is Association Rule Mining, introduced by Agrawal et al. [1]. It is widely used for generating frequent pattern sets, and the most popular algorithm for it is the Apriori algorithm. The problem with association rule mining, however, is that it takes a long time on large data sets, which also require a large amount of disk space. Distributing the data is one way in which this problem can be minimized. In a distributed database the data is divided into many parts, and each part is stored on a different machine. Each machine then mines the important patterns from the part of the data set assigned to it. Mining rules in distributed data is known as distributed data mining (DDM), and it can save a large amount of time. Performing association rule mining on distributed data is called distributed association rule mining. Our paper is based entirely on distributed association rule mining and aims to reduce the communication between the distributed machines, that is, to reduce the communication overhead.
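Since each machine in such a setting runs association rule mining on its own portion of the data, a minimal sketch of the level-wise Apriori idea mentioned above may help. It is an illustrative Python version, not the authors' implementation; the function name `apriori` and the absolute threshold `min_count` are assumptions for this example. Each level joins frequent (k-1)-itemsets into k-item candidates, prunes any candidate with an infrequent subset, and counts the survivors in one pass over the data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_count: int) -> Dict[Itemset, int]:
    """Return every itemset whose support count is at least min_count."""
    # Level 1: count single items in one pass.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                # Prune step: keep only if every (k-1)-subset is frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one pass over the data for all size-k candidates.
        counts = {cand: 0 for cand in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

For example, `apriori([{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}], 2)` keeps the three single items plus {bread, milk} and {bread, butter}; {bread, milk, butter} is pruned because {milk, butter} occurs only once.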
We also attempt to provide secure transmission of messages between different clients. For this we use encryption and decryption of messages, based on the RC4 algorithm [12]. A client sends a message intended for another client. The server receives the message and encrypts it, and then sends the encrypted message to the client that is supposed to receive it. After receiving the message, that client decrypts it into readable form.
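The relay described above can be sketched in Python as follows. The RC4 part is the standard key-scheduling (KSA) and pseudo-random generation (PRGA) procedure; the `RelayServer` class, `client_receive` function and the 16-byte per-message key are hypothetical choices made for this illustration, not details taken from the paper. Because RC4 simply XORs the data with a keystream, the same `rc4` call both encrypts and decrypts.

```python
import secrets
from typing import Dict, Iterator, List, Tuple


def rc4_keystream(key: bytes) -> Iterator[int]:
    """Yield RC4 keystream bytes for a key of 1 to 256 bytes."""
    # Key-scheduling algorithm (KSA): permute S using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): emit one byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


class RelayServer:
    """Hypothetical server that encrypts every client-to-client message."""

    def __init__(self) -> None:
        # Inbox entries: (sender id, ciphertext, per-message key).
        self.inboxes: Dict[str, List[Tuple[str, bytes, bytes]]] = {}

    def register(self, client_id: str) -> None:
        self.inboxes[client_id] = []

    def send(self, sender: str, recipient: str, message: bytes) -> None:
        if sender not in self.inboxes or recipient not in self.inboxes:
            raise PermissionError("both clients must be registered with the server")
        key = secrets.token_bytes(16)  # fresh key, so no keystream is ever reused
        self.inboxes[recipient].append((sender, rc4(key, message), key))


def client_receive(server: RelayServer, client_id: str) -> List[Tuple[str, bytes]]:
    """Recipient side: take pending messages and decrypt each with its key."""
    inbox, server.inboxes[client_id] = server.inboxes[client_id], []
    return [(sender, rc4(key, ciphertext)) for sender, ciphertext, key in inbox]
```

A fresh random key is drawn for every message because an RC4 key must never encrypt two different data streams. Since, as described, the key is delivered together with the ciphertext, this sketch models the message flow only; it does not by itself hide the message from someone who can read the server-to-client link.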
II. RELATED WORK
Several tools already exist in the field of distributed data mining. Some of them are described below.
The IDMA [9] architecture provides mobile-agent-based distributed and incremental association rule mining. The system includes the distributed knowledge discovery management system (KDMS), the knowledge discovery sub-system (sub-KDS), the data mining mobile agent (DMMA) and the local knowledge base (LKB). The KDMS dispatches the mobile agent DMMA to each site. The mobile agents move to the sub-KDS and carry out the data mining task there. In this way the local large item sets are obtained, the local association rules can be derived, and the local knowledge base can be refreshed. The mobile agents carry the set of local large item sets and their support counts back to the KDMS. When all the mobile agents have returned to the KDMS, the possible minimum and maximum support counts of the potential global item sets can be computed. This system was implemented on IBM Aglets.
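The description of IDMA says the KDMS derives possible minimum and maximum global support counts, but not how. One plausible way, assumed here and not taken from [9], is sketched below: if a site reported an item set as locally large, its exact local count is known; if it did not, the local count must lie between 0 and one less than that site's local threshold. The function name `global_support_bounds` and its parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def global_support_bounds(
    reports: List[Dict[Itemset, int]],
    local_min_counts: List[int],
) -> Dict[Itemset, Tuple[int, int]]:
    """Bound the global support count of every itemset reported by any site.

    reports[i] maps each locally large itemset at site i to its exact local count;
    local_min_counts[i] is the count an itemset needs to be locally large at site i.
    """
    candidates = set().union(*reports)
    bounds: Dict[Itemset, Tuple[int, int]] = {}
    for itemset in candidates:
        low = high = 0
        for report, min_count in zip(reports, local_min_counts):
            if itemset in report:
                # Exact local count is known.
                low += report[itemset]
                high += report[itemset]
            else:
                # Not locally large, so its local count is 0 .. min_count - 1.
                high += max(min_count - 1, 0)
        bounds[itemset] = (low, high)
    return bounds
```

Under these assumptions, an item set whose upper bound is below the global threshold can be discarded without further communication, one whose lower bound already reaches it is certainly globally large, and only the remaining item sets need extra counting at the sites.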
The Extendible Multi-Agent Data Mining System (EMADS) [8] framework promotes high availability and high performance without compromising the integrity of the data or of the data mining (DM) algorithms. This framework provides a highly flexible and extendible data mining platform, and the resulting system allows users to build collaborative DM approaches. The framework has been applied to a number of DM scenarios: meta association rule mining (Meta ARM) and classifier generation.
Our distributed association rule mining framework attempts to integrate global knowledge after the local mining. This raises several research problems: reducing the high communication cost, handling multiple heterogeneous data sources, improving the efficiency of incremental knowledge integration, scalability of the framework, data privacy and security, fault tolerance of EMADS, and efficient data partitioning.
III. PROPOSED FRAMEWORK
This section describes the working of our
proposed framework.
A client-server system is built in which all clients are registered with the server. The data is divided among all clients. The association rule mining algorithm runs on each client system and not on the server, as it did in the tools mentioned earlier. If a client system requires frequent item sets, that client generates a request and sends it to the server. The server then asks each client to apply association rule mining to generate frequent rules. The client systems generate their rules and send them to the server. The rules generated on the client side are called local rules. The server then combines these local rules into a global rule set, which is called the global frequent item set.
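The text does not say exactly how the server "adds" local rules. A minimal sketch, assuming a two-round exchange in the spirit of partition-based mining, is shown below: in round 1 each client reports its locally frequent item sets, and the server takes their union as global candidates; in round 2 each client returns exact counts for every candidate, and the server sums them. This is sound because an item set that is frequent over the whole data must be frequent in at least one partition. The helper names, the integer percentage `minsup_pct`, and the brute-force local counting limited to `max_len` items are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def min_count(minsup_pct: int, n: int) -> int:
    """Smallest count c with c / n >= minsup_pct / 100 (exact integer ceiling)."""
    return -(-minsup_pct * n // 100)


def local_frequent(partition: List[Set[str]], minsup_pct: int, max_len: int) -> Set[Itemset]:
    """Client, round 1: itemsets (up to max_len items) frequent in this partition."""
    counts: Counter = Counter()
    for t in partition:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    threshold = min_count(minsup_pct, len(partition))
    return {s for s, c in counts.items() if c >= threshold}


def count_candidates(partition: List[Set[str]], candidates: Set[Itemset]) -> Counter:
    """Client, round 2: exact local count of every global candidate."""
    return Counter({cand: sum(1 for t in partition if cand <= t) for cand in candidates})


def global_frequent(partitions: List[List[Set[str]]], minsup_pct: int,
                    max_len: int = 3) -> Dict[Itemset, int]:
    """Server: union local results into candidates, then sum exact counts."""
    candidates: Set[Itemset] = set()
    for p in partitions:
        candidates |= local_frequent(p, minsup_pct, max_len)
    totals: Counter = Counter()
    for p in partitions:
        totals.update(count_candidates(p, candidates))
    threshold = min_count(minsup_pct, sum(len(p) for p in partitions))
    return {s: c for s, c in totals.items() if c >= threshold}
```

Only candidate item sets and their counts cross the network, never the raw transactions. Item sets longer than `max_len` are ignored in this sketch; a real client would use a complete local miner such as Apriori instead of the brute-force enumeration.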
Algorithm 1: Routine K-Server
function K-Server(minsup, ls)
{
    min_sup = minsup;
    key = genkey();
    GFIL = ø;
    LS = ls;
    visit = true;
    if (visit) then
    {
        AG = MAGen(LS, min_sup, key, GFIL, visit);
        Dispatch(AG, LS.next());
    }
    else
    {
        AG = Receive(AG);
        GenAsso(GFIL);
    }
}
Algorithm 2: Routine SA
function SA(ls, ms, k, fil, v)
{
    if (v)
    {
        if (key == k)
        {
            Find FI;
            Update GFIL;
            if (ls-id.next() == k-server)
            {
                v = false;
            }
            Dispatch(AG, ls.next());
        }
    }
}
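Algorithms 1 and 2 can be simulated in a single Python process, as a sketch of the agent's itinerary rather than a mobile-agent platform. Everything named here (`genkey`, `MobileAgent`, `StationaryAgent`, `k_server`, the `hop` counter) is hypothetical. Two simplifications are assumptions: "Find FI" is restricted to single items, and a client only adds counts for items that are frequent locally, so the GFIL totals are lower bounds on the true global counts. A key mismatch raises an error here, whereas Algorithm 2 simply does nothing.

```python
import secrets
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def genkey() -> str:
    """Hypothetical genkey(): a random session key shared with registered clients."""
    return secrets.token_hex(16)


@dataclass
class MobileAgent:
    """Hypothetical stand-in for AG = MAGen(LS, min_sup, key, GFIL, visit)."""
    itinerary: List[str]  # LS: ids of the clients to visit, in order
    min_sup: int          # assumed: absolute support count applied at each client
    key: str
    gfil: Dict[Itemset, int] = field(default_factory=dict)
    visit: bool = True
    hop: int = 0          # index of the client currently being visited


class StationaryAgent:
    """Hypothetical client (SA); its data partition never leaves the client."""

    def __init__(self, cid: str, transactions: List[Set[str]], key: str) -> None:
        self.cid = cid
        self.transactions = transactions
        self.key = key

    def on_arrival(self, agent: MobileAgent) -> None:
        if agent.key != self.key:  # the "if (key == k)" check fails
            raise PermissionError(f"{self.cid}: agent key rejected")
        # "Find FI", restricted to single items to keep the sketch short.
        counts = Counter(item for t in self.transactions for item in t)
        for item, count in counts.items():
            if count >= agent.min_sup:  # "Update GFIL" with locally frequent items
                fs = frozenset([item])
                agent.gfil[fs] = agent.gfil.get(fs, 0) + count
        if agent.hop + 1 == len(agent.itinerary):  # next hop is the K-Server
            agent.visit = False


def k_server(clients: Dict[str, StationaryAgent], order: List[str],
             min_sup: int, key: str) -> Dict[Itemset, int]:
    """Dispatch one agent around all clients and return the GFIL it brings back."""
    if not order:
        return {}
    agent = MobileAgent(itinerary=list(order), min_sup=min_sup, key=key)
    while agent.visit:  # Dispatch(AG, LS.next()) until the agent heads home
        clients[agent.itinerary[agent.hop]].on_arrival(agent)
        agent.hop += 1
    return agent.gfil  # what Receive(AG) hands to GenAsso(GFIL)
```

A single agent carrying the growing GFIL from client to client means each client sends one message instead of replying separately to the server, which is one way the itinerary can limit communication.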
Below is the proposed architecture diagram
of our framework.
Figure 1: Client-Server based DDM Process
The server then passes this global frequent item set to each client system, so every client is aware of all the rules present in the complete data set.
Below is the model of the proposed algorithm for our framework.
Figure 2: MAD-ARM Model
SA: Stationary Agent (clients)
MA: Mobile Agent (message sent to each client)
LS: Local List (generated by each client)
GFIL: Global Frequent Item set (generated by adding all LS)
IV. SUPPORT AND CONFIDENCE PHENOMENON
Every association rule X => Y has a support level and a confidence level. The support is the percentage of transactions that contain all the items of the rule, that is, X and Y together. An itemset is called frequent if its support is equal to or greater than an agreed-upon minimum value, the support threshold. The confidence is the conditional probability that, given X is present in a transaction, Y will also be present: among the transactions in which the antecedent X is satisfied, it is the percentage in which the consequent Y is also satisfied.
By definition, the confidence measure is:
$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}$$
where support(X ∪ Y) is the support of the transactions that contain both X and Y.
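A small worked example of these two measures, written in Python with exact fractions so the arithmetic is easy to check. The function names and the four-transaction data set are made up for illustration. Here {bread, milk} appears in 2 of 4 transactions, so its support is 1/2; bread appears in 3 of 4, so confidence(bread => milk) = (1/2) / (3/4) = 2/3.

```python
from fractions import Fraction
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> Fraction:
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> Fraction:
    """confidence(X => Y) = support(X ∪ Y) / support(X); undefined if X never occurs."""
    return support(x | y, transactions) / support(x, transactions)


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))       # 1/2
print(confidence({"bread"}, {"milk"}, transactions))  # 2/3
```

With a support threshold of 50%, {bread, milk} is frequent, and the rule bread => milk holds in about 66.7% of the transactions that contain bread.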
V. RC4 ALGORITHM
RC4 has been one of the most widely used stream ciphers in cryptography. It is also known by two other names, ARC4 and ARCFOUR, short for "Alleged RC4". RC4 was created by Ronald Rivest of RSA Data Security Inc. It is a shared-key stream cipher, which means the shared key itself must be transferred securely. RC4 is used for both encryption and decryption: the data stream is XORed with a series of generated key bytes. It accepts keys of variable length and acts as a generator of pseudo-random numbers, and its output is XORed with the data stream to produce the encrypted data. Hence a particular RC4 key should never be reused to encrypt two different data streams.
If Client 1 wants to send an important message to Client 2, Client 1 sends a request to the server stating that it wants to send a secure message to Client 2. The server accepts the request and encrypts the message of Client 1 using the RC4 algorithm. The server then sends the encrypted message and the key to Client 2. Client 2 receives the encrypted message from the server and decrypts it into readable form using the RC4 algorithm with the key provided by the server.
VI. CONCLUSION
In this paper we presented an overview of association rule mining in a distributed environment, called distributed association rule mining, and pointed out the issues in existing systems. Our approach is designed to minimize the communication between the different systems used for association rule mining. We have also added secure message transfer between clients using the RC4 algorithm to provide more security against unauthorized clients.
REFERENCES
[1]. R. Agrawal, T. Imielinski, and A. Swami, "Mining Associations between Sets of Items in Massive Databases," Proceedings of the ACM SIGMOD, Washington DC, 1993.
[2]. Jaturon Chattratichat, John Darlington, Moustafa Ghanem, et al., "Large Scale Data Mining: Challenges and Responses," Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 1997.
[3]. Rakesh Agrawal and John C. Shafer, "Parallel Mining of Association Rules," IEEE Transactions on Knowledge and Data Engineering, 1996.
[4]. Matthias Klusch, Stefano Lodi, and Gianluca Moro, "Agent Based Distributed Data Mining: The KDEC Scheme."
[5]. A. O. Ogunde, O. Folorunso, A. S. Sodiya, and G. O. Ogunieye, "A Review of Some Issues and Challenges in Current Agent Based Distributed Association Rule Mining," Asian Journal of Information Technology, 2011.
[6]. E. I. Ariwa, M. B. Senousy, and M. M. Medhat, "Information and E-business Model Application for Distributed Data Mining Using Mobile Agents," Proceedings of the International Conference WWW/Internet, USA, 2003.
[7]. G. S. Bhamra, A. K. Verma, and R. B. Patel, "Agent Enriched Distributed Association Rule Mining: A Review," Springer-Verlag Berlin Heidelberg, 2012.
[8]. Kamal Ali Albashiri, Frans Coenen, and Paul Leng, "An Investigation into the Issues of Multi-Agent Data Mining," Ph.D. Thesis, 2010.
[9]. Yun-Lan Wang, Zeng-Zhi Li, and Hai-Ping Zhu, "Mobile Agent Based Distributed and Incremental Techniques for Association Rules," Proceedings of the Second International Conference on Machine Learning and Cybernetics, 2003.
[10]. U. P. Kulkarni, P. D. Desai, Tanveer Ahmed, J. V. Vadavi, and A. R. Yardi, "Mobile Agent Based Distributed Data Mining," ICCIMA, 2007.
[11]. Walid Adly Atteya, Keshav Dahal, and M. Alamgir Hossain, "Distributed BitTable Multi-Agent Association Rules Mining Algorithm," Springer-Verlag, KES 2011, Part I, LNAI 6881.
[12]. Quentin Galvane and Baptiste Uzel, "Cryptography: RC4 Algorithm," February 18, 2012.