Efficient Frequent Pattern Mining In
Distributed Systems
Content
1. Abstract
2. Introduction
3. Literature Survey
4. Work Done Till Now
5. Block Diagram
6. Scope Of The Project
7. References
Abstract
Data mining, the domain of our project, is a relatively new subfield of
computer science and engineering. It is the analysis step of the
Knowledge Discovery in Databases (KDD) process and is used to extract
information from a very large data set and make it understandable for
further use. Of the six classes of data mining, our project focuses on
association rule mining. We will apply it to mine frequent patterns
efficiently from data held in a distributed system, that is, a
collection of computers that act, work and appear as one large computer.
Introduction
Progress in digital data acquisition, distribution, retrieval and
storage technology has resulted in the growth of massive
databases. One of the greatest challenges facing organizations
and individuals is how to turn their rapidly expanding data
collections into accessible and actionable knowledge.
Distributed systems are collections of computers that act and
work together and appear as one large system with high combined
processing power.
Association rule mining, one of the six classes of data mining,
is the area of our project and is one way to address the
challenge above. The general form of an association rule is:
X1, X2, X3, ..., Xn -> Y
which states that the attributes X1, X2, ..., Xn together predict Y.
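As a small illustration of what it means for X1, ..., Xn to "predict" Y, the sketch below computes the two usual measures of a rule: support (how often all the items occur together) and confidence (how often Y occurs when X1, ..., Xn occur). The function names and the toy transactions are assumptions made for this example, not part of the project.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(body: Iterable[str], head: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Confidence of the rule body -> head: support(body + head) / support(body)."""
    body_set, head_set = frozenset(body), frozenset(head)
    return support(body_set | head_set, transactions) / support(body_set, transactions)


# Hypothetical market-basket data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]

# Rule: bread, butter -> milk
rule_support = support({"bread", "butter", "milk"}, transactions)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, transactions)
```

Here the rule "bread, butter -> milk" holds in one of the four transactions, and in one of the two transactions that contain both bread and butter.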
The association rule mining algorithm is given below. Here D is the
transaction database, σ the minimum support threshold, γ the minimum
confidence threshold, and F(D, σ) the set of frequent itemsets:
Input: D, σ, γ
Output: R(D, σ, γ)
Compute F(D, σ)
R := {}
for all I ∈ F do
    R := R ∪ {I ⇒ {}}
    C1 := {{i} | i ∈ I}
    k := 1
    while Ck ≠ {} do
        // Extract all heads of confident association rules
        Hk := {X ∈ Ck | confidence(I \ X ⇒ X, D) ≥ γ}
        // Generate new candidate heads
        for all X, Y ∈ Hk, X[i] = Y[i] for 1 ≤ i ≤ k−1, and X[k] < Y[k] do
            Z := X ∪ {Y[k]}
            if ∀J ⊂ Z, |J| = k : J ∈ Hk then
                Ck+1 := Ck+1 ∪ {Z}
            end if
        end for
        k++
    end while
    // Cumulate all association rules
    R := R ∪ {I \ X ⇒ X | X ∈ H1 ∪ · · · ∪ Hk}
end for
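The rule-generation loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the frequent itemsets and their counts are already available in a dictionary (here called `support_count`, a hypothetical name) that also holds the empty itemset with the total number of transactions, and candidate heads are joined by plain set union rather than by the ordered-prefix join of the pseudocode, which yields the same candidates on small inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (body, head)


def generate_rules(support_count: Dict[Itemset, int], min_conf: float) -> Set[Rule]:
    """Generate confident rules from frequent itemsets.

    Assumes `support_count` is downward closed: every subset of a listed
    itemset is also listed, including frozenset() with the transaction total.
    """
    rules: Set[Rule] = set()
    for itemset in support_count:
        if not itemset:
            continue  # skip the empty itemset itself
        rules.add((itemset, frozenset()))  # the trivial rule I => {}
        candidates = {frozenset([item]) for item in itemset}  # C1
        k = 1
        while candidates:
            # Keep the heads X whose rule (I \ X) => X is confident enough.
            heads = {
                x for x in candidates
                if support_count[itemset] / support_count[itemset - x] >= min_conf
            }
            rules |= {(itemset - x, x) for x in heads}
            # Build (k+1)-item candidate heads whose k-subsets are all heads.
            next_candidates = set()
            for x, y in combinations(heads, 2):
                union = x | y
                if len(union) == k + 1 and all(
                    frozenset(sub) in heads for sub in combinations(union, k)
                ):
                    next_candidates.add(union)
            candidates = next_candidates
            k += 1
    return rules


# Hypothetical support counts over four transactions, for illustration only.
support_count = {
    frozenset(): 4,
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"bread", "milk"}): 2,
    frozenset({"bread", "butter"}): 2,
}
rules = generate_rules(support_count, min_conf=0.7)
```

With these counts, "butter => bread" is kept (confidence 2/2), while "bread => butter" is dropped (confidence 2/3).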
LITERATURE SURVEY
» Frequent pattern mining has been a focused theme in
data mining research for over a decade.
» Abundant literature has been dedicated to this research
and tremendous progress has been made to date.
» It ranges from efficient and scalable algorithms for
frequent itemset mining in transaction databases to
numerous research frontiers, such as sequential pattern
mining, structured pattern mining, correlation
mining, associative classification, and frequent
pattern-based clustering, as well as their broad applications.
» A large body of literature exists on this research topic.
Among the IEEE papers we have gone through, a few are
named below:
1. Efficient and scalable methods for mining frequent
patterns.
2. Mining interesting frequent patterns.
3. Impact on data analysis and mining applications.
4. Applications of frequent patterns and research
directions.
Work Done Till Now
In this part of the presentation, we highlight some of the
research that has been done so far on this topic and name a
few of the papers.
1. Fast Algorithms for Mining Association Rules
Title of paper: Fast Algorithms for Mining Association Rules
Authors: Rakesh Agrawal and Ramakrishnan Srikant
Year of publication: 1994
2. Mining Frequent Patterns without Candidate Generation
Title of paper: Mining Frequent Patterns without Candidate
Generation
Authors: Jiawei Han, Jian Pei, Yiwen Yin
Year of publication: 2000
3. Improved Association Rule Mining Algorithm for Large Dataset
Title of paper: Improved Association Rule Mining Algorithm for Large
Dataset
Authors: Tanu Arora, Rahul Yadav
Year of publication: 2011
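The first paper in this list is the usual reference for the Apriori approach, which finds frequent itemsets level by level: frequent k-itemsets are joined into (k+1)-item candidates, and any candidate with an infrequent k-subset is pruned before counting. The sketch below illustrates that idea only; the function name `apriori`, the `min_count` parameter and the toy data are assumptions for this example, and it is not the implementation from any of the listed papers.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def apriori(transactions: Iterable[Iterable[str]], min_count: int) -> Dict[Itemset, int]:
    """Return every itemset that occurs in at least `min_count` transactions."""
    data: List[Itemset] = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts: Dict[Itemset, int] = {}
    for t in data:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 1
    while frequent:
        # Join frequent k-itemsets; prune candidates with an infrequent k-subset.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)
        # One pass over the data counts the surviving candidates.
        counts = {c: sum(1 for t in data if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(transactions, min_count=2)
```

The second paper (FP-growth) avoids this candidate generation step altogether by mining a compressed prefix tree of the transactions; that structure is not shown here.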
Block Diagram
1. General working of Data Mining.
2. Knowledge Discovery in Databases Process (KDD)
3. Distributed Systems
Future Work
The work described here is implemented on a local area network (LAN);
extending it to a wide area network (WAN) is left as future work.
The efficiency of the system could be improved for the case where
the number of computers in the distributed system is increased.
The efficiency of the algorithm could also be improved for the case
where large data sets are given to the tool as input files.
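One common way to spread support counting over several machines is to give each machine a partition of the transactions, let it count the candidate itemsets locally, and then add the partial counts together. The sketch below illustrates that pattern under loud assumptions: worker threads stand in for the networked computers, and the names `local_counts` and `global_counts` are hypothetical. It is not a description of the project's actual tool.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def local_counts(partition: Sequence[Itemset], candidates: Sequence[Itemset]) -> Counter:
    """Count, on one node's partition, how many transactions contain each candidate."""
    counts: Counter = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    return counts


def global_counts(partitions: List[Sequence[Itemset]],
                  candidates: Sequence[Itemset],
                  max_workers: int = 2) -> Counter:
    """Sum the per-partition counts; threads stand in for separate computers."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda part: local_counts(part, candidates), partitions)
        for partial in partials:
            total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical data split across two nodes, for illustration only.
partitions = [
    [frozenset({"bread", "milk"}), frozenset({"bread", "butter", "milk"})],
    [frozenset({"bread", "butter"}), frozenset({"milk"})],
]
candidates = [frozenset({"bread"}), frozenset({"bread", "butter"})]
totals = global_counts(partitions, candidates)
```

In this pattern only the small count tables travel between nodes, not the transactions themselves, which is why the number of computers and the size of the input files both affect how well it scales.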
References
1. R. Agrawal, C. Faloutsos, and A. Swami, “Efficient
Similarity Search in Sequence Databases,” Proc. Fourth
Int’l Conf. Foundations of Data Organization and Algorithms,
Oct. 1993.
2. J. Han and M. Kamber, Data Mining: Concepts and Techniques,
2nd edition, Morgan Kaufmann Publishers, 2006.
3. A. K. Pujari, Data Mining Techniques, 2nd edition,
Universities Press, 2011.
4. R. Agrawal, T. Imielinski, and A. Swami, “Database
Mining: A Performance Perspective,” IEEE Trans.
Knowledge and Data Engineering, vol. 5, pp. 914.
5. Software Engineering, Pearson Education, 2007.