Hiding sensitive items in association rule mining: an exploration of knowledge discovery and privacy preserving. Presented by: M. Swarna Rekha, K. Reshma, Ch. Savanth, Vishnu Babu, Nag Santhosh, Moses.
Talk overview: Growing privacy concerns. Why privacy preserving data mining? Approaches. Problem statement. Apriori algorithm. Problem description. Proposed algorithms. Illustrating examples. Analysis. Conclusions. Software and hardware requirement specification.
Growing privacy concerns: Threat to individual privacy. Inference of sensitive information, including personal information or even patterns, from non-sensitive information. The subject of the data is not necessarily a person; it may be information about a corporation, or a transaction record.
Why privacy preserving data mining? Multinational corporations: a company would like to mine its data for globally valid results, but national laws may prevent transborder data sharing. Public use of private data: data mining enables research studies of large populations, but these populations are reluctant to release personal information.
Example: patient records. Patient health records are split among providers: insurance company, pharmacy, doctor, hospital. Each agrees not to release the data without the patient's consent. A medical study wants correlations across providers, e.g. rules relating complaints or procedures to “unrelated” drugs. Does this need patient consent? And that of every other patient? It shouldn't: the mined rules shouldn't disclose any individual patient's data.
Approaches: The first approach is to alter the data before delivery to the data miner so that real values are obscured. The second approach assumes the data is distributed between two or more sites, and these sites cooperate to learn the global data mining results without revealing the data at their individual sites.
Introduction: Our technique of altering the data is to selectively modify individual values in a database to prevent the discovery of a set of rules. Here we apply a group of heuristic solutions to reduce the number of occurrences of some frequent itemsets below a minimum, user-specified threshold. The second approach is to allow users access to only a subset of the data while global data mining results can still be discovered.
Problem statement: mining of association rules. Let I = {i1, i2, …, im} be a set of literals, called items. Given a set of transactions D, where each transaction T is a set of items such that T ⊆ I, an association rule is an expression X => Y, where X, Y ⊆ I and X ∩ Y = ∅. An example of such a rule is that 90% of customers who buy hamburgers also buy coke. The 90% here is called the confidence of the rule, which means that 90% of the transactions that contain X also contain Y. The support of the rule is the percentage of transactions that contain both X and Y. The problem of mining association rules is to find all rules whose support and confidence are at least the user-specified minimum support and minimum confidence.
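The support and confidence definitions above translate directly into code. Below is a minimal sketch on a made-up four-transaction database (the item names and counts are hypothetical, not from the slides):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """conf(X => Y) = support(X union Y) / support(X)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

# Hypothetical database: each transaction is a set of items.
D = [{"hamburger", "coke"}, {"hamburger", "coke"}, {"hamburger"}, {"coke"}]

print(support({"hamburger", "coke"}, D))                  # 0.5
print(round(confidence({"hamburger"}, {"coke"}, D), 2))   # 0.67
```

A rule is reported only when both values clear the user-specified minimum support and minimum confidence.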
[Figure: distributed mining of association rules. Each site runs local data mining over its local data; a data mining combiner merges the local rules (e.g. A&B => C at each site) into combined results (A&B => C, 4%).]
Apriori algorithm: Apriori is an influential algorithm for mining frequent itemsets from a given database. It employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.
Apriori property: all non-empty subsets of a frequent itemset must also be frequent.
A two-step process:
1. The join step: to find Lk, a set of candidate k-itemsets, denoted Ck, is generated by joining Lk-1 with itself. Two members of Lk-1 are joinable if their first (k-2) items are in common.
2. The prune step: a scan of the database determines the count of each candidate in Ck, yielding Lk. Any (k-1)-subset that is not frequent cannot be a subset of a frequent k-itemset; hence if any (k-1)-subset of a candidate k-itemset is not in Lk-1, the candidate cannot be frequent either and can be removed from Ck.
Procedure apriori_gen(Lk-1: frequent (k-1)-itemsets; min_sup: minimum support)
1.  for each itemset l1 ∈ Lk-1
2.      for each itemset l2 ∈ Lk-1
3.          if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
4.              c = l1 join l2;            // join step: generate candidates
5.              if has_infrequent_subset(c, Lk-1) then
6.                  delete c;              // prune step: remove unfruitful candidate
7.              else add c to Ck;
8.          }
9.  return Ck;

Procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets)  // use prior knowledge
1.  for each (k-1)-subset s of c
2.      if s ∉ Lk-1 then
3.          return TRUE;
4.  return FALSE;
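The procedure above can be made runnable. A sketch in Python, representing itemsets as frozensets; the join on the first k-2 items (with the last item ordered) follows the pseudocode:

```python
def apriori_gen(L_prev):
    """Join step: merge frequent (k-1)-itemsets that share their first
    k-2 items (in sorted order); prune step: drop any candidate that
    has an infrequent (k-1)-subset."""
    ordered = sorted(tuple(sorted(s)) for s in L_prev)
    Ck = set()
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            l1, l2 = ordered[i], ordered[j]
            if l1[:-1] == l2[:-1]:   # first k-2 items agree; l1[k-1] < l2[k-1] by sorting
                c = frozenset(l1) | frozenset(l2)
                # has_infrequent_subset: every (k-1)-subset must be frequent
                if all(c - {item} in L_prev for item in c):
                    Ck.add(c)
    return Ck

L2 = {frozenset("AB"), frozenset("AC"), frozenset("BC")}
print(apriori_gen(L2) == {frozenset("ABC")})   # True
```

With L2 = {AB, AC, BC}, only AB and AC join (shared prefix A), and the resulting candidate ABC survives pruning because all of its 2-subsets are frequent.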
Example: [Tables: transaction database D; candidate sets C1, C2, C3 and frequent sets L1, L2, L3. Each Ck is generated from Lk-1, D is scanned for the count of each candidate, and the counts are compared with the minimum support count to obtain Lk.] Generation of candidate itemsets and frequent itemsets, where the minimum support count is 2.
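The example tables did not survive the export, so here is a hedged end-to-end sketch of the level-wise search on a hypothetical five-transaction database with minimum support count 2 (a simple self-join stands in for the ordered join of apriori_gen):

```python
def apriori(transactions, min_count):
    """Level-wise frequent-itemset mining (simplified sketch)."""
    def count(itemset):
        return sum(itemset <= t for t in transactions)

    items = {x for t in transactions for x in t}
    # C1 -> L1: scan D and compare counts with the minimum support count
    L = {frozenset([x]) for x in items if count(frozenset([x])) >= min_count}
    frequent = set(L)
    while L:
        k = len(next(iter(L))) + 1
        # Ck: self-join Lk-1, keeping unions exactly one item larger
        Ck = {a | b for a in L for b in L if len(a | b) == k}
        L = {c for c in Ck if count(c) >= min_count}   # scan D again
        frequent |= L
    return frequent

# Hypothetical database (not the slide's lost table).
D = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("A")]
print(sorted("".join(sorted(s)) for s in apriori(D, min_count=2)))
# ['A', 'AB', 'AC', 'B', 'BC', 'C']  ({A,B,C} appears only once, so it is not frequent)
```

Each pass scans D once per level, mirroring the C1/L1, C2/L2, C3/L3 progression described above.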
Generating association rules: Consider the frequent itemsets in L2 = {AB, AC, BC}. For each frequent itemset, every non-empty proper subset (here {A}, {B}, {C}) can serve as the left-hand side of a rule. The resulting association rules include:
A => C, confidence = 4/6 = 66%
C => A, confidence = 4/4 = 100%
B => C, confidence = 3/4 = 75%
C => B, confidence = 3/4 = 75%
If the minimum confidence threshold is 70%, all of these rules except A => C (66%) are strong.
For the frequent 3-itemset {A, B, C}:
A => B^C, confidence = 3/6 = 50%
B => A^C, confidence = 3/4 = 75%
C => A^B, confidence = 3/4 = 75%
A^B => C, confidence = 3/4 = 75%
A^C => B, confidence = 3/4 = 75%
B^C => A, confidence = 3/3 = 100%
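The rule-generation step above can be sketched as follows. The support counts are those implied by the slide's confidence fractions (the count for {A, B} is inferred from the A^B => C rule):

```python
from itertools import combinations

# Support counts implied by the slide's fractions ({A,B} inferred as 4).
counts = {
    frozenset("A"): 6, frozenset("B"): 4, frozenset("C"): 4,
    frozenset("AB"): 4, frozenset("AC"): 4, frozenset("BC"): 3,
    frozenset("ABC"): 3,
}

def strong_rules(itemset, counts, min_conf):
    """Emit X => Y for every non-empty proper subset X of `itemset`
    (with Y = itemset - X), keeping rules with confidence >= min_conf."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            conf = counts[itemset] / counts[lhs]
            if conf >= min_conf:
                rules.append(("".join(sorted(lhs)), "".join(sorted(itemset - lhs)), conf))
    return rules

for lhs, rhs, conf in strong_rules("ABC", counts, min_conf=0.70):
    print(f"{lhs} => {rhs}  confidence={conf:.0%}")
# Prints the five strong rules from {A,B,C}; A => B^C (50%) is filtered out.
```

This reproduces the slide's result: with a 70% threshold, only A => B^C fails among the rules derived from {A, B, C}.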
Proposed algorithms: To hide an association rule, we can decrease either its support or its confidence below the pre-specified minimum support or minimum confidence. To decrease the confidence of a rule, we propose two algorithms: Increase Support of LHS First (ISLF) and Decrease Support of RHS First (DSRF). The first algorithm tries to increase the support of the left-hand side of the rule; if that is not successful, it tries to decrease the support of the right-hand side of the rule.
Algorithm ISLF:
Input: (1) a source database D, (2) a min_support, (3) a min_confidence, (4) a set of hidden items H.
Output: a transformed database D', where rules containing H on the RHS are hidden.
Algorithm:
1.  Find large 1-itemsets from D;
2.  For each hidden item h ∈ H
3.      If h is not a large 1-itemset, then H := H - {h};
4.  If H is empty, then EXIT;   // no association rule contains H in its RHS
5.  Find large 2-itemsets from D;
6.  For each h ∈ H {
7.      For each large 2-itemset containing h {
8.          Compute the confidence of rule U, where U is a rule x -> h;
9.          If confidence(U) > min_conf, then {   // increase support of LHS
10.             Find T1 = {t in D | t partially supports LHS(U)};
11.             Sort T1 in descending order by the number of supported items;
12.             Repeat {
13.                 Choose the first transaction t from T1;
14.                 Modify t to support LHS(U);
15.                 Compute the support and confidence of U; }
16.             Until (confidence(U) < min_conf or T1 is empty);
17.         }   // end if confidence > min_conf
18.         If confidence(U) > min_conf, then {   // decrease support of RHS
19.             Find T2 = {t in D | t supports RHS(U)};
20.             Sort T2 in descending order by the number of supported items;
21.             Repeat {
22.                 Choose the first transaction t from T2;
23.                 Modify t to partially support RHS(U);
24.                 Compute the support and confidence of U; }
25.             Until (confidence(U) < min_conf or T2 is empty);
26.         }   // end if confidence > min_conf
27.         If confidence(U) > min_conf, then
28.             CANNOT HIDE h;
29.         Else
30.             Update D with the new transaction t;
31.     }   // end for each large 2-itemset
32.     Remove h from H;
33. }   // end for each h ∈ H
Output the updated D as the transformed D'.
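The increase-support-of-LHS branch (steps 9–16) can be sketched in code. This is a hypothetical simplification, not the full ISLF: it performs only the ISL step, and it skips candidate transactions that already support the RHS, since adding LHS items to those would raise the confidence rather than lower it:

```python
def hide_rule_isl(transactions, lhs, rhs, min_conf):
    """Add LHS items to transactions that do not fully support the LHS,
    until conf(lhs => rhs) falls below min_conf. `transactions` is a
    list of sets, modified in place. Returns True if the rule is hidden."""
    lhs, rhs = set(lhs), set(rhs)

    def conf():
        n_lhs = sum(lhs <= t for t in transactions)
        n_both = sum((lhs | rhs) <= t for t in transactions)
        return n_both / n_lhs if n_lhs else 0.0

    # T1: transactions partially supporting the LHS (and not the RHS),
    # sorted in descending order by the number of LHS items they hold.
    T1 = sorted((t for t in transactions if not lhs <= t and not rhs <= t),
                key=lambda t: len(t & lhs), reverse=True)
    for t in T1:
        if conf() < min_conf:
            break
        t |= lhs   # modify t to support LHS(U)
    return conf() < min_conf

# Hypothetical database shaped like Example 1 below: conf(B => C) = 3/4 = 75%.
D = [set("ABC"), set("ABC"), set("ABC"), set("AB"), set("A"), set("AC")]
print(hide_rule_isl(D, "B", "C", min_conf=0.70))   # True: confidence is now 3/5 = 60%
```

Adding B to the one transaction containing only A raises support(B) from 4 to 5 while leaving support(BC) at 3, dropping the confidence below the 70% threshold; support of the rule is untouched.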
Examples running the ISLF algorithm.
Example 1: To hide item C, the rule B => C (50%, 75%) will be hidden if transaction T5 is modified from 100 to 110 using ISL. To hide item B, the rule A => B (67%, 83%) will be hidden if transaction T1 is modified from 111 to 101 using DSR. [Table: database before and after hiding items C, B using ISLF.]
Example 2: Here we reverse the order of hiding items. To hide item B, the rule C => B (50%, 75%) will be hidden if transaction T5 is modified from 100 to 101 using ISL. To hide item C, the rule A => C (83%, 83%) will be hidden if transaction T1 is modified from 111 to 110 using DSR. [Table: database before and after hiding items B, C using ISLF.]
Examples running the DSRF algorithm.
Example 3: To hide item C, the rule B => C (50%, 75%) will be hidden if transaction T1 is modified from 111 to 110 using DSR. To hide item B, the rule C => B (50%, 67%) will then already be hidden because transaction T1 has been modified. [Table: database before and after hiding items C, B using DSRF.]
Example 4: Here we reverse the order of hiding items. To hide item B, the rule C => B (50%, 75%) will be hidden if transaction T1 is modified from 111 to 101 using DSR. To hide item C, the rule B => C will then already be hidden because transaction T1 has been modified. [Table: database before and after hiding items B, C using DSRF.]
Analysis: The first characteristic is that the transformed databases differ under different orderings of hiding items: in the illustrated examples, databases D2 and D4 are generated using ISLF, and D5 and D6 are generated using DSRF. The second characteristic we analyze is the efficiency of the proposed algorithms compared with Dasseni's algorithm. The ISLF and DSRF algorithms require less database scanning and prune more association rules than Dasseni's algorithm. [Table: DB scans and rules pruned in hiding item C using ISLF.]
One reason Dasseni's approach does not prune rules is that the hidden rules are given in advance. Our approach needs to hide all rules containing hidden items on the right-hand side, whereas Dasseni's approach can hide only some of the rules containing a hidden item on the right-hand side. The third characteristic we analyze is an efficiency comparison of the ISLF and DSRF algorithms: DSRF seems to be more effective when the support count of the hidden item is large. This is because, when the support of the right-hand side of the rule is large, increasing the support of the left-hand side usually does not reduce the confidence of the rule, while decreasing the support of the right-hand side usually does.
Conclusions: We have examined the database privacy problems caused by data mining technology and proposed two algorithms for hiding sensitive data in association rule mining. The proposed algorithms are based on modifying database transactions so that the confidence of the association rules can be reduced. Examples demonstrating the proposed algorithms were shown. The efficiency of the proposed approach was further compared with Dasseni's approach; it was shown that our approach requires fewer database scans and prunes more hidden rules. However, our approach must hide all rules containing the hidden items on the right-hand side, whereas Dasseni's approach can hide only some of the specified rules.
Software requirement specification: The proposed algorithms can be implemented using Java as the front-end and Oracle 9i as the back-end under a Windows environment.
Hardware requirement specification: Intel Core 2 Duo processor; RAM size; RAM speed.