Base Paper Title:

      Record Matching over Query Results from Multiple Web Databases




Modified Title:

      Mining sequential patterns matching over high utility data sets.




Ambit lick Solutions
Mail Id : Ambitlick@gmail.com , Ambitlicksolutions@gmail.Com

Abstract:

      Record matching, which identifies the records that represent the same
real-world entity, is an important step for data integration. Most
state-of-the-art record matching methods are supervised, which requires the
user to provide training data. These methods are not applicable to the Web
database scenario, where the records to match are query results dynamically
generated on the fly. Such records are query-dependent, and a prelearned
method using training examples from previous query results may fail on the
results of a new query. To address the problem of record matching in the Web
database scenario, we present an unsupervised, online record matching method,
UDD, which, for a given query, can effectively identify duplicates from the
query result records of multiple Web databases. After removal of the
same-source duplicates, the "presumed" nonduplicate records from the same
source can be used as training examples, alleviating the burden of users
having to manually label training examples. Starting from the nonduplicate
set, we use two cooperating classifiers, a weighted component similarity
summing classifier and an SVM classifier, to iteratively identify duplicates
in the query results from multiple Web databases. Experimental results show
that UDD works well for the Web database scenario where existing supervised
methods do not apply.
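The weighted component similarity summing classifier named in the abstract can be illustrated with a minimal sketch. It assumes that a record is a dictionary of field strings, that each field has a weight, and that a pair counts as a duplicate when the weighted sum of field similarities reaches a threshold. The function names, the Jaccard token measure, and the threshold value are illustrative assumptions, not details taken from the base paper.

```python
def field_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens (an illustrative stand-in
    for whatever field similarity measure the real system uses)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty fields are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def weighted_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Sum the per-field similarities, each scaled by its field weight."""
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())


def is_duplicate(r1: dict, r2: dict, weights: dict, threshold: float = 0.75) -> bool:
    """Declare a duplicate when the weighted sum reaches the threshold
    (the default threshold here is an arbitrary example value)."""
    return weighted_similarity(r1, r2, weights) >= threshold


# Hypothetical records returned by two different Web databases.
rec_a = {"title": "Data Mining Concepts", "author": "Han Kamber"}
rec_b = {"title": "Data Mining Concepts", "author": "Jiawei Han"}
field_weights = {"title": 0.7, "author": 0.3}  # assumed weights summing to 1

print(is_duplicate(rec_a, rec_b, field_weights))
```

In this sketch the weights are fixed by hand; in the proposed system they would come from the component weight assignment step.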




Existing System:




  • Relational database systems

  • All Web databases (unknown users can easily damage the database)




Proposed System:



  • False data can be used to detect actions in which unauthorized users
      attempt to access computer systems or authorized users attempt
      to misuse their privileges.

  • Association rule mining

  • An algorithm based on sequential pattern mining that uses the same data
      collected by the databases.




Our Proposed Work beyond the Base Paper:



Sequential pattern mining

        a. Apriori-like methods (GSP)

        b. Pattern-growth methods (FreeSpan, PrefixSpan)
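As an illustration of the pattern-growth family listed above, the following is a minimal PrefixSpan-style sketch. It assumes a simplified setting in which each sequence is a list of single items rather than a list of item sets, and in which support is the number of sequences that contain a pattern. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple of items): support count} for every sequential
    pattern contained in at least min_support sequences."""
    patterns = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix of
        # each sequence that follows the first match of the current prefix.
        first_pos = defaultdict(dict)  # item -> {sequence index: first position}
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(seq_id, pos)
        for item, occurrences in sorted(first_pos.items()):
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                patterns[pattern] = len(occurrences)
                # Recurse on the database projected by the extended prefix.
                grow(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    grow((), [(i, 0) for i in range(len(sequences))])
    return patterns


# Hypothetical access sequences, one list per user session.
sessions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
for pattern, support in prefixspan(sessions, min_support=2).items():
    print(pattern, support)
```

An Apriori-like method such as GSP would instead generate candidate sequences level by level and rescan the data for each level; the projection step here avoids that candidate generation.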




Hardware Specification



Processor Type            : Pentium III

Speed                     : 1.6 GHz

RAM                       : 128 MB


Hard disk                 : 8 GB



Software Specification



Operating System          : Linux / Windows

Programming Package       : Java

Tools                     : Eclipse, Weka data mining tools

Database                  : MySQL

SDK                       : JDK 1.5.0




Algorithm:

  • Association rule mining

        o Find large item sets for a given minimum support (minsup), and

        o Compute rules for a given minimum confidence (minconf) based on
          the item sets obtained before.

  • Sequential pattern mining

  • UDD algorithm

  • Component weight assignment algorithm
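The two association rule mining steps listed above (find large item sets for a given minsup, then compute rules for a given minconf) can be sketched as follows. This is a small Apriori-style illustration under stated assumptions: support is measured as a fraction of transactions, and the function names and sample transactions are hypothetical. It is not the Weka implementation named in the software specification.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Step 1: level-wise search for item sets whose support
    (fraction of transactions containing them) is at least minsup."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    candidates = {frozenset([item]) for t in tsets for item in t}
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for cand in candidates:
            support = sum(1 for t in tsets if cand <= t) / n
            if support >= minsup:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-item sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
    return frequent


def association_rules(frequent, minconf):
    """Step 2: for each frequent item set, emit rules lhs -> rhs with
    confidence = support(item set) / support(lhs) >= minconf."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                confidence = support / frequent[lhs]
                if confidence >= minconf:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical transactions.
baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
large = frequent_itemsets(baskets, minsup=0.5)
for lhs, rhs, conf in association_rules(large, minconf=0.9):
    print(set(lhs), "->", set(rhs), conf)
```

Every subset of a frequent item set is itself frequent, so the lookup `frequent[lhs]` in the second step always succeeds.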




Modules:



  1. Analysis and design of data sets/items

  2. Data preprocessing

  3. Sequential pattern mining

  4. Record matching with Web databases

  5. Performance analysis




