Mining sequential patterns matching over high utility data sets

Base Paper Title:

Record Matching over Query Results from Multiple Web Databases

Modified Title:

Mining sequential patterns matching over high utility data sets.

Ambit lick Solutions
Mail Id : Ambitlick@gmail.com, Ambitlicksolutions@gmail.Com

Abstract:

Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised, which requires the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results dynamically generated on the fly. Such records are query-dependent, and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of multiple Web databases. After removal of the same-source duplicates, the “presumed” nonduplicate records from the same source can be used as training examples, alleviating the burden of users having to manually label training examples. Starting from the nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database scenario where existing supervised methods do not apply.
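As a rough illustration of the weighted component similarity summing classifier named in the abstract, the sketch below scores a record pair as a weighted sum of per-field similarities and compares the score with a threshold. The class name, the hand-picked weights, the threshold, and the token-overlap (Jaccard) field similarity are all assumptions made for illustration; the abstract does not specify them, and it is not meant to reproduce the UDD method itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: names, weights, and threshold are illustrative assumptions.
class WeightedSimilaritySketch {

    // Token-level Jaccard similarity between two field values (assumed measure).
    static double fieldSimilarity(String a, String b) {
        Set<String> ta = new HashSet<String>(Arrays.asList(a.toLowerCase().trim().split("\\s+")));
        Set<String> tb = new HashSet<String>(Arrays.asList(b.toLowerCase().trim().split("\\s+")));
        Set<String> union = new HashSet<String>(ta);
        union.addAll(tb);
        ta.retainAll(tb); // ta now holds the intersection
        return union.isEmpty() ? 0.0 : (double) ta.size() / union.size();
    }

    // Weighted sum of per-field similarities; weights are expected to sum to 1.
    static double recordSimilarity(String[] r1, String[] r2, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * fieldSimilarity(r1[i], r2[i]);
        }
        return sum;
    }

    // A pair is flagged as a duplicate when its similarity reaches the threshold.
    static boolean isDuplicate(String[] r1, String[] r2, double[] weights, double threshold) {
        return recordSimilarity(r1, r2, weights) >= threshold;
    }
}
```

With weights of 0.7 for a title field and 0.3 for an author field, two records whose titles match exactly and whose author fields share one of two tokens would score 0.7 + 0.3 × 0.5 = 0.85.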

Existing System:


• Relational database systems

• All Web databases (unknown users can easily damage the database)

Proposed System:

• False data can be used to discover the actions of unauthorized users attempting to access computer systems, or of authorized users attempting to misuse their privileges.

• Association rule mining

• An algorithm based on sequential pattern mining, using the same data collected by the databases.


Our Proposed Work apart from the Base Paper:

Sequential pattern mining

a. Apriori-like methods (GSP)

b. Pattern-growth methods (FreeSpan, PrefixSpan)
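To make the pattern-growth idea concrete, here is a minimal PrefixSpan-style sketch. It assumes every sequence element is a single item (real PrefixSpan also handles itemset elements), and the class name, method names, and space-separated pattern encoding are hypothetical choices for this example. An Apriori-like method such as GSP would instead generate candidate sequences level by level and count them against the full database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified PrefixSpan sketch: each sequence element is a single item (an assumption).
class PrefixSpanSketch {

    // Returns every frequent sequential pattern mapped to its support count.
    static Map<String, Integer> mine(List<List<String>> db, int minSup) {
        Map<String, Integer> result = new TreeMap<String, Integer>();
        grow("", db, minSup, result);
        return result;
    }

    // Extends the current prefix by each item that is frequent in the projected database.
    private static void grow(String prefix, List<List<String>> projected, int minSup,
                             Map<String, Integer> result) {
        // Count each item at most once per sequence.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (List<String> seq : projected) {
            for (String item : new TreeSet<String>(seq)) {
                Integer c = counts.get(item);
                counts.put(item, c == null ? 1 : c + 1);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minSup) continue;
            String item = e.getKey();
            String pattern = prefix.length() == 0 ? item : prefix + " " + item;
            result.put(pattern, e.getValue());
            // Project: keep the suffix after the first occurrence of the item.
            List<List<String>> next = new ArrayList<List<String>>();
            for (List<String> seq : projected) {
                int idx = seq.indexOf(item);
                if (idx >= 0 && idx + 1 < seq.size()) {
                    next.add(seq.subList(idx + 1, seq.size()));
                }
            }
            grow(pattern, next, minSup, result);
        }
    }
}
```

Because each recursive call works only on the suffixes that follow the current prefix, no candidate sequences are generated and tested against the whole database, which is the main difference from the Apriori-like approach.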

Hardware Specification

Processor Type : Pentium III

Speed : 1.6 GHz

RAM : 128 MB

Hard disk : 8 GB

Software Specification

Operating System : Linux / Windows

Programming Package : Java

Tools : Eclipse, Weka Data Mining Tools

Database : MySQL

SDK : JDK 1.5.0

Algorithm:

• Association rule mining

o Find large itemsets for a given minimum support (minsup), and

o Compute rules for a given minimum confidence (minconf) based on the itemsets obtained before.

• Sequential pattern mining


• UDD Algorithm

• Component weight assignment algorithm
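The two association rule mining steps listed above can be sketched as follows. This is a deliberately small illustration restricted to item pairs, not a full Apriori implementation; the class name, method names, and the "x -> y" rule encoding are assumptions introduced for this example. Support is taken as the fraction of transactions containing an itemset, and the confidence of a rule x -> y as support({x, y}) / support({x}).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch limited to item pairs; not a full Apriori implementation.
class AssociationRuleSketch {

    // Fraction of transactions that contain every item of the itemset.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) hits++;
        }
        return transactions.isEmpty() ? 0.0 : (double) hits / transactions.size();
    }

    // Keeps pairs with support >= minSup, then emits rules "x -> y"
    // whose confidence support({x, y}) / support({x}) is >= minConf.
    static List<String> rules(List<Set<String>> transactions, double minSup, double minConf) {
        Set<String> items = new TreeSet<String>();
        for (Set<String> t : transactions) items.addAll(t);
        List<String> out = new ArrayList<String>();
        for (String x : items) {
            for (String y : items) {
                if (x.equals(y)) continue;
                Set<String> pair = new HashSet<String>(Arrays.asList(x, y));
                double pairSup = support(transactions, pair);
                if (pairSup < minSup) continue; // step 1: not a large itemset
                double xSup = support(transactions, new HashSet<String>(Arrays.asList(x)));
                double conf = pairSup / xSup;   // step 2: rule confidence
                if (conf >= minConf) out.add(x + " -> " + y);
            }
        }
        return out;
    }
}
```

A complete miner would extend the large pairs to larger itemsets level by level, pruning any candidate that has an infrequent subset, before generating rules.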

Modules:

1. Analysis and design of data sets/items

2. Data preprocessing

3. Sequential pattern mining

4. Record matching with Web databases

5. Performance analysis

