Impulse Technologies
                                      Beacons U to World of technology
        044-42133143, 98401 03301,9841091117 ieeeprojects@yahoo.com www.impulse.net.in
      Efficient and Effective Duplicate Detection in Hierarchical Data
   Abstract
          Although there is a long line of work on identifying duplicates in relational
   data, only a few solutions focus on duplicate detection in more complex
   hierarchical structures, like XML data. In this paper, we present a novel method for
   XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to
   determine the probability of two XML elements being duplicates, considering not
   only the information within the elements, but also the way that information is
   structured. In addition, to improve the efficiency of the network evaluation, a novel
   pruning strategy, capable of significant gains over the unoptimized version of the
   algorithm, is presented. Through experiments, we show that our algorithm is able
   to achieve high precision and recall scores on several datasets. XMLDup also
   outperforms another state-of-the-art duplicate detection solution in both
   efficiency and effectiveness. Finally, we study how important the structure of
   elements is in the duplicate detection process. We observe not only that
   structure can clearly influence the outcome, but also that choosing a structure
   suited to the characteristics of the data can actually improve the quality of
   the results.
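
   The abstract does not spell out the network itself, so the following is only
   a rough sketch of the general idea, not the XMLDup algorithm: two elements are
   scored by combining the similarity of their own values with the scores of
   their children, and evaluation stops early once the score can no longer reach
   a threshold. The Element class, the combination rules (product of value
   similarities, mean of best child matches), and the threshold parameter are all
   assumptions made for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Element:
    """Toy stand-in for an XML element (hypothetical, not from the paper)."""
    tag: str
    values: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def value_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; any field-level measure could be used here."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_score(x: Element, y: Element, threshold: float = 0.0) -> float:
    """Score in [0, 1] that x and y describe the same object.

    The combination rules are illustrative assumptions, not the conditional
    probabilities of the actual Bayesian network.
    """
    if x.tag != y.tag:
        return 0.0

    score = 1.0
    # Information within the elements: multiply the similarities of shared fields.
    for key in sorted(x.values.keys() & y.values.keys()):
        score *= value_similarity(x.values[key], y.values[key])
        # Pruning: every remaining factor is at most 1, so the score can only
        # shrink. Once it is below the threshold, skip the rest of the work.
        if score < threshold:
            return 0.0

    # Structure: pair each child of x with its best-scoring child of y.
    if x.children and y.children:
        best = [max(duplicate_score(c, d) for d in y.children) for c in x.children]
        score *= sum(best) / len(best)

    return score if score >= threshold else 0.0


m1 = Element("movie", {"title": "The Matrix", "year": "1999"},
             [Element("actor", {"name": "Keanu Reeves"})])
m2 = Element("movie", {"title": "Matrix, The", "year": "1999"},
             [Element("actor", {"name": "Keanu Reeves"})])
print(duplicate_score(m1, m2))
```

   Setting threshold above zero lets clearly different pairs be rejected after
   comparing only a few fields, which is the intuition behind pruning; the
   actual strategy in the paper may differ.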




  Your own ideas, or any project from any company, can be implemented
at a better price (all projects can be done in Java or DotNet, whichever the student prefers)
