SlideShare uma empresa Scribd logo
1 de 2
Baixar para ler offline
A High-Performance Approach to String
Similarity using Most Frequent K Characters
”Correctness and Completeness”
Andre Valdestilhas and Axel-Cyrille Ngonga Ngomo
Universit¨at Leipzig, AKSW {lastname}@informatik.uni-leipzig.de
1 Correctness and Completeness
In this section, we prove formally that our MFKC does so on any pair of datasets.
We achieve this goal by proving that our MFKC is both correct and complete,
where:
– We say that an approach is correct if the output O it returns is such that
O ⊆ R(Ds, Dt, σ, θ).
– Approaches are said to be complete if their output O is a superset of R(Ds, Dt, σ, θ),
i.e., O ⊇ R(Ds, Dt, σ, θ).
Our MFKC consists of three nested filters, each of which creates a subset of
pairs.
A ⊆ L ⊆ N ⊆ Ds × Dt (1)
For the purpose of clearness, we name each filtering rule:
R1 ⇔
h1(s, k)k + |t|
|s| + |t|
< θ
R2 ⇔ |h(s, k) ∩ h(t, k)| = 0
R3 ⇔ σ(s, t) ≥ θ
Each subset of our MFKC can be redefined as follows:
N = { s, t /∈ Ds × Dt : R1} (2)
L = { s, t ∈ Ds × Dt : R1 ∧ R2} (3)
A = { s, t ∈ Ds × Dt : R1 ∧ R2 ∧ R3} (4)
We then introduce A∗
as the set of pairs whose similarity score is more or
equal than the threshold θ.
A∗
= { s, t ∈ Ds × Dt : σ(s, t) ≥ θ} = { s, t ∈ Ds × Dt : R3} (5)
Lemma 1. Our MFKC is correct and complete.
Proof. This lemma is equivalent to showing that A = A∗
. Let us consider all the
pairs in A. Our MFKC’s correctness follows directly from eq. (4). All we need
to show now is our MFKC’s completeness, i.e. that none of pairs discarded by
the filters actually belongs to A∗
.
Assuming that the hashes are sorted in a descending way according the fre-
quencies of the characters, therefore the first element of each hash has the highest
frequency. Therefore, once we have
h1(s, k)k + |t|
|s| + |t|
< θ,
the pair of strings s and t can be discarded without calculating the entire simi-
larity.
When the rule R3 applies, we have σ(s, t) < θ.
Which leads to R3 ⇒ R1 thus, set A can be rewritten as:
A = { s, t ∈ Ds × Dt : R2 ∧ R3} (6)
We are given two strings s and t and the respective hashes h(s, k) and h(t, k),
the intersection between the characters of these two hashes is a empty set. There-
fore, there is no character matching, which implies that s and t cannot be con-
sidered to have a similarity score greater or equal the threshold θ:
h(s, k) ∩ h(t, k) = ∅ ⇒ σ(s, t) = 0
When the rule R3 applies, we have σ(s, t) < θ.
Which leads to R3 ⇒ R2 thus, set A can be rewritten as:
A = { s, t ∈ Ds × Dt : R3} (7)
which is the definition of A∗
eq. (5). Thus, A = A∗
.

Mais conteúdo relacionado

Mais procurados

Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algosabiya sabiya
 
Chapter -3 Logarithms
Chapter -3 LogarithmsChapter -3 Logarithms
Chapter -3 LogarithmsGpmMaths
 
Rabin Karp Algorithm
Rabin Karp AlgorithmRabin Karp Algorithm
Rabin Karp AlgorithmSohail Ahmed
 
Find Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall AlgorithmFind Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall AlgorithmRajib Roy
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by FoysalFoysal Mahmud
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.Swapan Shakhari
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithmKamal Nayan
 
Inverse laplacetransform
Inverse laplacetransformInverse laplacetransform
Inverse laplacetransformTarun Gehlot
 
Naive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceNaive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceTransweb Global Inc
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.Malek Sumaiya
 
String searching
String searching String searching
String searching thinkphp
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatchingdbhanumahesh
 

Mais procurados (20)

Quick sort
Quick sortQuick sort
Quick sort
 
Knuth morris pratt string matching algo
Knuth morris pratt string matching algoKnuth morris pratt string matching algo
Knuth morris pratt string matching algo
 
Chapter -3 Logarithms
Chapter -3 LogarithmsChapter -3 Logarithms
Chapter -3 Logarithms
 
Rabin Karp Algorithm
Rabin Karp AlgorithmRabin Karp Algorithm
Rabin Karp Algorithm
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Find Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall AlgorithmFind Transitive Closure Using Floyd-Warshall Algorithm
Find Transitive Closure Using Floyd-Warshall Algorithm
 
String matching Algorithm by Foysal
String matching Algorithm by FoysalString matching Algorithm by Foysal
String matching Algorithm by Foysal
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
String matching algorithm
String matching algorithmString matching algorithm
String matching algorithm
 
The gamma function
The gamma functionThe gamma function
The gamma function
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.
 
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM  IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION  ALGORITHM
IMPLEMENTATION OF DIFFERENT PATTERN RECOGNITION ALGORITHM
 
25 String Matching
25 String Matching25 String Matching
25 String Matching
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithm
 
Inverse laplacetransform
Inverse laplacetransformInverse laplacetransform
Inverse laplacetransform
 
Naive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceNaive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer Science
 
Kmp
KmpKmp
Kmp
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
 
String searching
String searching String searching
String searching
 
Chpt9 patternmatching
Chpt9 patternmatchingChpt9 patternmatching
Chpt9 patternmatching
 

Destaque

Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)Christophe Debruyne
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsMuhammad Saleem
 
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerAutomatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerFrancesco Osborne
 
Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...Monika Solanki
 
Knowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with FramesKnowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with FramesValentina Presutti
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Ivan Ermilov
 
Interoperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldInteroperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldMonika Solanki
 
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...SisInfLab-SWoT @Politecnico di Bari
 
Enabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scaleEnabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scaleMonika Solanki
 
A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...Raphael Troncy
 
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...Chris Bizer
 
DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13Bradley Allen
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionDeloitte United States
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018Brian Solis
 

Destaque (14)

Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
 
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic MinerAutomatic Classification of Springer Nature Proceedings with Smart Topic Miner
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
 
Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...Towards maintainable constraint validation and repair for taxonomies: The Poo...
Towards maintainable constraint validation and repair for taxonomies: The Poo...
 
Knowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with FramesKnowledge Extraction and Linked Data: Playing with Frames
Knowledge Extraction and Linked Data: Playing with Frames
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
 
Interoperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldInteroperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT world
 
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
Linked Data (in low-resource) Platforms: a mapping for Constrained Applicatio...
 
Enabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scaleEnabling combined Software and Data engineering at Web-scale
Enabling combined Software and Data engineering at Web-scale
 
A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...A replication study of the top performing systems in SemEval twitter sentimen...
A replication study of the top performing systems in SemEval twitter sentimen...
 
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
 
DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13DC-2016 Keynote 2016-10-13
DC-2016 Keynote 2016-10-13
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolution
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018
 

Semelhante a Iswc 2016 completeness correctude

Bag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse CodinBag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse CodinKarlos Svoboda
 
MC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceMC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceAravind NC
 
International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)inventionjournals
 
Regular expression
Regular expressionRegular expression
Regular expressionRajon
 
leanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerleanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerAdriano Melo
 
Common fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spacesCommon fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spacesAlexander Decker
 
Improved helsgaun
Improved helsgaunImproved helsgaun
Improved helsgaunKaal Nath
 
Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Alexander Decker
 
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet SeriesOn Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet SeriesIOSR Journals
 
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...BRNSS Publication Hub
 
Lecture10 Signal and Systems
Lecture10 Signal and SystemsLecture10 Signal and Systems
Lecture10 Signal and Systemsbabak danyal
 
Query optimization in database
Query optimization in databaseQuery optimization in database
Query optimization in databaseRakesh Kumar
 
Design and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpDesign and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpProgramming Homework Help
 
Master Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksMaster Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksAlina Leidinger
 
Design and analysis of ra sort
Design and analysis of ra sortDesign and analysis of ra sort
Design and analysis of ra sortijfcstjournal
 
20070823
2007082320070823
20070823neostar
 

Semelhante a Iswc 2016 completeness correctude (20)

Bag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse CodinBag of Pursuits and Neural Gas for Improved Sparse Codin
Bag of Pursuits and Neural Gas for Improved Sparse Codin
 
MC0082 –Theory of Computer Science
MC0082 –Theory of Computer ScienceMC0082 –Theory of Computer Science
MC0082 –Theory of Computer Science
 
International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)
 
Regular expression
Regular expressionRegular expression
Regular expression
 
leanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL ReasonerleanCoR: lean Connection-based DL Reasoner
leanCoR: lean Connection-based DL Reasoner
 
Common fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spacesCommon fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spaces
 
Improved helsgaun
Improved helsgaunImproved helsgaun
Improved helsgaun
 
Unequal-Cost Prefix-Free Codes
Unequal-Cost Prefix-Free CodesUnequal-Cost Prefix-Free Codes
Unequal-Cost Prefix-Free Codes
 
Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.Iterative procedure for uniform continuous mapping.
Iterative procedure for uniform continuous mapping.
 
04_AJMS_254_19.pdf
04_AJMS_254_19.pdf04_AJMS_254_19.pdf
04_AJMS_254_19.pdf
 
Lec12
Lec12Lec12
Lec12
 
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet SeriesOn Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
On Spaces of Entire Functions Having Slow Growth Represented By Dirichlet Series
 
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
New Method for Finding an Optimal Solution of Generalized Fuzzy Transportatio...
 
Lecture10 Signal and Systems
Lecture10 Signal and SystemsLecture10 Signal and Systems
Lecture10 Signal and Systems
 
Programming Exam Help
Programming Exam Help Programming Exam Help
Programming Exam Help
 
Query optimization in database
Query optimization in databaseQuery optimization in database
Query optimization in database
 
Design and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpDesign and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment Help
 
Master Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural NetworksMaster Thesis on the Mathematial Analysis of Neural Networks
Master Thesis on the Mathematial Analysis of Neural Networks
 
Design and analysis of ra sort
Design and analysis of ra sortDesign and analysis of ra sort
Design and analysis of ra sort
 
20070823
2007082320070823
20070823
 

Mais de André Valdestilhas

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfAndré Valdestilhas
 
More Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesMore Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesAndré Valdestilhas
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017André Valdestilhas
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesAndré Valdestilhas
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applicationsAndré Valdestilhas
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...André Valdestilhas
 

Mais de André Valdestilhas (12)

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdf
 
Presentation MSE 2022
Presentation MSE 2022Presentation MSE 2022
Presentation MSE 2022
 
More Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesMore Complete Resultset Retrieval from Large Heterogeneous RDF Sources
More Complete Resultset Retrieval from Large Heterogeneous RDF Sources
 
Eswc2018 wimu slides
Eswc2018 wimu slidesEswc2018 wimu slides
Eswc2018 wimu slides
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017
 
WASOTA (poster)
WASOTA (poster)WASOTA (poster)
WASOTA (poster)
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resources
 
Using semiotic profile
Using semiotic profileUsing semiotic profile
Using semiotic profile
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applications
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
 

Último

Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxabhishekdhamu51
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 

Último (20)

Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 

Iswc 2016 completeness correctude

  • 1. A High-Performance Approach to String Similarity using Most Frequent K Characters ”Correctness and Completeness” Andre Valdestilhas and Axel-Cyrille Ngonga Ngomo Universit¨at Leipzig, AKSW {lastname}@informatik.uni-leipzig.de 1 Correctness and Completeness In this section, we prove formally that our MFKC does so on any pair of datasets. We achieve this goal by proving that our MFKC is both correct and complete, where: – We say that an approach is correct if the output O it returns is such that O ⊆ R(Ds, Dt, σ, θ). – Approaches are said to be complete if their output O is a superset of R(Ds, Dt, σ, θ), i.e., O ⊇ R(Ds, Dt, σ, θ). Our MFKC consists of three nested filters, each of which creates a subset of pairs. A ⊆ L ⊆ N ⊆ Ds × Dt (1) For the purpose of clearness, we name each filtering rule: R1 ⇔ h1(s, k)k + |t| |s| + |t| < θ R2 ⇔ |h(s, k) ∩ h(t, k)| = 0 R3 ⇔ σ(s, t) ≥ θ Each subset of our MFKC can be redefined as follows: N = { s, t /∈ Ds × Dt : R1} (2) L = { s, t ∈ Ds × Dt : R1 ∧ R2} (3) A = { s, t ∈ Ds × Dt : R1 ∧ R2 ∧ R3} (4) We then introduce A∗ as the set of pairs whose similarity score is more or equal than the threshold θ. A∗ = { s, t ∈ Ds × Dt : σ(s, t) ≥ θ} = { s, t ∈ Ds × Dt : R3} (5)
  • 2. Lemma 1. Our MFKC is correct and complete. Proof. This lemma is equivalent to showing that A = A∗ . Let us consider all the pairs in A. Our MFKC’s correctness follows directly from eq. (4). All we need to show now is our MFKC’s completeness, i.e. that none of pairs discarded by the filters actually belongs to A∗ . Assuming that the hashes are sorted in a descending way according the fre- quencies of the characters, therefore the first element of each hash has the highest frequency. Therefore, once we have h1(s, k)k + |t| |s| + |t| < θ, the pair of strings s and t can be discarded without calculating the entire simi- larity. When the rule R3 applies, we have σ(s, t) < θ. Which leads to R3 ⇒ R1 thus, set A can be rewritten as: A = { s, t ∈ Ds × Dt : R2 ∧ R3} (6) We are given two strings s and t and the respective hashes h(s, k) and h(t, k), the intersection between the characters of these two hashes is a empty set. There- fore, there is no character matching, which implies that s and t cannot be con- sidered to have a similarity score greater or equal the threshold θ: h(s, k) ∩ h(t, k) = ∅ ⇒ σ(s, t) = 0 When the rule R3 applies, we have σ(s, t) < θ. Which leads to R3 ⇒ R2 thus, set A can be rewritten as: A = { s, t ∈ Ds × Dt : R3} (7) which is the definition of A∗ eq. (5). Thus, A = A∗ .