SlideShare uma empresa Scribd logo
1 de 42
Baixar para ler offline
.
......
Approximate text matching with the stringdist
package
Mark van der Loo
Statistics Netherlands
useR!2014
markvanderloo.eu @MarkPJvanderLoo
........ ..... ................. ................. ................. .... .... . .... ........ .
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. The stringdist package
.
Fuzzy dictionary lookup
..
......
amatch Fuzzy matching equivalent of match
ain Fuzzy matching equivalent of %in%
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. The stringdist package
.
Fuzzy dictionary lookup
..
......
amatch Fuzzy matching equivalent of match
ain Fuzzy matching equivalent of %in%
.
String metrics
..
......
stringdist Pairwise distances
stringdistmatrix Distance matrix
qgrams Compute q-gram profile
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. The stringdist package
.
Fuzzy dictionary lookup
..
......
amatch Fuzzy matching equivalent of match
ain Fuzzy matching equivalent of %in%
.
String metrics
..
......
stringdist Pairwise distances
stringdistmatrix Distance matrix
qgrams Compute q-gram profile
.
Design“philosophy”
..
......
Create interfaces that resemble base R
(e.g. match, adist, nchar, agrep)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Dictionary lookup
> match("leia", c("leela","leia"))
[1] 2
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Dictionary lookup
> match("leia", c("leela","leia"))
[1] 2
> match("liea", c("leela","leia"))
[1] NA
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Dictionary lookup
> match("leia", c("leela","leia"))
[1] 2
> match("liea", c("leela","leia"))
[1] NA
> amatch("liea", c("leela","leia"), maxDist=1)
[1] 2
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Dictionary lookup
> match("leia", c("leela","leia"))
[1] 2
> match("liea", c("leela","leia"))
[1] NA
> amatch("liea", c("leela","leia"), maxDist=1)
[1] 2
> "liea" %in% c("leela","leia")
[1] FALSE
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Dictionary lookup
> match("leia", c("leela","leia"))
[1] 2
> match("liea", c("leela","leia"))
[1] NA
> amatch("liea", c("leela","leia"), maxDist=1)
[1] 2
> "liea" %in% c("leela","leia")
[1] FALSE
> ain("liea", c("leela","leia"), maxDist=1)
[1] TRUE
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. String distance
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. String distances
.
Implemented in the package
..
......
edit-based distances
q-gram based distances
heuristic distances
.
Review papers
..
......
L. Boytsov (2011). ACM Journal of Experimental
Algorithmics 16 1–86.
G. Navarro (2001). ACM Computing Surveys 33 31–88.
.
stringdist paper
..
......
M.P.J. van der Loo (2014). The stringdist package for
approximate string matching. The R Journal 6 xx-xx.
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Edit-based distances
.
Definition
..
......
Count the minimum number of (weighted) basic operations that
turns string s into string t.
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Edit-based distances
.
Definition
..
......
Count the minimum number of (weighted) basic operations that
turns string s into string t.
Allowed operation
Distance substitution deletion insertion transposition
Hamming    
LCS    
Levenshtein    
OSA    ∗
Damerau-
Levenshtein
   
∗Substrings may be edited only once.
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Edit-based distances
.
Definition
..
......
Count the minimum number of (weighted) basic operations that
turns string s into string t.
Allowed operation
Distance substitution deletion insertion transposition
Hamming    
LCS    
Levenshtein    
OSA    ∗
Damerau-
Levenshtein
   
∗Substrings may be edited only once.
 stringdist(leia,liea,method=hamming)
[1] 2
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Edit-based distances
.
Definition
..
......
Count the minimum number of (weighted) basic operations that
turns string s into string t.
Allowed operation
Distance substitution deletion insertion transposition
Hamming    
LCS    
Levenshtein    
OSA    ∗
Damerau-
Levenshtein
   
∗Substrings may be edited only once.
 stringdist(leia,liea,method=hamming)
[1] 2
 stringdist(leia,liea,method=dl)
[1] 1
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
banana
Q:
x :
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
ba nana
Q: ba
x : 1
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
b an ana
Q: ba an
x : 1 1
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
ba na na
Q: ba an na
x : 1 1 1
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
ban an a
Q: ba an na
x : 1 2 1
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
bana na
Q: ba an na
x : 1 2 2
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
Jaccard
|Q1 ∩ Q2|
|Q1 ∪ Q2|
 stringdist(leia,leela
+ , method=jaccard,q=2)
[1] 0.8333333
Cosine cos (x∠y)
 stringdist(leia,leela
+ , method=cosine,q=2)
[1] 0.7113249
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. q-gram based distances
.
Definition
..
......Any (vector) distance between two q-gram profiles.
 qgrams(x = leia,y = leela,q=2)
le ei ia la el ee
x 1 1 1 0 0 0
y 1 0 0 1 1 1
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Heuristic distances: Jaro-Winkler
.
Definition
..
......
It’s complicated :-).
Intended for human-typed name/address data
 stringdist(liea,leia,method=jw,p=0.1)
[1] 0.075
Ranges from 0 (equal) to 1 (dissimilar).
0 ≤ p ≤ 0.25: emphasis on first 4 characters.
p = 0: Jaro-distance
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Character encoding
Image from https://code.google.com/p/tworsekey/
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Character encoding
 stringdist(’¨o’,’o’)
[1] 1 # Replace one symbol
 stringdist(’¨o’,’o’,useBytes=TRUE)
[1] 2 # delete one byte, replace another (utf-8)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Missing values
Name
: Value Six
Missing Since : 30-06-2014
Last seen at : input.csv
If you have seen Value Six or know
someone who has, please contact your
local statistician at +31 415 926 53
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
[1] NA
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
[1] NA
 match(NA, NA)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
[1] NA
 match(NA, NA)
[1] 1 # - note the useR’s OMGWTFBBQ right there
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
[1] NA
 match(NA, NA)
[1] 1 # - note the useR’s OMGWTFBBQ right there
 amatch(NA, NA)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
[1] NA
 match(NA, NA)
[1] 1 # - note the useR’s OMGWTFBBQ right there
 amatch(NA, NA)
[1] 1 # - ok, at least we’re consistent
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
[1] NA
 match(NA, NA)
[1] 1 # - note the useR’s OMGWTFBBQ right there
 amatch(NA, NA)
[1] 1 # - ok, at least we’re consistent
 amatch(NA, NA, matchNA=FALSE)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Handling missing values
 NA == NA
[1] NA
 adist(NA, NA)
[1] NA
 stringdist(NA, NA)
[1] NA
 match(NA, NA)
[1] 1 # - note the useR’s OMGWTFBBQ right there
 amatch(NA, NA)
[1] 1 # - ok, at least we’re consistent
 amatch(NA, NA, matchNA=FALSE)
[1] NA
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Parallelization
For a single call:
 stringdistmatrix(a,b,ncores=4)
Or, define your own cluster:
 cl - makeCluster(4)
 stringdistmatrix(a, b, cluster=cl)
 stringdistmatrix(c, d, cluster=cl)
 stopCluster(cl)
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Performance
stringdist(method=’lv’)
About 30% faster than adist
About 2 times faster then RecordLinkage
When comparing strings of 5 - 25 characters
Mark van der Loo Approximate text matching with the stringdist package
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
.. Summary
Nine different string metrics; core in C99 (sorry Dirk :-) )
Approximate dictionary lookup
Proper handling of encoding and missing values
Fast
Paralellization built in
Thank you for your attention!
t : @markpjvanderloo
e : mark.vanderloo@gmail.com
w : markvanderloo.eu
Mark van der Loo Approximate text matching with the stringdist package

Mais conteúdo relacionado

Mais procurados

An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseArangoDB Database
 
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Deep Learning for Recommender Systems  - Budapest RecSys MeetupDeep Learning for Recommender Systems  - Budapest RecSys Meetup
Deep Learning for Recommender Systems - Budapest RecSys MeetupAlexandros Karatzoglou
 
Prediction of Diamond Prices Using Multivariate Regression
Prediction of Diamond Prices Using Multivariate RegressionPrediction of Diamond Prices Using Multivariate Regression
Prediction of Diamond Prices Using Multivariate RegressionMohitMhapuskar
 
Artificial Intelligence Question Bank
Artificial Intelligence Question BankArtificial Intelligence Question Bank
Artificial Intelligence Question BankSpardhavijetha2DrKMs
 
CIMPA : Enhancing Data Exposition & Digital Twin for Airbus Helicopters
CIMPA : Enhancing Data Exposition & Digital Twin for Airbus HelicoptersCIMPA : Enhancing Data Exposition & Digital Twin for Airbus Helicopters
CIMPA : Enhancing Data Exposition & Digital Twin for Airbus HelicoptersNeo4j
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systemsyoualab
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning ClusteringRupak Roy
 
Jumpstart! From SQL to NoSQL -- Changing Your Mindset
Jumpstart! From SQL to NoSQL -- Changing Your MindsetJumpstart! From SQL to NoSQL -- Changing Your Mindset
Jumpstart! From SQL to NoSQL -- Changing Your MindsetLauren Hayward Schaefer
 
RecSysOps: Best Practices for Operating a Large-Scale Recommender System
RecSysOps: Best Practices for Operating a Large-Scale Recommender SystemRecSysOps: Best Practices for Operating a Large-Scale Recommender System
RecSysOps: Best Practices for Operating a Large-Scale Recommender SystemEhsan38
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the RoadmapEDB
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorVivian S. Zhang
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural searchDmitry Kan
 
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Simplilearn
 
Masurca genome assembly with super reads
Masurca  genome assembly with super readsMasurca  genome assembly with super reads
Masurca genome assembly with super readsAbdullah Khan Zehady
 
Deep learning with Keras
Deep learning with KerasDeep learning with Keras
Deep learning with KerasQuantUniversity
 

Mais procurados (20)

An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model Database
 
Clustering
ClusteringClustering
Clustering
 
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Deep Learning for Recommender Systems  - Budapest RecSys MeetupDeep Learning for Recommender Systems  - Budapest RecSys Meetup
Deep Learning for Recommender Systems - Budapest RecSys Meetup
 
Prediction of Diamond Prices Using Multivariate Regression
Prediction of Diamond Prices Using Multivariate RegressionPrediction of Diamond Prices Using Multivariate Regression
Prediction of Diamond Prices Using Multivariate Regression
 
SLIQ
SLIQSLIQ
SLIQ
 
Artificial Intelligence Question Bank
Artificial Intelligence Question BankArtificial Intelligence Question Bank
Artificial Intelligence Question Bank
 
CIMPA : Enhancing Data Exposition & Digital Twin for Airbus Helicopters
CIMPA : Enhancing Data Exposition & Digital Twin for Airbus HelicoptersCIMPA : Enhancing Data Exposition & Digital Twin for Airbus Helicopters
CIMPA : Enhancing Data Exposition & Digital Twin for Airbus Helicopters
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning Clustering
 
Jumpstart! From SQL to NoSQL -- Changing Your Mindset
Jumpstart! From SQL to NoSQL -- Changing Your MindsetJumpstart! From SQL to NoSQL -- Changing Your Mindset
Jumpstart! From SQL to NoSQL -- Changing Your Mindset
 
RecSysOps: Best Practices for Operating a Large-Scale Recommender System
RecSysOps: Best Practices for Operating a Large-Scale Recommender SystemRecSysOps: Best Practices for Operating a Large-Scale Recommender System
RecSysOps: Best Practices for Operating a Large-Scale Recommender System
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the Roadmap
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
 
Knn
KnnKnn
Knn
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptx
 
Masurca genome assembly with super reads
Masurca  genome assembly with super readsMasurca  genome assembly with super reads
Masurca genome assembly with super reads
 
Deep learning with Keras
Deep learning with KerasDeep learning with Keras
Deep learning with Keras
 

Destaque

Arts - Text matching to measure patent similarity
Arts - Text matching to measure patent similarityArts - Text matching to measure patent similarity
Arts - Text matching to measure patent similarityinnovationoecd
 
Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)otisg
 
ทักษะในการเล่นวอลเล่ย์บอล
ทักษะในการเล่นวอลเล่ย์บอลทักษะในการเล่นวอลเล่ย์บอล
ทักษะในการเล่นวอลเล่ย์บอลTepasoon Songnaa
 
มารยาทในการลีลาศ
มารยาทในการลีลาศมารยาทในการลีลาศ
มารยาทในการลีลาศTepasoon Songnaa
 
ใบงาน วิชาเพศศึกษา
ใบงาน วิชาเพศศึกษาใบงาน วิชาเพศศึกษา
ใบงาน วิชาเพศศึกษาTepasoon Songnaa
 
Luyện thi Chứng chỉ A Tin học
Luyện thi Chứng chỉ A Tin họcLuyện thi Chứng chỉ A Tin học
Luyện thi Chứng chỉ A Tin họcltgiang87
 
Data validation infrastructure: the validate package
Data validation infrastructure: the validate packageData validation infrastructure: the validate package
Data validation infrastructure: the validate packageMark Van Der Loo
 
ประวัติวอลเลย์บอล
ประวัติวอลเลย์บอลประวัติวอลเลย์บอล
ประวัติวอลเลย์บอลTepasoon Songnaa
 
การลงทุนในหุ้นสามัญ
การลงทุนในหุ้นสามัญการลงทุนในหุ้นสามัญ
การลงทุนในหุ้นสามัญKittiya Youngjarean
 
Policy Engagement through Digital Participation (Serious Games)
Policy Engagement through Digital Participation (Serious Games)Policy Engagement through Digital Participation (Serious Games)
Policy Engagement through Digital Participation (Serious Games)Abertay University
 
Find the words_that_ryhme
Find the words_that_ryhmeFind the words_that_ryhme
Find the words_that_ryhmelavenderblue68
 
มารยาทในการลีลาศ
มารยาทในการลีลาศมารยาทในการลีลาศ
มารยาทในการลีลาศTepasoon Songnaa
 
Bridging the gap between theory and practise
Bridging the gap between theory and practiseBridging the gap between theory and practise
Bridging the gap between theory and practiseAnshul Punetha
 
การรวมกิจการ (ต่อ)
การรวมกิจการ (ต่อ)การรวมกิจการ (ต่อ)
การรวมกิจการ (ต่อ)Kittiya Youngjarean
 
Speed up your digital transformation
Speed up your digital transformationSpeed up your digital transformation
Speed up your digital transformationFranck de Dieuleveult
 
Meaning and Importance of Genetic Engineering by Aira J. Siniel
Meaning and Importance of Genetic Engineering by Aira J. SinielMeaning and Importance of Genetic Engineering by Aira J. Siniel
Meaning and Importance of Genetic Engineering by Aira J. Sinielairazygy
 

Destaque (20)

Arts - Text matching to measure patent similarity
Arts - Text matching to measure patent similarityArts - Text matching to measure patent similarity
Arts - Text matching to measure patent similarity
 
Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)
 
ทักษะในการเล่นวอลเล่ย์บอล
ทักษะในการเล่นวอลเล่ย์บอลทักษะในการเล่นวอลเล่ย์บอล
ทักษะในการเล่นวอลเล่ย์บอล
 
มารยาทในการลีลาศ
มารยาทในการลีลาศมารยาทในการลีลาศ
มารยาทในการลีลาศ
 
ใบงาน วิชาเพศศึกษา
ใบงาน วิชาเพศศึกษาใบงาน วิชาเพศศึกษา
ใบงาน วิชาเพศศึกษา
 
Ejercicio 7
Ejercicio 7Ejercicio 7
Ejercicio 7
 
Luyện thi Chứng chỉ A Tin học
Luyện thi Chứng chỉ A Tin họcLuyện thi Chứng chỉ A Tin học
Luyện thi Chứng chỉ A Tin học
 
Data validation infrastructure: the validate package
Data validation infrastructure: the validate packageData validation infrastructure: the validate package
Data validation infrastructure: the validate package
 
Tugas 1
Tugas 1Tugas 1
Tugas 1
 
ประวัติวอลเลย์บอล
ประวัติวอลเลย์บอลประวัติวอลเลย์บอล
ประวัติวอลเลย์บอล
 
การลงทุนในหุ้นสามัญ
การลงทุนในหุ้นสามัญการลงทุนในหุ้นสามัญ
การลงทุนในหุ้นสามัญ
 
Policy Engagement through Digital Participation (Serious Games)
Policy Engagement through Digital Participation (Serious Games)Policy Engagement through Digital Participation (Serious Games)
Policy Engagement through Digital Participation (Serious Games)
 
Storyworld Jam '14
Storyworld Jam '14Storyworld Jam '14
Storyworld Jam '14
 
Find the words_that_ryhme
Find the words_that_ryhmeFind the words_that_ryhme
Find the words_that_ryhme
 
Final start
Final startFinal start
Final start
 
มารยาทในการลีลาศ
มารยาทในการลีลาศมารยาทในการลีลาศ
มารยาทในการลีลาศ
 
Bridging the gap between theory and practise
Bridging the gap between theory and practiseBridging the gap between theory and practise
Bridging the gap between theory and practise
 
การรวมกิจการ (ต่อ)
การรวมกิจการ (ต่อ)การรวมกิจการ (ต่อ)
การรวมกิจการ (ต่อ)
 
Speed up your digital transformation
Speed up your digital transformationSpeed up your digital transformation
Speed up your digital transformation
 
Meaning and Importance of Genetic Engineering by Aira J. Siniel
Meaning and Importance of Genetic Engineering by Aira J. SinielMeaning and Importance of Genetic Engineering by Aira J. Siniel
Meaning and Importance of Genetic Engineering by Aira J. Siniel
 

Semelhante a String distance metrics and fuzzy matching in R

Semelhante a String distance metrics and fuzzy matching in R (20)

Apostila
ApostilaApostila
Apostila
 
Apostila latex
Apostila latexApostila latex
Apostila latex
 
Latex tutorial
Latex tutorialLatex tutorial
Latex tutorial
 
Latex2009
Latex2009Latex2009
Latex2009
 
Latex2009
Latex2009Latex2009
Latex2009
 
Apostila latex marcio_nascimento_da_silva_uva_ce_brasil
Apostila latex marcio_nascimento_da_silva_uva_ce_brasilApostila latex marcio_nascimento_da_silva_uva_ce_brasil
Apostila latex marcio_nascimento_da_silva_uva_ce_brasil
 
Linux basico
Linux basicoLinux basico
Linux basico
 
Linux
LinuxLinux
Linux
 
Enviando linux
Enviando linuxEnviando linux
Enviando linux
 
Linux cursos
Linux cursosLinux cursos
Linux cursos
 
Linux basico
Linux basicoLinux basico
Linux basico
 
Ncl e Lua - desenvolvendo aplicações interativas para tv digital
Ncl e Lua - desenvolvendo aplicações interativas para tv digitalNcl e Lua - desenvolvendo aplicações interativas para tv digital
Ncl e Lua - desenvolvendo aplicações interativas para tv digital
 
Apostila latexpdf
Apostila latexpdfApostila latexpdf
Apostila latexpdf
 
Apontamentos de Html
Apontamentos de HtmlApontamentos de Html
Apontamentos de Html
 
Apostila latex
Apostila latexApostila latex
Apostila latex
 
Latex
LatexLatex
Latex
 
Br office writer
Br office writerBr office writer
Br office writer
 
Php
PhpPhp
Php
 
Apostila curso matlab
Apostila curso matlabApostila curso matlab
Apostila curso matlab
 
Apostila Linguagem C
Apostila Linguagem CApostila Linguagem C
Apostila Linguagem C
 

String distance metrics and fuzzy matching in R