LSH for similarity search in generic metric space
Eliezer de Souza da Silva
Department of Computer Engineering and Industrial Automation
School of Electrical and Computer Engineering
University of Campinas
eliezers@dca.fee.unicamp.br
Wednesday 8th October, 2014
Basic Concepts and Research Review
Similarity Search – metric space model
Generic model for proximity search;
Tuple (U, d), where U is a set and d a distance function (positive,
symmetric);
∀x, y, z ∈ U, d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality);
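The metric axioms can be checked mechanically on a finite sample. The sketch below is only an illustration: the edit distance, the word list and the helper names `edit_distance` and `is_metric_on` are assumptions, not part of the slides, and the check also includes the standard identity axiom (d(x, y) = 0 only when x = y).

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, a standard metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_metric_on(sample, d) -> bool:
    """Check positivity, symmetry and the triangle inequality on a finite sample."""
    for x, y, z in product(sample, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False
        if d(x, y) != d(y, x):
            return False
        if d(x, y) > d(x, z) + d(z, y):
            return False
    return True


words = ["hash", "cash", "catch", "metric", ""]
print(is_metric_on(words, edit_distance))

# Squared difference violates the triangle inequality: d(0, 2) = 4 > d(0, 1) + d(1, 2) = 2.
print(is_metric_on([0, 1, 2], lambda x, y: (x - y) ** 2))
```

Passing the check on a sample does not prove that d is a metric on all of U; it only catches violations among the sampled points.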
E.S. Silva () Metric LSH Wednesday 8th
October, 2014 2 / 44
Basic Concepts and Research Review Locality sensitive hashing
Locality-sensitive hashing
Definition
Given a distance function d : X × X → R+, a function family
H = {h : X → C} is (r, cr, p1, p2)-sensitive for a given data set S ⊆ X if,
for any points p, q ∈ S, h ∈ H:
If d(p, q) ≤ r then PrH[h(q) = h(p)] ≥ p1 (probability of colliding
within the ball of radius r),
If d(p, q) > cr then PrH[h(q) = h(p)] ≤ p2 (probability of colliding
outside the ball of radius cr)
c > 1 and p1 > p2
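To make the definition concrete, the sketch below estimates the collision probability for bit sampling on binary vectors under Hamming distance. This is a classical LSH family chosen here only as an illustration; it is not the family studied in these slides. For it, Pr[h(p) = h(q)] = 1 − d(p, q)/D exactly, where D is the number of bits. The values of D, r and c are arbitrary assumptions.

```python
import random


def hamming(p, q):
    """Number of coordinates where p and q differ."""
    return sum(a != b for a, b in zip(p, q))


def collision_rate(p, q, trials=20000, seed=0):
    """Estimate Pr_H[h(p) = h(q)] for bit sampling: h_i(x) = x[i], with i uniform."""
    rng = random.Random(seed)
    dim = len(p)
    hits = 0
    for _ in range(trials):
        i = rng.randrange(dim)
        hits += p[i] == q[i]
    return hits / trials


D, r, c = 64, 4, 8                   # assumed parameters, so cr = 32
q = [0] * D
near = [1] * 4 + [0] * (D - 4)       # d(q, near) = 4 <= r
far = [1] * 40 + [0] * (D - 40)      # d(q, far) = 40 > cr

p1 = 1 - r / D                       # lower bound for pairs within r
p2 = 1 - c * r / D                   # upper bound for pairs beyond cr
print(hamming(q, near), collision_rate(q, near), p1)
print(hamming(q, far), collision_rate(q, far), p2)
```

With these parameters p1 = 0.9375 and p2 = 0.5, so the family is (4, 32, 0.9375, 0.5)-sensitive.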
Figure: LSH and (R, c)-NN (query q, balls of radius r and cr, points p and p').
Quantizers
Data-dependent quantization populates the buckets more evenly and
empirically performs better than regular schemes [50].
Existing LSH in General Metric Spaces
Novak et al. [41; 42]: M-Index: builds a hierarchical partitioning of
the dataset, choosing points from the dataset as cluster centers.
Kang and Jung [28]: DFLSH (Distribution-Free Locality-Sensitive
Hashing): randomly chooses t points from the original dataset (with
n > t points) as centroids and indexes the dataset using the nearest
centroid as hash key. This construction yields an approximately
uniform number of points per bucket: O(n/t).
Tellez and Chavez [59]: map metric data to a permutation index,
encode the permutations in Hamming space, and use Hamming LSH.
Towards LSH in generic metric space VoronoiLSH
VoronoiLSH - Hashing function
Figure: generating L induced Voronoi partitionings yields L associated
hash functions [h1, ..., hL] and L hash tables.
Definition
Given a metric space (U, d), C = {c1, . . . , ck} ⊂ U and x ∈ U:
hC : U → N,  hC(x) = argmin_{i=1,...,k} d(x, ci)    (1)
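A minimal sketch of the definition above, extended to L hash tables. The class name `VoronoiLSHIndex`, the random choice of centers and the one-dimensional toy data are assumptions made for illustration, not the authors' implementation. Any distance function d can be passed in.

```python
import random
from collections import defaultdict


def voronoi_hash(x, centers, d):
    """h_C(x): 0-based index of the center nearest to x (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: d(x, centers[i]))


class VoronoiLSHIndex:
    """L hash tables, each keyed by the nearest center of its own center set."""

    def __init__(self, data, d, k, L, seed=0):
        rng = random.Random(seed)
        self.data, self.d = data, d
        # Assumption: centers are sampled at random from the data;
        # seeds learned by clustering could be substituted here.
        self.center_sets = [rng.sample(data, k) for _ in range(L)]
        self.tables = []
        for centers in self.center_sets:
            table = defaultdict(list)
            for idx, x in enumerate(data):
                table[voronoi_hash(x, centers, d)].append(idx)
            self.tables.append(table)

    def query(self, q, n_neighbors=1):
        """Collect points sharing a bucket with q in any table, then rank by true distance."""
        candidates = set()
        for centers, table in zip(self.center_sets, self.tables):
            candidates.update(table.get(voronoi_hash(q, centers, self.d), []))
        ranked = sorted(candidates, key=lambda i: self.d(q, self.data[i]))
        return [self.data[i] for i in ranked[:n_neighbors]]


def abs_diff(x, y):
    return abs(x - y)


rng = random.Random(1)
data = [rng.uniform(0, 100) for _ in range(200)]
index = VoronoiLSHIndex(data, abs_diff, k=5, L=3)
print(index.query(data[7], n_neighbors=3))
```

Each query costs k distance computations per table to find the bucket, plus one per candidate to rank the results.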
VoronoiLSH
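Figure: Voronoi cells of the centroids C1, C2 and C3, with a query q and
the balls of radius r and cr around it. The point p, at distance d(q, p),
lies in the same cell as q (h(q) = h(p) = 2), while p' lies in another
cell (h(p') = 3). Zq, Zp and Zp' label the points q, p and p'.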
Performance and Cost Models
Range Cost
RC(n, k) = n/k + k ⇒ RC(n) = 2√n (attained at k = √n)
NN Cost
NNC(n, k, d) = (n/k) log(n/k) + d(n/k) + dk
⇒ NNCopt(n, d) = O(√(nd) (log(√n) + d + 1))
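The range cost n/k + k can be checked numerically: for a fixed n, a brute-force search over k recovers the minimizer k = √n and the minimum 2√n. The value of n and the function names are illustrative assumptions. The NN cost is included as written on the slide, with no claim about its minimizer.

```python
import math


def range_cost(n, k):
    """RC(n, k) = n/k + k: scan one bucket of about n/k points, plus k center comparisons."""
    return n / k + k


def nn_cost(n, k, d):
    """NNC(n, k, d) = (n/k) log(n/k) + d (n/k) + d k, as stated on the slide."""
    return (n / k) * math.log(n / k) + d * (n / k) + d * k


n = 10_000
best_k = min(range(1, n + 1), key=lambda k: range_cost(n, k))
print(best_k, range_cost(n, best_k), 2 * math.sqrt(n))  # 100 200.0 200.0

# Numerical minimizer of the NN cost for an assumed d = 4.
print(min(range(1, n + 1), key=lambda k: nn_cost(n, k, 4)))
```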
Hash probabilities bounds
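Probability model: (Ω, F, Pr)
Zp = d(p, NNC(p)) = d(p, C), where NNC(p) is the nearest neighbor of p in C
Ω = {Zx | x ∈ X, C ⊂ X}
Pr[hC(p) ≠ hC(q)] = Pr[{Zq < d(q, NNC(p))} ∩ {Zp < d(p, NNC(q))}]
Figure: points p and q with their nearest centers NNC(p) and NNC(q).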
Case d(p, q) > cr:
{Zp + Zq < cr} ⊆ {Zp + Zq < d(p, q)}
{Zp + Zq < d(p, q)} ⊆ {Zq < d(q, NNC(p))} ∩ {Zp < d(p, NNC(q))}
⇒ Pr[hC(p) ≠ hC(q)] ≥ Pr[Zq + Zp < cr]
⇒ Pr[hC(p) = hC(q)] ≤ Pr[Zq + Zp ≥ cr] = p2
Case d(p, q) < r:
d(p, NNC(q)) ≤ d(p, q) + Zq ≤ r + Zq
d(q, NNC(p)) ≤ d(p, q) + Zp ≤ r + Zp
⇒ {Zp < d(p, NNC(q))} ⊆ {Zp < r + Zq}
⇒ {Zq < d(q, NNC(p))} ⊆ {Zq < r + Zp}
⇒ Pr[hC(p) ≠ hC(q)] ≤ Pr[|Zq − Zp| < r]
⇒ Pr[hC(p) = hC(q)] ≥ Pr[|Zq − Zp| ≥ r] = p1
p1 ≥ p2 requires two assumptions: “Zq < δr” (δ > 0) and
“c > 2δ + 1”;
p1 > p2 additionally requires considering a hypothetical case where
“Zq = r − ε” and “Zp = 2δr − ε”, for ε > 0.
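The bounds depend on the distribution of Zp and Zq, which the slides leave abstract. As a rough empirical sanity check only, the sketch below estimates the collision probability of a Voronoi hash for a near pair and a far pair. The setting is entirely assumed: points on the real line, k centers drawn uniformly from [0, 1], and arbitrary values of r and c.

```python
import random


def nearest_center(x, centers):
    """Index of the center closest to x under the absolute-difference distance."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def collision_rate(p, q, k=10, trials=5000, seed=0):
    """Fraction of random center sets C (k uniform points in [0, 1]) with h_C(p) = h_C(q)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        centers = [rng.random() for _ in range(k)]
        hits += nearest_center(p, centers) == nearest_center(q, centers)
    return hits / trials


r, c = 0.02, 10                               # assumed parameters, so cr = 0.2
near = collision_rate(0.5, 0.5 + r)           # pair at distance r
far = collision_rate(0.5, 0.5 + 1.5 * c * r)  # pair at distance 0.3 > cr
print(near, far)
```

In this toy setting the near pair collides far more often than the far pair, which is the qualitative behavior that p1 > p2 expresses. It says nothing about general metric spaces.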
Towards LSH in generic metric space VoronoiPlexLSH
VoronoiPlex LSH - Hash function construction
Multiple VoronoiLSH with a controlled number of distance computations
input : size k of the sample, number w of distinct partitionings, and
        number p of centroids per partitioning
output: a hash function hk,w,p
selected ← new binary array of size k;
subsample ← new integer multi-array of size w × p;
for j ← 1 to w do
    draw a random sample S = {s1, · · · , sp} from {1, · · · , k};
    for i ← 1 to p do
        subsample[j, i] ← si;
        selected[si] ← 1;
    end
end
hk,w,p ← (selected, subsample);
Algorithm 1: Hash function building
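Algorithm 1 translates almost line by line into Python. This is a sketch under two assumptions: indices are 0-based, and each partitioning draws its p indices without replacement, so S has p distinct elements. The function name `build_hash_function` is hypothetical.

```python
import random


def build_hash_function(k, w, p, rng=None):
    """Return (selected, subsample) describing one VoronoiPlex hash function.

    selected[i] is True if centroid i is used by at least one partitioning;
    subsample[j] lists the p centroid indices of partitioning j.
    """
    rng = rng or random.Random()
    selected = [False] * k
    subsample = []
    for _ in range(w):
        indices = rng.sample(range(k), p)  # p distinct indices in 0..k-1
        subsample.append(indices)
        for s in indices:
            selected[s] = True
    return selected, subsample


selected, subsample = build_hash_function(k=5, w=4, p=3, rng=random.Random(0))
print(selected)
print(subsample)
```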
VoronoiPlex LSH - Hashing algorithm
input : hash function object hk,w,p, sample C = {c1, . . . , ck} ⊂ X
        (|C| = k), and a point q ∈ X
output: integer value hk,w,p(q)
(selected, subsample) ← retrieved from hk,w,p;
distances ← new floating-point array of size k;
for j ← 1 to k do
    if selected[j] == 1 then
        distances[j] ← d(q, cj);
    end
end
hasharray ← new integer array of size w;
for i ← 1 to w do
    hasharray[i] ← the index j in subsample[i] that minimizes distances[j];
end
hk,w,p(q) ← hash(hasharray);
Algorithm 2: Hash function application
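A possible Python rendering of Algorithm 2, using 0-based indices. The function name `apply_hash_function`, the centroid positions and the hand-written partitionings are assumptions chosen for illustration. The function returns the tuple of per-partitioning winners, and calling `hash` on that tuple gives the integer key of the last step.

```python
def apply_hash_function(h, centers, q, d):
    """Return the tuple of nearest-centroid indices, one per partitioning."""
    selected, subsample = h
    # Each needed distance is computed once, even if a centroid
    # appears in several partitionings.
    distances = {j: d(q, centers[j]) for j, used in enumerate(selected) if used}
    return tuple(min(row, key=distances.__getitem__) for row in subsample)


def abs_diff(x, y):
    return abs(x - y)


# Hypothetical example with k = 5, w = 4, p = 3 on the real line.
centers = [3.0, 1.0, 4.0, 5.0, 2.0]                       # c1..c5
subsample = [[0, 2, 3], [2, 4, 1], [4, 0, 2], [4, 3, 1]]  # hand-picked partitionings
selected = [True] * 5                                     # every centroid is used at least once
key = apply_hash_function((selected, subsample), centers, 0.0, abs_diff)
print(tuple(j + 1 for j in key))  # 1-based: (1, 2, 5, 2)
print(hash(key))                  # integer bucket key
```

Only the centroids marked in `selected` cost a distance computation, which is what keeps the cost controlled when w grows.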
VoronoiPlex LSH
Figure: example with k = 5 centroids c1, ..., c5, w = 4 and p = 3:
h5,4,3 = {{c1, c3, c4}, {c3, c5, c2}, {c5, c1, c3}, {c5, c4, c2}}
h5,4,3(p) = (1, 2, 5, 2)
E[number of indices i ∈ {1, · · · , k} with selected[i] = 1] = k − k(1 − p/k)^w
O(k − k ) distance computations (intrinsic cost)
The extrinsic cost requires a more complicated analysis.
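The expectation k − k(1 − p/k)^w follows because a given centroid is missed by one partitioning with probability 1 − p/k, and the w partitionings are drawn independently. The simulation below checks this, assuming each partitioning samples p distinct indices. The function names and parameter values are illustrative.

```python
import random


def expected_selected(k, w, p):
    """Closed form for the expected number of selected centroids."""
    return k - k * (1 - p / k) ** w


def simulate_selected(k, w, p, trials=20000, seed=0):
    """Average number of distinct centroids touched by w samples of size p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        chosen = set()
        for _ in range(w):
            chosen.update(rng.sample(range(k), p))
        total += len(chosen)
    return total / trials


k, w, p = 5, 4, 3
print(expected_selected(k, w, p), simulate_selected(k, w, p))
```

For k = 5, w = 4 and p = 3 the closed form gives 5 − 5(0.4)^4 = 4.872, and the simulated average should land close to it.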
Towards LSH in generic metric space Parallel VoronoiLSH
Parallel VoronoiLSH
Distributed computation using dataflow programming;
Computing stages distributed across processors and nodes;
Message-passing interface.
Results Datasets
Datasets
APM (Arquivo Público Mineiro – The Public Archives in Minas
Gerais):
2,871,300 feature vectors (each SIFT descriptor is a 128-dimensional
vector).
Query dataset: 263,968 feature vectors with ground truth.
For the experiments we used 5000 queries uniformly sampled from
the query dataset and performed a 10-NN search.
Metric datasets (dataset size / number of queries): Listeria (20660 / 100)
and English dictionary (66069 / 500);
BigANN (1B) for large-scale experiments: (10^9 / 10^4).
Results Experimental results
APM
Figure: (a) Recall x Extensiveness (log scale) for DFLSH, K-MedoidsLSH
and K-MeansLSH; (b) Recall x Extensiveness for DFLSH, K-MedoidsLSH and
K-MeansLSH with L = 1, 5 and 8 hash functions (for 5000 cluster centers).
English dataset - VoronoiLSH and BPI
Figure: Recall for Voronoi LSH and BPI LSH, plotted against the fraction
of query time of linear scan. Series: Voronoi LSH with K-means++ (L=5
and L=8), DFLSH (L=5 and L=8), and Brief Proximity Index (BPI) LSH.
Listeria
Figure: (a) Recall x Extensivity for DFLSH with L=2 and L=3;
(b) Recall x Extensivity for VoronoiPlex LSH with L=1 and L=8
(nCluster=10), varying the size w of the key-length (w = 2, 5, 10;
10 centroids selected from a 4000 point sample set).
Large scale experiment
Figure: (c) Query time / Recall; (d) Parallel efficiency.
Conclusions
Results and challenges
Using metric partitioning techniques to build hash functions in metric
spaces is a valid approach and should be further explored and
developed;
The experiments do not show any clear advantage in learning the
seeds of the Voronoi diagram by clustering;
It would be interesting to equip the analysis with more assumptions
about the data.
References
[1] Fernando Akune. Indexação Multimídia escalável e busca por
similaridade em alta dimensionalidade. M. sc., Universidade
Estadual de Campinas (Unicamp), 2011.
[2] Fernando Akune, Eduardo Valle, and Ricardo Torres. MONORAIL:
A Disk-Friendly Index for Huge Descriptor Databases. In 2010 20th
International Conference on Pattern Recognition, pages
4145–4148. IEEE, August 2010.
[3] Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing
Algorithms for Approximate Nearest Neighbor in High Dimensions.
2006 47th Annual IEEE Symposium on Foundations of Computer
Science (FOCS’06), pages 459–468, 2006.
[4] David Arthur and Sergei Vassilvitskii. k-means++: the advantages
of careful seeding. In Proceedings of the eighteenth annual
ACM-SIAM symposium on Discrete algorithms, SODA ’07, pages
1027–1035, Philadelphia, PA, USA, 2007. Society for Industrial and
Applied Mathematics.
[5] Sunil Arya and David M. Mount. Approximate nearest neighbor
queries in fixed dimensions. In Proceedings of the fourth annual
ACM-SIAM Symposium on Discrete algorithms, SODA ’93, pages
271–280, Philadelphia, PA, USA, 1993. Society for Industrial and
Applied Mathematics.
[6] Bahman Bahmani, Ashish Goel, and Rajendra Shinde. Efficient
distributed locality sensitive hashing. In Proceedings of the 21st
ACM International Conference on Information and Knowledge
Management, CIKM ’12, pages 2174–2178, New York, NY, USA,
2012. ACM.
[7] Mayank Bawa, Tyson Condie, and Prasanna Ganesan. LSH forest.
In Proceedings of the 14th international conference on World Wide
Web - WWW ’05, page 651, New York, New York, USA, 2005. ACM
Press.
[8] R.E. Bellman. Dynamic Programming. Dover Books on Computer
Science Series. Dover Publications, Incorporated, 2003.
[9] Stefan Berchtold, Daniel A. Keim, and Hans-Peter Kriegel. The
x-tree: An index structure for high-dimensional data. In
Proceedings of the 22th International Conference on Very Large
Data Bases, VLDB ’96, pages 28–39, San Francisco, CA, USA,
1996. Morgan Kaufmann Publishers Inc.
[10] Michael D. Beynon, Tahsin Kurc, Umit Catalyurek, Chialin Chang,
Alan Sussman, and Joel Saltz. Distributed processing of very large
datasets with DataCutter. Parallel Comput., 27(11):1457–1478,
2001.
[11] Christian Böhm, Stefan Berchtold, and Daniel A. Keim. Searching
in high-dimensional spaces: Index structures for improving the
performance of multimedia databases. ACM Computing Surveys,
33(3):322–373, September 2001.
[12] W. A. Burkhard and R. M. Keller. Some approaches to best-match
file searching. Commun. ACM, 16(4):230–236, April 1973.
[13] Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-Yates, and
José Luis Marroquín. Searching in metric spaces. ACM Computing
Surveys, 33(3):273–321, September 2001.
[14] Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An
efficient access method for similarity search in metric spaces. In
Proceedings of the 23rd International Conference on Very Large
Data Bases, VLDB ’97, pages 426–435, San Francisco, CA, USA,
1997. Morgan Kaufmann Publishers Inc.
[15] Kenneth L Clarkson. Nearest-Neighbor Searching and Metric
Space Dimensions. In Gregory Shakhnarovich, Trevor Darrell, and
Piotr Indyk, editors, Nearest-Neighbor Methods in Learning and
Vision: Theory and Practice (Neural Information Processing),
Advances in Neural Information Processing Systems. The MIT
Press, 2006.
[16] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni.
Locality-sensitive hashing scheme based on p-stable distributions.
In Proceedings of the twentieth annual symposium on
Computational geometry - SCG ’04, page 253, New York, New
York, USA, 2004. ACM Press.
[17] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Image
retrieval. ACM Computing Surveys, 40(2):1–60, April 2008.
[18] Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient similarity
search and classification via rank aggregation. In Proceedings of
the 2003 ACM SIGMOD international conference on Management
of data, SIGMOD ’03, pages 301–312, New York, NY, USA, 2003.
ACM.
[19] C. Faloutsos and S. Roseman. Fractals for secondary key retrieval.
In Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART
symposium on Principles of database systems, PODS ’89, pages
247–252, New York, NY, USA, 1989. ACM.
[20] Christos Faloutsos. Multiattribute hashing using gray codes.
SIGMOD Rec., 15(2):227–238, June 1986.
[21] Volker Gaede and Oliver Günther. Multidimensional access
methods. ACM Computing Surveys, 30(2):170–231, June 1998.
[22] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity
search in high dimensions via hashing. In Proceedings of the 25th
International Conference on Very Large Data Bases, VLDB ’99,
pages 518–529, San Francisco, CA, USA, 1999. Morgan
Kaufmann Publishers Inc.
[23] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors.
In Proceedings of the thirtieth annual ACM symposium on Theory
of computing - STOC ’98, pages 604–613, New York, New York,
USA, 1998. ACM Press.
[24] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review.
ACM Computing Surveys, 31(3):264–323, September 1999.
[25] H. Jegou, M. Douze, and C. Schmid. Product quantization for
nearest neighbor search. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 33(1):117–128, 2011.
[26] Herve Jegou, Laurent Amsaleg, Cordelia Schmid, and Patrick Gros.
Query adaptative locality sensitive hashing. In 2008 IEEE
International Conference on Acoustics, Speech and Signal
Processing, pages 825–828. IEEE, March 2008.
[27] Alexis Joly and Olivier Buisson. A posteriori multi-probe locality
sensitive hashing. In Proceeding of the 16th ACM international
conference on Multimedia - MM ’08, page 209, New York, New
York, USA, 2008. ACM Press.
[28] Byungkon Kang and Kyomin Jung. Robust and Efficient Locality
Sensitive Hashing for Nearest Neighbor Search in Large Data Sets.
In NIPS Workshop on Big Learning (BigLearn), pages 1–8, Lake
Tahoe, Nevada, 2012.
[29] Leonard Kaufman and Peter J. Rousseeuw. Finding Groups in
Data: An Introduction to Cluster Analysis. Wiley-Interscience, 9th
edition, March 1990.
[30] Jon M. Kleinberg. Two algorithms for nearest-neighbor search in
high dimensions. In Proceedings of the twenty-ninth annual ACM
symposium on Theory of computing, STOC ’97, pages 599–608,
New York, NY, USA, 1997. ACM.
[31] Martin Kruliš, Tomáš Skopal, Jakub Lokoč, and Christian Beecks.
Combining cpu and gpu architectures for fast similarity search.
Distributed and Parallel Databases, 30(3-4):179–207, 2012.
[32] John Leech. Some sphere packings in higher space. Canadian
Journal of Mathematics, 16:657–682, January 1964.
[33] Herwig Lejsek, Fridrik Heidar Ásmundsson, Björn Þór Jónsson,
and Laurent Amsaleg. Efficient and effective image copyright
enforcement. In BDA, 2005.
[34] S. Liao, M.A. Lopez, and S.T. Leutenegger. High dimensional
similarity search with space filling curves. In Proceedings 17th
International Conference on Data Engineering, pages 615–622.
IEEE Comput. Soc, 2001.
[35] David G. Lowe. Distinctive Image Features from Scale-Invariant
Keypoints. International Journal of Computer Vision, 60(2):91–110,
November 2004.
[36] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li.
Multi-probe LSH: efficient indexing for high-dimensional similarity
search. In Proceedings of the 33rd international conference on
Very large data bases, VLDB ’07, pages 950–961. VLDB
Endowment, 2007.
[37] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li.
Multi-probe LSH: efficient indexing for high-dimensional similarity
search. In Proceedings of the 33rd international conference on
Very large data bases, VLDB ’07, pages 950–961. VLDB
Endowment, 2007.
[38] G. Mainar-Ruiz and J. Perez-Cortes. Approximate Nearest
Neighbor Search using a Single Space-filling Curve and Multiple
Representations of the Data Points. In 18th International
Conference on Pattern Recognition (ICPR’06), pages 502–505.
IEEE, 2006.
[39] Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds
on locality sensitive hashing. In Proceedings of the twenty-second
annual symposium on Computational geometry - SCG ’06, page
154, New York, New York, USA, 2006. ACM Press.
[40] RT Ng. CLARANS: a method for clustering objects for spatial data
mining. IEEE Transactions on Knowledge and Data Engineering,
14(5):1003–1016, September 2002.
[41] David Novak and Michal Batko. Metric Index: An Efficient and
Scalable Solution for Similarity Search. In 2009 Second
International Workshop on Similarity Search and Applications,
pages 65–73. IEEE, August 2009.
[42] David Novak, Martin Kyselak, and Pavel Zezula. On
locality-sensitive indexing in generic metric spaces. Proceedings of
the Third International Conference on SImilarity Search and
APplications - SISAP ’10, page 59, 2010.
[43] Alexander Ocsa and Elaine P M De Sousa. An Adaptive Multi-level
Hashing Structure for Fast Approximate Similarity Search. Journal
of Information and Data Management, 1(3):359–374, 2010.
[44] Rafail Ostrovsky, Yuval Rabani, Leonard Schulman, and Chaitanya
Swamy. The Effectiveness of Lloyd-Type Methods for the k-Means
Problem. In 2006 47th Annual IEEE Symposium on Foundations of
Computer Science (FOCS’06), volume 59, pages 165–176. IEEE,
December 2006.
[45] Jia Pan and Dinesh Manocha. Fast GPU-based locality sensitive
hashing for k-nearest neighbor computation. In 19th ACM
SIGSPATIAL Int. Conf. on Advances in Geographic Information
Systems, GIS ’11. ACM, 2011.
[46] Rina Panigrahy. Entropy based nearest neighbor search in high
dimensions. In Proceedings of the seventeenth annual ACM-SIAM
symposium on Discrete algorithm, SODA ’06, pages 1186–1195,
New York, NY, USA, 2006. ACM.
[47] Rina Panigrahy. Entropy based nearest neighbor search in high
dimensions. In Proceedings of the seventeenth annual ACM-SIAM
symposium on Discrete algorithm, SODA ’06, pages 1186–1195,
New York, NY, USA, 2006. ACM.
[48] Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for
K-medoids clustering. Expert Systems with Applications,
36(2):3336–3341, 2009.
[49] Adriano Arantes Paterlini, Mario A Nascimento, and
Caetano Traina Junior. Using Pivots to Speed-Up k-Medoids
Clustering. Journal of Information and Data Management,
2(2):221–236, June 2011.
[50] Loïc Paulevé, Hervé Jégou, and Laurent Amsaleg. Locality
sensitive hashing: A comparison of hash function types and
querying mechanisms. Pattern Recognition Letters,
31(11):1348–1358, August 2010.
[51] D. Pollard. Quantization and the method of k-means. IEEE
Transactions on Information Theory, 28(2):199–205, March 1982.
[52] Hanan Samet. Foundations of Multidimensional and Metric Data
Structures (The Morgan Kaufmann Series in Computer Graphics
and Geometric Modeling). Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 2005.
[53] Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk.
Nearest-Neighbor Methods in Learning and Vision: Theory and
Practice (Neural Information Processing). The MIT Press, 2006.
[54] James G. Shanahan, Sihem Amer-Yahia, Ioana Manolescu,
Yi Zhang, David A. Evans, Aleksander Kolcz, Key-Sun Choi, and
Abdur Chowdhury, editors. Proceedings of the 17th ACM
Conference on Information and Knowledge Management, CIKM
2008, Napa Valley, California, USA, October 26-30, 2008. ACM,
2008.
[55] Tomáš Skopal. Where are you heading, metric access methods?:
a provocative survey. In Proceedings of the Third International
Conference on SImilarity Search and APplications, SISAP ’10,
pages 13–21, New York, NY, USA, 2010. ACM.
[56] Malcolm Slaney, Yury Lifshits, and Junfeng He. Optimal
Parameters for Locality-Sensitive Hashing. Proceedings of the
IEEE, 100(9):2604–2623, 2012.
[57] Raisa Socorro, Luisa Micó, and Jose Oncina. A fast pivot-based
indexing algorithm for metric spaces. Pattern Recognition Letters,
32(11):1511–1516, August 2011.
[58] Aleksandar Stupar, Sebastian Michel, and Ralf Schenkel.
RankReduce - processing K-Nearest Neighbor queries on top of
MapReduce. In LSDS-IR, 2010.
[59] Eric Sadit Tellez and Edgar Chavez. On locality sensitive hashing
in metric spaces. In Proceedings of the Third International
Conference on SImilarity Search and APplications, SISAP ’10,
pages 67–74, New York, NY, USA, 2010. ACM.
[60] George Teodoro, Daniel Fireman, Dorgival Guedes, Wagner Meira
Jr., and Renato Ferreira. Achieving multi-level parallelism in the
filter-labeled stream programming model. Parallel Processing,
International Conference on, 0:287–294, 2008.
[61] George Teodoro, Eduardo Valle, Nathan Mariano, Ricardo Torres,
and Wagner Meira, Jr. Adaptive parallel approximate similarity
search for responsive multimedia retrieval. In Proc. of the 20th
ACM international conference on Information and knowledge
management, CIKM ’11. ACM, 2011.
[62] A.J.M. Traina, A. Traina, C. Faloutsos, and B. Seeger. Fast
indexing and visualization of metric data sets using slim-trees.
Knowledge and Data Engineering, IEEE Transactions on,
14(2):244–260, 2002.
[63] Caetano Traina, Jr., Agma J. M. Traina, Bernhard Seeger, and
Christos Faloutsos. Slim-trees: High performance metric trees
minimizing overlap between nodes. In Proceedings of the 7th
International Conference on Extending Database Technology:
Advances in Database Technology, EDBT ’00, pages 51–65,
London, UK, 2000. Springer-Verlag.
[64] Jeffrey K. Uhlmann. Satisfying general proximity / similarity queries
with metric trees. Information Processing Letters, 40(4):175 – 179,
1991.
[65] Eduardo Valle and Matthieu Cord. Advanced Techniques in CBIR:
Local Descriptors, Visual Dictionaries and Bags of Features. In
2009 Tutorials of the XXII Brazilian Symposium on Computer
Graphics and Image Processing, pages 72–78. IEEE, October
2009.
[66] Eduardo Valle, Matthieu Cord, and Sylvie Philipp-Foliguet.
High-dimensional descriptor indexing for large multimedia
databases. In Shanahan et al. [54], pages 739–748.
[67] Hongbo Xu. An Approximate Nearest Neighbor Query Algorithm
Based on Hilbert Curve. In 2011 International Conference on
Internet Computing and Information Services, pages 514–517.
IEEE, September 2011.
[68] Peter N. Yianilos. Data structures and algorithms for nearest
neighbor search in general metric spaces. In Proceedings of the
fourth annual ACM-SIAM Symposium on Discrete algorithms,
SODA ’93, pages 311–321, Philadelphia, PA, USA, 1993. Society
for Industrial and Applied Mathematics.
[69] Pavel Zezula. Future trends in similarity searching. In Proceedings
of the 5th international conference on Similarity Search and
Applications, SISAP’12, pages 8–24, Berlin, Heidelberg, 2012.
Springer-Verlag.
[70] Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal
Batko. Similarity Search - The Metric Space Approach, volume 32
of Advances in Database Systems. Kluwer Academic Publishers,
Boston, 2006.
[71] Pavel Zezula, Pasquale Savino, Giuseppe Amato, and Fausto
Rabitti. Approximate similarity retrieval with m-trees. The VLDB
Journal, 7(4):275–293, December 1998.
[72] Qiaoping Zhang and Isabelle Couloigner. A new and efficient
k-medoid algorithm for spatial clustering. In Osvaldo Gervasi,
Marina L. Gavrilova, Vipin Kumar, Antonio Laganà, Heow Pueh Lee,
Youngsong Mun, David Taniar, and Chih Jeng Kenneth Tan, editors,
Computational Science and Its Applications – ICCSA 2005, volume
3482 of Lecture Notes in Computer Science, pages 181–189.
Springer Berlin Heidelberg, 2005.

Estimation Theory, PhD Course, Ghent University, Belgium
 
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
Efficient Identification of Improving Moves in a Ball for Pseudo-Boolean Prob...
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
Scribed lec8
Scribed lec8Scribed lec8
Scribed lec8
 
Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
 
Gradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation Graphs
 
Otter 2016-11-28-01-ss
Otter 2016-11-28-01-ssOtter 2016-11-28-01-ss
Otter 2016-11-28-01-ss
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
Tensor Completion for PDEs with uncertain coefficients and Bayesian Update te...
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Introduction to spatstat
Introduction to spatstatIntroduction to spatstat
Introduction to spatstat
 

Mais de Eliezer Silva

Mais de Eliezer Silva (6)

Cybernetics, human-in-the-loop and probabilistic modelling for recommender sy...
Cybernetics, human-in-the-loop and probabilistic modelling for recommender sy...Cybernetics, human-in-the-loop and probabilistic modelling for recommender sy...
Cybernetics, human-in-the-loop and probabilistic modelling for recommender sy...
 
Content-Based Social Recommendation with Poisson Matrix Factorization (ECML-P...
Content-Based Social Recommendation with Poisson Matrix Factorization (ECML-P...Content-Based Social Recommendation with Poisson Matrix Factorization (ECML-P...
Content-Based Social Recommendation with Poisson Matrix Factorization (ECML-P...
 
Complex networks: community detection and virus propagation
Complex networks: community detection and virus propagationComplex networks: community detection and virus propagation
Complex networks: community detection and virus propagation
 
Probabilistic Matrix Factorization (extensions of models)
Probabilistic Matrix Factorization (extensions of models)Probabilistic Matrix Factorization (extensions of models)
Probabilistic Matrix Factorization (extensions of models)
 
Variational Inference
Variational InferenceVariational Inference
Variational Inference
 
Rotações
RotaçõesRotações
Rotações
 

Último

Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
HyderabadDolls
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 

Último (20)

Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Nandurbar [ 7014168258 ] Call Me For Genuine Models...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 

Locality-sensitive hashing for search in metric space

  • 1. LSH for similarity search in generic metric space. Eliezer de Souza da Silva, Department of Computer Engineering and Industrial Automation, School of Electrical and Computer Engineering, University of Campinas (eliezers@dca.fee.unicamp.br). Wednesday 8th October, 2014.
  • 2. Basic Concepts and Research Review. Similarity Search – metric space model
      - Generic model for proximity search;
      - Tuple (U, d), where U is a set and d a distance function (positive, symmetric);
      - ∀x, y, z ∈ U, d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
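The metric-space axioms on this slide can be checked by brute force on a small sample. The sketch below is only an illustration: the choice of edit distance, the word list, and the helper names `edit_distance` and `is_metric_on` are assumptions and do not come from the slides. It also checks that d(x, y) = 0 only when x = y, which is slightly stronger than the "positive, symmetric" wording above.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def is_metric_on(sample, d) -> bool:
    """Check positivity, symmetry and the triangle inequality on a finite sample."""
    for x, y in product(sample, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y) or d(x, y) != d(y, x):
            return False
    return all(d(x, y) <= d(x, z) + d(z, y)
               for x, y, z in product(sample, repeat=3))


# Hypothetical sample; any finite set of strings works.
words = ["hash", "cash", "crash", "metric", "matrix"]
print(is_metric_on(words, edit_distance))
```

A check like this can only refute the axioms on the sample it sees; it does not prove that a distance is a metric on the whole space.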
  • 3. Basic Concepts and Research Review. Locality-sensitive hashing
      Definition. Given a distance function $d : X \times X \to \mathbb{R}^{+}$, a function family $H = \{h : X \to C\}$ is $(r, cr, p_1, p_2)$-sensitive for a given data set $S \subseteq X$ if, for any points $p, q \in S$ and $h \in H$:
      - if $d(p, q) \le r$ then $\Pr_H[h(q) = h(p)] \ge p_1$ (probability of colliding within the ball of radius $r$);
      - if $d(p, q) > cr$ then $\Pr_H[h(q) = h(p)] \le p_2$ (probability of colliding outside the ball of radius $cr$);
      - $c > 1$ and $p_1 > p_2$.
  • 4. Basic Concepts and Research Review. Locality-sensitive hashing
      [Figure: LSH and (R, c)-NN. Labels: query q, radii r and cr, points p and p'.]
  • 5. Basic Concepts and Research Review. Locality-sensitive hashing: Quantizers
      Data-dependent quantization has the advantage of a more regular population of points in each bucket and empirically performs better than regular schemes [50].
  • 6. Basic Concepts and Research Review. Locality-sensitive hashing: Existing LSH in General Metric Spaces
      - Novak et al. [41; 42]: M-Index. Constructs a hierarchy of partitionings of the dataset, choosing points from the dataset as cluster centers.
      - Kang and Jung [28]: DFLSH (Distribution Free Locality-Sensitive Hashing). Randomly chooses t points from the original dataset (with n > t points) as centroids and indexes the dataset using the nearest centroid as hash key; this construction yields an approximately uniform number of points per bucket: O(n/t).
      - Tellez and Chavez [59]: map metric data to a permutation index, encode the permutation in Hamming space and use Hamming LSH.
  • 7. Towards LSH in generic metric space. VoronoiLSH: hashing function
      [Figure: generate L induced Voronoi partitionings, giving L hash tables with L associated hash functions $h_1, \ldots, h_L$.]
      Definition. Given a metric space $(U, d)$, $C = \{c_1, \ldots, c_k\} \subset U$ and $x \in U$:
      $h_C : U \to \mathbb{N}$, $h_C(x) = \arg\min_{i = 1, \ldots, k} d(x, c_i)$  (1)
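A minimal sketch of the hash function in Eq. (1) and of an index with L tables, under stated assumptions: centers are drawn uniformly at random from the data (the DFLSH-style choice; clustering-based seeds would replace that line), indices are 0-based rather than 1-based, and the names `voronoi_hash` and `VoronoiLSHIndex` are hypothetical rather than taken from the author's implementation.

```python
import random
from collections import defaultdict


def voronoi_hash(x, centers, d):
    """h_C(x): index of the center nearest to x (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: d(x, centers[i]))


class VoronoiLSHIndex:
    """L hash tables, each induced by its own set of k centers."""

    def __init__(self, data, d, k, L, seed=0):
        rng = random.Random(seed)
        self.data, self.d = data, d
        # Assumption: centers are sampled from the data without clustering.
        self.center_sets = [rng.sample(data, k) for _ in range(L)]
        self.tables = []
        for centers in self.center_sets:
            table = defaultdict(list)
            for idx, x in enumerate(data):
                table[voronoi_hash(x, centers, d)].append(idx)
            self.tables.append(table)

    def query(self, q, n_neighbors=1):
        """Collect the buckets q falls into, then rank candidates by true distance."""
        candidates = set()
        for centers, table in zip(self.center_sets, self.tables):
            candidates.update(table.get(voronoi_hash(q, centers, self.d), []))
        ranked = sorted(candidates, key=lambda i: self.d(q, self.data[i]))
        return ranked[:n_neighbors]


# Toy usage on the real line with the absolute difference as distance.
data = [float(i) for i in range(50)]
dist = lambda a, b: abs(a - b)
index = VoronoiLSHIndex(data, dist, k=5, L=3, seed=1)
print(index.query(20.2, n_neighbors=3))
```

Only the distance function is used, never coordinates, so the same sketch applies to strings with an edit distance or to any other metric data.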
  • 8. Towards LSH in generic metric space. VoronoiLSH
      [Figure: VoronoiLSH example with centers C1, C2, C3, query q with radii r and cr, points p and p', and labels Zq, Zp, Zp' and d(q, p); h(q) = h(p) = 2 and h(p') = 3.]
  • 9. Towards LSH in generic metric space. VoronoiLSH: Performance and Cost Models
      Range cost: $RC(n, k) = \frac{n}{k} + k \Rightarrow RC(n) = 2\sqrt{n}$
      NN cost: $NNC(n, k, d) = \frac{n}{k}\log\left(\frac{n}{k}\right) + d\,\frac{n}{k} + dk \Rightarrow NNC_{opt}(n, d) = O(nd(\log(\sqrt{n}) + d + 1))$
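The step from RC(n, k) to RC(n) on this slide can be recovered with one derivative. The derivation below is a sketch that treats k as a continuous variable, which is an assumption made only for the calculation; in practice k is rounded to an integer.

```latex
\begin{align*}
RC(n, k) &= \frac{n}{k} + k, \\
\frac{\partial RC}{\partial k} &= -\frac{n}{k^{2}} + 1 = 0
  \;\Longrightarrow\; k^{*} = \sqrt{n}, \\
RC(n) &= RC(n, k^{*}) = \frac{n}{\sqrt{n}} + \sqrt{n} = 2\sqrt{n}.
\end{align*}
```

Since the second derivative $2n/k^{3}$ is positive for $k > 0$, $k^{*} = \sqrt{n}$ is a minimum.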
  • 10. Towards LSH in generic metric space. VoronoiLSH: Hash probabilities bounds
      Probability model: $(\Omega, \mathcal{F}, \Pr)$
      $Z_p = d(p, NN_C(p)) = d(p, C)$
      $\Omega = \{Z_x \mid x \in X, C \subset X\}$
      $\Pr[h_C(p) = h_C(q)] = \Pr[\{Z_q < d(q, NN_C(p))\} \cap \{Z_p < d(p, NN_C(q))\}]$
      [Figure: points p and q with their nearest centers $NN_C(p)$ and $NN_C(q)$.]
  • 11-12. Towards LSH in generic metric space. VoronoiLSH: Hash probabilities bounds, case $d(p, q) > cr$
      $\{Z_p + Z_q < cr\} \subseteq \{Z_p + Z_q < d(p, q)\}$
      $\{Z_p + Z_q < d(p, q)\} \subseteq \{Z_q < d(q, NN_C(p))\} \cap \{Z_p < d(p, NN_C(q))\}$
      $\Rightarrow \Pr[h_C(p) = h_C(q)] \ge \Pr[Z_q + Z_p < cr]$
      $\Rightarrow \Pr[h_C(p) = h_C(q)] \le \Pr[Z_q + Z_p \ge cr] = p_2$
  • 13-15. Towards LSH in generic metric space. VoronoiLSH: Hash probabilities bounds, case $d(p, q) < r$
      $d(p, NN_C(q)) \le d(p, q) + Z_q \le r + Z_q$
      $d(q, NN_C(p)) \le d(p, q) + Z_p \le r + Z_p$
      $\Rightarrow \{Z_p < d(p, NN_C(q))\} \subseteq \{Z_p < r + Z_q\}$
      $\Rightarrow \{Z_q < d(q, NN_C(p))\} \subseteq \{Z_q < r + Z_p\}$
      $\Rightarrow \Pr[h_C(p) = h_C(q)] \le \Pr[|Z_q - Z_p| < r]$
      $\Rightarrow \Pr[h_C(p) = h_C(q)] \ge \Pr[|Z_q - Z_p| \ge r] = p_1$
  • 16. Towards LSH in generic metric space. VoronoiLSH: Hash probabilities bounds
      - $p_1 \ge p_2$: needs two assumptions, "$Z_q < \delta r$" ($\delta > 0$) and "$c > 2\delta + 1$";
      - $p_1 > p_2$: requires considering a hypothetical case where "$Z_q = r - \epsilon$" and "$Z_p = 2\delta r - \epsilon$", for $\epsilon > 0$.
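The locality-sensitive behaviour that these bounds aim at can also be observed empirically. The sketch below estimates the collision probability of a near pair and a far pair under random Voronoi partitions. Everything in it is an assumption made for illustration: the space is the unit square with the Euclidean distance, centers are drawn uniformly rather than from a dataset, and the points, the number of centers and the seed are arbitrary. It does not reproduce the analytical bounds from the slides.

```python
import math
import random


def voronoi_hash(x, centers):
    """Index of the center nearest to x under the Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))


def collision_rate(p, q, k, trials, rng):
    """Fraction of random k-center partitions in which p and q share a cell."""
    hits = 0
    for _ in range(trials):
        centers = [(rng.random(), rng.random()) for _ in range(k)]
        hits += voronoi_hash(p, centers) == voronoi_hash(q, centers)
    return hits / trials


rng = random.Random(42)
q = (0.5, 0.5)
near = (0.52, 0.5)  # d(q, near) = 0.02
far = (0.9, 0.5)    # d(q, far) = 0.4
p_near = collision_rate(q, near, k=16, trials=2000, rng=rng)
p_far = collision_rate(q, far, k=16, trials=2000, rng=rng)
print(p_near, p_far)
```

In this setting the near pair collides far more often than the far pair, which is the qualitative property p1 > p2 that the definition of an LSH family requires.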
  • 17. Towards LSH in generic metric space. VoronoiPlexLSH: hash function construction
      Multiple VoronoiLSH with a controlled number of distance computations.
      Algorithm 1: Hash function building
      input: size k of the sample, number of distinct partitionings w, and integer number of centroids p
      output: a hash function h_{k,w,p}
        selected ← new binary array of size k
        subsample ← new integer multi-array of size w × p
        for j ← 1 to w do
          random sample S = {s_1, ..., s_p} from {1, ..., k}
          for i ← 1 to p do
            subsample[j, i] ← s_i
            selected[s_i] ← 1
          end
        end
        h_{k,w,p} ← (selected, subsample)
  • 18. Towards LSH in generic metric space. VoronoiPlexLSH: hashing algorithm
      Algorithm 2: Hash function application
      input: hash function object h_{k,w,p}, sample C = {c_1, ..., c_k} ⊂ X (|C| = k) and a point q ∈ X
      output: integer value h_{k,w,p}(q)
        (selected, subsample) ← retrieved from h_{k,w,p}
        distances ← new floating-point array of size k
        for j ← 1 to k do
          if selected[j] == 1 then
            distances[j] ← d(q, c_j)
          end
        end
        hasharray ← new integer array of size w
        for i ← 1 to w do
          hasharray[i] ← element j of subsample[i] that minimizes distances[j]
        end
        h_{k,w,p}(q) ← hash(hasharray)
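Algorithms 1 and 2 translate almost line by line into Python. The following is a sketch under stated assumptions, not the author's code: indices are 0-based, the function names `build_hash_function`, `voronoiplex_key` and `apply_hash_function` are hypothetical, and Python's built-in `hash` on a tuple stands in for the unspecified `hash(hasharray)` step.

```python
import random


def build_hash_function(k, w, p, rng):
    """Algorithm 1: draw w sub-samples of p distinct indices each from range(k)."""
    selected = [False] * k
    subsample = []
    for _ in range(w):
        s = rng.sample(range(k), p)
        subsample.append(s)
        for i in s:
            selected[i] = True
    return selected, subsample


def voronoiplex_key(h, centers, q, d):
    """Algorithm 2 up to the last step: one nearest-center index per sub-sample."""
    selected, subsample = h
    # Each needed distance is computed once, even if a center index
    # occurs in several sub-samples.
    distances = {j: d(q, centers[j]) for j, used in enumerate(selected) if used}
    return tuple(min(s, key=distances.__getitem__) for s in subsample)


def apply_hash_function(h, centers, q, d):
    """Algorithm 2: combine the w indices into a single integer key."""
    return hash(voronoiplex_key(h, centers, q, d))


# Toy usage: five centers on the real line, absolute difference as distance.
centers = [0.0, 1.0, 2.0, 3.0, 4.0]
dist = lambda a, b: abs(a - b)
h = build_hash_function(k=5, w=4, p=3, rng=random.Random(7))
print(voronoiplex_key(h, centers, 0.9, dist))
```

Caching the distances in a dictionary is what keeps the number of distance computations bounded by the number of selected centers rather than by w × p.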
  • 19. Towards LSH in generic metric space. VoronoiPlexLSH
      [Figure: example for k = 5, w = 4, p = 3 with centroids c1, ..., c5. h_{5,4,3} is built from the sub-samples {c1, c3, c4}, {c3, c5, c2}, {c5, c1, c3}, {c5, c4, c2}, and h_{5,4,3}(p) = (1, 2, 5, 2).]
      $\mathbb{E}_{i = 1, \ldots, k}[selected[i] = 1] = k - k\left(1 - \frac{p}{k}\right)^{w}$
      - O(k − k ) number of distance computations (intrinsic cost);
      - a more complicated analysis is needed for the extrinsic cost.
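The expected number of selected centroids, k − k(1 − p/k)^w, can be checked with a short simulation. The sketch below assumes, as in Algorithm 1, that each of the w sub-samples draws p distinct indices uniformly from the k available ones and that the sub-samples are independent; the function names and the seed are hypothetical.

```python
import random


def expected_selected(k, w, p):
    """Closed form: each index is missed by one sub-sample with probability 1 - p/k."""
    return k - k * (1 - p / k) ** w


def simulate_selected(k, w, p, trials, rng):
    """Average number of distinct indices hit by w sub-samples of size p."""
    total = 0
    for _ in range(trials):
        chosen = set()
        for _ in range(w):
            chosen.update(rng.sample(range(k), p))
        total += len(chosen)
    return total / trials


rng = random.Random(0)
print(expected_selected(5, 4, 3))
print(simulate_selected(5, 4, 3, 20000, rng))
```

For the example with k = 5, w = 4 and p = 3 the closed form gives 5 − 5 · 0.4^4 = 4.872, so almost every centroid is used and the saving in distance computations is small; the saving grows when p is small relative to k.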
  • 20. Towards LSH in generic metric space. Parallel VoronoiLSH
      - Dataflow programming for distributed computation;
      - Computing stages distributed across processors and nodes;
      - Message-passing interface.
  • 21. Results. Datasets
      - APM (Arquivo Público Mineiro – The Public Archives in Minas Gerais): 2,871,300 feature vectors (each SIFT descriptor is a 128-dimensional vector). Query dataset: 263,968 feature vectors with ground truth. For the experiments we used 5000 queries uniformly sampled from the query dataset and performed a 10-NN search.
      - Metric datasets: Listeria (20660 / 100) and English dictionary (66069 / 500);
      - BigANN (1B) for large-scale experiments: (10^9 / 10^4).
  • 22. Results. Experimental results: APM
      [Figure: (a) Recall × extensiveness (log scale) for DFLSH, K-MedoidsLSH and K-MeansLSH; (b) Recall × number of hash functions L (L = 1, 5, 8) and extensiveness, for 5000 cluster centers.]
  • 23. Results. Experimental results: English dataset, VoronoiLSH and BPI
      [Figure: Recall for Voronoi LSH and BPI LSH. Recall versus fraction of query time of linear scan for Voronoi LSH with K-means++ (L = 5, L = 8), DFLSH (L = 5, L = 8) and Brief Proximity Index (BPI) LSH.]
  • 24. Results. Experimental results: Listeria
      [Figure: (a) Recall × extensivity for DFLSH with L = 2 and L = 3; (b) Recall × extensivity for VoronoiPlex LSH with L = 1 and L = 8 (nCluster = 10), varying the size w of the key length (w = 2, 5, 10; 10 centroids selected from a 4000-point sample set).]
  • 25. Results. Experimental results: Large scale experiment
      [Figure: (c) Query time / Recall; (d) Parallel efficiency.]
  • 26. Conclusions. Results and challenges
      - Using metric partitioning techniques to build hashing functions in metric spaces is a valid technique and should be further explored and developed;
      - The experiments do not show any clear advantage in learning the seeds of the Voronoi diagram by clustering;
      - It would be interesting to equip the analysis with more assumptions about the data.
  • 27. References I
      [1] Fernando Akune. Indexação Multimídia escalável e busca por similaridade em alta dimensionalidade. M.Sc. thesis, Universidade Estadual de Campinas (Unicamp), 2011.
      [2] Fernando Akune, Eduardo Valle, and Ricardo Torres. MONORAIL: A Disk-Friendly Index for Huge Descriptor Databases. In 2010 20th International Conference on Pattern Recognition, pages 4145–4148. IEEE, August 2010.
      [3] Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pages 459–468, 2006.
  • 28. References II
      [4] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, SODA '07, pages 1027–1035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.
      [5] Sunil Arya and David M. Mount. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, SODA '93, pages 271–280, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.
      [6] Bahman Bahmani, Ashish Goel, and Rajendra Shinde. Efficient distributed locality sensitive hashing. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, pages 2174–2178, New York, NY, USA, 2012. ACM.
  • 29. References III
      [7] Mayank Bawa, Tyson Condie, and Prasanna Ganesan. LSH forest. In Proceedings of the 14th international conference on World Wide Web - WWW '05, page 651, New York, New York, USA, 2005. ACM Press.
      [8] R. E. Bellman. Dynamic Programming. Dover Books on Computer Science Series. Dover Publications, Incorporated, 2003.
      [9] Stefan Berchtold, Daniel A. Keim, and Hans-Peter Kriegel. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22th International Conference on Very Large Data Bases, VLDB '96, pages 28–39, San Francisco, CA, USA, 1996. Morgan Kaufmann Publishers Inc.
      [10] Michael D. Beynon, Tahsin Kurc, Umit Catalyurek, Chialin Chang, Alan Sussman, and Joel Saltz. Distributed processing of very large datasets with DataCutter. Parallel Comput., 27(11):1457–1478, 2001.
  • 30. References IV
      [11] Christian Böhm, Stefan Berchtold, and Daniel A. Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys, 33(3):322–373, September 2001.
      [12] W. A. Burkhard and R. M. Keller. Some approaches to best-match file searching. Commun. ACM, 16(4):230–236, April 1973.
      [13] Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-Yates, and José Luis Marroquín. Searching in metric spaces. ACM Computing Surveys, 33(3):273–321, September 2001.
      [14] Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB '97, pages 426–435, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.
  • 31. References V
      [15] Kenneth L. Clarkson. Nearest-Neighbor Searching and Metric Space Dimensions. In Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk, editors, Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing), Advances in Neural Information Processing Systems. The MIT Press, 2006.
      [16] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry - SCG '04, page 253, New York, New York, USA, 2004. ACM Press.
      [17] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Image retrieval. ACM Computing Surveys, 40(2):1–60, April 2008.
  • 32. References VI
      [18] Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pages 301–312, New York, NY, USA, 2003. ACM.
      [19] C. Faloutsos and S. Roseman. Fractals for secondary key retrieval. In Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, PODS '89, pages 247–252, New York, NY, USA, 1989. ACM.
      [20] Christos Faloutsos. Multiattribute hashing using gray codes. SIGMOD Rec., 15(2):227–238, June 1986.
      [21] Volker Gaede and Oliver Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170–231, June 1998.
  • 33. References VII
      [22] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB '99, pages 518–529, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
      [23] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors. In Proceedings of the thirtieth annual ACM symposium on Theory of computing - STOC '98, pages 604–613, New York, New York, USA, 1998. ACM Press.
      [24] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, September 1999.
      [25] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011.
  • 34. References VIII
      [26] Herve Jegou, Laurent Amsaleg, Cordelia Schmid, and Patrick Gros. Query adaptative locality sensitive hashing. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 825–828. IEEE, March 2008.
      [27] Alexis Joly and Olivier Buisson. A posteriori multi-probe locality sensitive hashing. In Proceeding of the 16th ACM international conference on Multimedia - MM '08, page 209, New York, New York, USA, 2008. ACM Press.
      [28] Byungkon Kang and Kyomin Jung. Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets. In NIPS Workshop on Big Learning (BigLearn), pages 1–8, Lake Tahoe, Nevada, 2012.
      [29] Leonard Kaufman and Peter J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, 9th edition, March 1990.
  • 35. References IX
      [30] Jon M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC '97, pages 599–608, New York, NY, USA, 1997. ACM.
      [31] Martin Kruliš, Tomáš Skopal, Jakub Lokoč, and Christian Beecks. Combining CPU and GPU architectures for fast similarity search. Distributed and Parallel Databases, 30(3-4):179–207, 2012.
      [32] John Leech. Some sphere packings in higher space. Canadian Journal of Mathematics, 16:657–682, January 1964.
      [33] Herwig Lejsek, Fridrik Heidar Ásmundsson, Björn Þór Jónsson, and Laurent Amsaleg. Efficient and effective image copyright enforcement. In BDA, 2005.
  • 36. References X
    [34] S. Liao, M.A. Lopez, and S.T. Leutenegger. High dimensional similarity search with space filling curves. In Proceedings 17th International Conference on Data Engineering, pages 615–622. IEEE Comput. Soc, 2001.
    [35] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, November 2004.
    [36] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07, pages 950–961. VLDB Endowment, 2007.
  • 37. References XI
    [37] Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07, pages 950–961. VLDB Endowment, 2007.
    [38] G. Mainar-Ruiz and J. Perez-Cortes. Approximate Nearest Neighbor Search using a Single Space-filling Curve and Multiple Representations of the Data Points. In 18th International Conference on Pattern Recognition (ICPR’06), pages 502–505. IEEE, 2006.
    [39] Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing. In Proceedings of the twenty-second annual symposium on Computational geometry - SCG ’06, page 154, New York, New York, USA, 2006. ACM Press.
  • 38. References XII
    [40] R. T. Ng. CLARANS: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 14(5):1003–1016, September 2002.
    [41] David Novak and Michal Batko. Metric Index: An Efficient and Scalable Solution for Similarity Search. In 2009 Second International Workshop on Similarity Search and Applications, pages 65–73. IEEE, August 2009.
    [42] David Novak, Martin Kyselak, and Pavel Zezula. On locality-sensitive indexing in generic metric spaces. Proceedings of the Third International Conference on SImilarity Search and APplications - SISAP ’10, page 59, 2010.
    [43] Alexander Ocsa and Elaine P M De Sousa. An Adaptive Multi-level Hashing Structure for Fast Approximate Similarity Search. Journal of Information and Data Management, 1(3):359–374, 2010.
  • 39. References XIII
    [44] Rafail Ostrovsky, Yuval Rabani, Leonard Schulman, and Chaitanya Swamy. The Effectiveness of Lloyd-Type Methods for the k-Means Problem. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), volume 59, pages 165–176. IEEE, December 2006.
    [45] Jia Pan and Dinesh Manocha. Fast GPU-based locality sensitive hashing for k-nearest neighbor computation. In 19th ACM SIGSPATIAL Int. Conf. on Advances in Geographic Information Systems, GIS ’11. ACM, 2011.
    [46] Rina Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, SODA ’06, pages 1186–1195, New York, NY, USA, 2006. ACM.
  • 40. References XIV
    [47] Rina Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, SODA ’06, pages 1186–1195, New York, NY, USA, 2006. ACM.
    [48] Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36(2):3336–3341, 2009.
    [49] Adriano Arantes Paterlini, Mario A Nascimento, and Caetano Traina Junior. Using Pivots to Speed-Up k-Medoids Clustering. Journal of Information and Data Management, 2(2):221–236, June 2011.
    [50] Loïc Paulevé, Hervé Jégou, and Laurent Amsaleg. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 31(11):1348–1358, August 2010.
  • 41. References XV
    [51] D. Pollard. Quantization and the method of k-means. IEEE Transactions on Information Theory, 28(2):199–205, March 1982.
    [52] Hanan Samet. Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
    [53] Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk. Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing). The MIT Press, 2006.
    [54] James G. Shanahan, Sihem Amer-Yahia, Ioana Manolescu, Yi Zhang, David A. Evans, Aleksander Kolcz, Key-Sun Choi, and Abdur Chowdhury, editors. Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, Napa Valley, California, USA, October 26-30, 2008. ACM, 2008.
  • 42. References XVI
    [55] Tomáš Skopal. Where are you heading, metric access methods?: a provocative survey. In Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP ’10, pages 13–21, New York, NY, USA, 2010. ACM.
    [56] Malcolm Slaney, Yury Lifshits, and Junfeng He. Optimal Parameters for Locality-Sensitive Hashing. Proceedings of the IEEE, 100(9):2604–2623, 2012.
    [57] Raisa Socorro, Luisa Micó, and Jose Oncina. A fast pivot-based indexing algorithm for metric spaces. Pattern Recognition Letters, 32(11):1511–1516, August 2011.
    [58] Aleksandar Stupar, Sebastian Michel, and Ralf Schenkel. RankReduce - processing K-Nearest Neighbor queries on top of MapReduce. In LSDS-IR, 2010.
  • 43. References XVII
    [59] Eric Sadit Tellez and Edgar Chavez. On locality sensitive hashing in metric spaces. In Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP ’10, pages 67–74, New York, NY, USA, 2010. ACM.
    [60] George Teodoro, Daniel Fireman, Dorgival Guedes, Wagner Meira Jr., and Renato Ferreira. Achieving multi-level parallelism in the filter-labeled stream programming model. Parallel Processing, International Conference on, 0:287–294, 2008.
    [61] George Teodoro, Eduardo Valle, Nathan Mariano, Ricardo Torres, and Wagner Meira, Jr. Adaptive parallel approximate similarity search for responsive multimedia retrieval. In Proc. of the 20th ACM international conference on Information and knowledge management, CIKM ’11. ACM, 2011.
  • 44. References XVIII
    [62] A.J.M. Traina, A. Traina, C. Faloutsos, and B. Seeger. Fast indexing and visualization of metric data sets using slim-trees. Knowledge and Data Engineering, IEEE Transactions on, 14(2):244–260, 2002.
    [63] Caetano Traina, Jr., Agma J. M. Traina, Bernhard Seeger, and Christos Faloutsos. Slim-trees: High performance metric trees minimizing overlap between nodes. In Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’00, pages 51–65, London, UK, 2000. Springer-Verlag.
    [64] Jeffrey K. Uhlmann. Satisfying general proximity / similarity queries with metric trees. Information Processing Letters, 40(4):175–179, 1991.
  • 45. References XIX
    [65] Eduardo Valle and Matthieu Cord. Advanced Techniques in CBIR: Local Descriptors, Visual Dictionaries and Bags of Features. In 2009 Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing, pages 72–78. IEEE, October 2009.
    [66] Eduardo Valle, Matthieu Cord, and Sylvie Philipp-Foliguet. High-dimensional descriptor indexing for large multimedia databases. In Shanahan et al. [54], pages 739–748.
    [67] Hongbo Xu. An Approximate Nearest Neighbor Query Algorithm Based on Hilbert Curve. In 2011 International Conference on Internet Computing and Information Services, pages 514–517. IEEE, September 2011.
  • 46. References XX
    [68] Peter N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, SODA ’93, pages 311–321, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.
    [69] Pavel Zezula. Future trends in similarity searching. In Proceedings of the 5th international conference on Similarity Search and Applications, SISAP’12, pages 8–24, Berlin, Heidelberg, 2012. Springer-Verlag.
    [70] Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. Similarity Search - The Metric Space Approach, volume 32 of Advances in Database Systems. Kluwer Academic Publishers, Boston, 2006.
  • 47. References XXI
    [71] Pavel Zezula, Pasquale Savino, Giuseppe Amato, and Fausto Rabitti. Approximate similarity retrieval with m-trees. The VLDB Journal, 7(4):275–293, December 1998.
    [72] Qiaoping Zhang and Isabelle Couloigner. A new and efficient k-medoid algorithm for spatial clustering. In Osvaldo Gervasi, Marina L. Gavrilova, Vipin Kumar, Antonio Laganà, Heow Pueh Lee, Youngsong Mun, David Taniar, and Chih Jeng Kenneth Tan, editors, Computational Science and Its Applications – ICCSA 2005, volume 3482 of Lecture Notes in Computer Science, pages 181–189. Springer Berlin Heidelberg, 2005.