Branch-and-bound nearest neighbor searching over
             unbalanced trie-structured overlays


                           Master’s Thesis Presentation
                           Technical University of Crete
                                     4.2.2013




Author:       Michail Argyriou
Supervisor:   Ass’t Prof. Vasilis Samoladas
P2P Evolution
[Figure: timeline from 1998 to 2002 across three categories (Centralized, Semi-distributed, Fully-distributed); DHTs are marked at 2001]
Distributed Hash Table (DHT)




DHT Frameworks Evolution

2003: PGrid
  • Rectangular queries support
  • Peers only on leaves
  • High-dimensional queries support with space-filling curves

2006: VBI
  • Height-balanced search tree limitation

2008: GRaSP
  • No height-balanced search tree limitation
  • Abstract types of data and queries
    • Data: point, rectangular
    • Queries: point, 3-sided, n-d rectangular
Nearest neighbor search




Given a distributed data set, how can we find
the k data items most similar to a query?


     “k-Nearest Neighbor Search”
Applications
• GIS
• Distributed Databases
• Statistical Classification
• Recommendation Systems
• Cluster analysis
• Similarity Scores
Related Work
1. Naïve algorithm: Central peer collects data and
   performs k-NN searching
2. k-NN search algorithm over CAN
3. Distributed quad-based index → each quadtree
   block is uniquely identified by its centroid →
   mapped to Chord → k-NN search algorithm
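
The naïve approach in item 1 can be sketched as follows. This is only an illustration under assumptions: `peers_data` is a hypothetical list holding each peer's local points, and Euclidean distance stands in for whatever similarity measure is used.

```python
import heapq
import math

def naive_knn(peers_data, query, k):
    """Centralized baseline: gather every peer's points, then rank them."""
    # peers_data is a hypothetical list of per-peer point lists.
    collected = [p for points in peers_data for p in points]
    # Keep the k points with the smallest Euclidean distance to the query.
    return heapq.nsmallest(k, collected, key=lambda p: math.dist(p, query))

peers_data = [[(0.0, 0.0), (5.0, 5.0)], [(1.0, 1.0)], [(9.0, 9.0), (2.0, 0.0)]]
result = naive_knn(peers_data, (0.0, 0.0), 2)
```

Every point crosses the network to reach the central peer, which is the cost the distributed algorithms aim to avoid.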




Contents
• GRaSP
• k-NN
• Evaluation
• Conclusions
GRaSP




GRaSP
                      Building the trie ...
Hierarchical space partition:

1. Peer p joins
2. Finds a bootstrapping peer q
3. Space region s(q) splits into s(q0) and s(q1)
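
A minimal sketch of the join step, under stated assumptions: regions are axis-aligned rectangles stored as `(lo, hi)` tuples, the split halves the region at its midpoint (a volume-balanced split), the split axis cycles with trie depth, and the joining peer takes the `1` child. None of these conventions are confirmed by the slides; `split_region` and `join` are hypothetical names.

```python
def split_region(region, depth):
    """Halve a rectangle along one axis, cycling axes with trie depth."""
    lo, hi = region
    axis = depth % len(lo)              # assumed split-axis policy
    mid = (lo[axis] + hi[axis]) / 2     # midpoint split = volume-balanced
    hi0 = hi[:axis] + (mid,) + hi[axis + 1:]
    lo1 = lo[:axis] + (mid,) + lo[axis + 1:]
    return (lo, hi0), (lo1, hi)

def join(regions, q):
    """Peer q splits s(q) into s(q0) and s(q1); returns the new peer's ID."""
    region = regions.pop(q)
    regions[q + "0"], regions[q + "1"] = split_region(region, len(q))
    return q + "1"                      # assumed: the joining peer takes q1

regions = {"": ((0.0, 0.0), (1.0, 1.0))}   # a single peer owns the unit square
join(regions, "")                           # peers "0" and "1"
join(regions, "0")                          # peers "00", "01" and "1"
```

A data-balanced split would pick the split position from the stored data (for example, a median) instead of the midpoint.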
GRaSP
Space Partition
• Volume-balanced
• Data-balanced
[Figures: one panel per partition type, each labeled "Before"]
GRaSP
Space Partition for a 3-sided query
[Figure: space partition for a 3-sided query, shown over three consecutive slides]
GRaSP
                  Data Insertion



A key k is inserted into every peer whose region
                 contains k
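
A small sketch of this insertion rule, assuming point keys and closed rectangular regions stored as `(lo, hi)` tuples. The names `contains`, `insert`, `regions` and `stores` are hypothetical.

```python
def contains(region, key):
    """True if point `key` lies inside the closed rectangle `region`."""
    lo, hi = region
    return all(l <= x <= h for l, x, h in zip(lo, key, hi))

def insert(regions, stores, key):
    """Store `key` at every peer whose region contains it; return those peers."""
    owners = [pid for pid, region in regions.items() if contains(region, key)]
    for pid in owners:
        stores.setdefault(pid, []).append(key)
    return owners

regions = {"0": ((0.0, 0.0), (0.5, 1.0)), "1": ((0.5, 0.0), (1.0, 1.0))}
stores = {}
inner = insert(regions, stores, (0.2, 0.3))    # inside one region
border = insert(regions, stores, (0.5, 0.5))   # on the shared edge
```

With closed regions, a key on a shared boundary is stored at both neighbors, which is one way a key can end up at more than one peer.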




GRaSP
                           Routing Tables


    Each peer knows a peer in each complementary subtrie ...

    Complementary subtries of peer 0100:
    • 1
    • 00
    • 011
    • 0101
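
The subtries listed for peer 0100 follow a simple rule: keep a prefix of the peer's ID and flip the next bit. A short sketch (the function name is hypothetical):

```python
def complementary_subtries(peer_id):
    """For each bit position, keep the bits before it and flip that bit."""
    flip = {"0": "1", "1": "0"}
    return [peer_id[:i] + flip[bit] for i, bit in enumerate(peer_id)]

subtries = complementary_subtries("0100")
```

A peer with an ID of length m therefore keeps m routing entries, one per complementary subtrie.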

GRaSP
                             Routing

  “In order to route a message from peer p to peer q, the message is
forwarded from p to a neighbor peer r included in a known subtrie closer
           to peer q. From r it is recursively forwarded to q.”
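
The routing rule can be simulated locally as below. This is a sketch under assumptions: peer IDs are the leaves of a complete binary trie, each routing table maps a complementary subtrie prefix to one arbitrary contact inside it, and `build_tables` and `route` are hypothetical helpers.

```python
def build_tables(peers):
    """Give each peer one contact inside each of its complementary subtries."""
    flip = {"0": "1", "1": "0"}
    tables = {}
    for p in peers:
        tables[p] = {}
        for i, bit in enumerate(p):
            prefix = p[:i] + flip[bit]
            # Any peer inside the complementary subtrie may serve as contact.
            tables[p][prefix] = next(x for x in peers if x.startswith(prefix))
    return tables

def route(tables, p, q):
    """Forward hop by hop; each hop shares a longer ID prefix with q."""
    path = [p]
    while p != q:
        # Pick the contact whose subtrie contains the target q.
        p = next(c for prefix, c in tables[p].items() if q.startswith(prefix))
        path.append(p)
    return path

peers = ["00", "010", "011", "1"]       # leaves of a small unbalanced trie
tables = build_tables(peers)
path = route(tables, "1", "011")
```

Each hop extends the prefix shared with the target by at least one bit, so the hop count is bounded by the length of the target's ID.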




Searching Algorithm
           Branch-and-bound algorithm



Priority queue PQ of candidate peers that may hold an answer
  better than the k-th answer found so far → Fringe


            1. Branch Step: expand PQ
            2. Bound Step: prune PQ
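
The two steps can be sketched as a single-process best-first search. This is not the thesis implementation: it assumes the unit square, midpoint splits with axes cycling by depth, Euclidean distance, and a trie expanded top-down from the root rather than through routing tables. `region_of`, `mindist` and `knn` are hypothetical names.

```python
import heapq
import math

def region_of(prefix, dims=2):
    """Rectangle of a trie node, assuming midpoint splits cycling over axes."""
    lo, hi = [0.0] * dims, [1.0] * dims
    for depth, bit in enumerate(prefix):
        axis = depth % dims
        mid = (lo[axis] + hi[axis]) / 2
        if bit == "0":
            hi[axis] = mid
        else:
            lo[axis] = mid
    return lo, hi

def mindist(q, region):
    """Smallest Euclidean distance from point q to the rectangle."""
    lo, hi = region
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))

def knn(peer_data, q, k):
    best = []                 # max-heap via negated distances: (-dist, point)
    fringe = [(0.0, "")]      # priority queue of (lower bound, trie prefix)
    visited = []
    while fringe:
        d, prefix = heapq.heappop(fringe)
        if len(best) == k and d >= -best[0][0]:
            break             # bound: no remaining region can beat the k-th answer
        if prefix in peer_data:              # leaf: visit the peer
            visited.append(prefix)
            for p in peer_data[prefix]:
                heapq.heappush(best, (-math.dist(p, q), p))
                if len(best) > k:
                    heapq.heappop(best)      # drop the farthest candidate
        else:                                # branch: expand the fringe
            for bit in "01":
                child = prefix + bit
                heapq.heappush(fringe, (mindist(q, region_of(child)), child))
    return sorted((-nd, p) for nd, p in best), visited

peer_data = {"00": [(0.1, 0.1), (0.4, 0.4)],
             "01": [(0.2, 0.9)],
             "1": [(0.9, 0.9), (0.6, 0.1)]}
answers, visited = knn(peer_data, (0.05, 0.05), 1)
```

In this example only peer `00` is visited: once its point at (0.1, 0.1) is found, every other region's lower bound exceeds that distance and the search stops.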



Searching Algorithm
          Parallel Searching vs Iterative Searching




      Parallel Searching requires huge message state!

Iterative Searching prunes larger regions of the data space!




Searching Algorithm
Branch-and-bound algorithm

             1?      d(q,s(1))    < d(q,a)   → visit
             00?     d(q,s(00))   > d(q,a)   → prune
             011?    d(q,s(011))  > d(q,a)   → prune
             0101?   d(q,s(0101)) < d(q,a)   → visit




Latency Complexity Theorem
                 Latency = |T| · O(log n)




Support Set T:




Latency Complexity Theorem
                                    Proof

[Figure legend: peers visited; peers in T]

• Peers in T: |T| peers
• Find peer in the complementary subtrie: O(log n)
• Hence latency = |T| · O(log n)
Performance Evaluation
Taking into account the number of dimensions: Low, Medium, High
Performance Evaluation
                   Metrics




•   Data Fairness Index
•   Latency
•   Max Throughput
•   Fringe Size (mean, max)
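
The slides do not define the Data Fairness Index. Assuming it refers to Jain's fairness index applied to the number of data items stored per peer, it could be computed as in this sketch (`fairness_index` is a hypothetical name):

```python
def fairness_index(loads):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    total = sum(loads)
    squares = sum(x * x for x in loads)
    # Assumption: an all-zero load vector is treated as perfectly fair.
    return (total * total) / (len(loads) * squares) if squares else 1.0

balanced = fairness_index([10, 10, 10, 10])   # every peer stores the same amount
skewed = fairness_index([40, 0, 0, 0])        # one peer stores everything
```

Under this definition the index is 1 when all peers hold equal amounts of data and drops to 1/n when a single peer holds everything.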



Low dimensions
                  Workloads

Datasets

• Greece, data-balanced partition, k=1/10/100
• Greece, volume-balanced partition, k=1

Querysets

• Synthetic queries
• For a network of n peers, we issued n/3 queries
Low dimensions
Which space partition is the best?
• Volume-balanced
• Data-balanced
Low dimensions
Data FI vs Space Partition
Which space partition is the best?
[Figure, Greece ...: data-balanced partition vs volume-balanced partition]
Low dimensions
Latency vs Space Partition
Which space partition is the best?
[Figure, Greece, k=1 ...: data-balanced partition vs volume-balanced partition]
Low dimensions
Fringe Size vs Space Partition
Which space partition is the best?
[Figure, Greece, k=1 ...: data-balanced partition vs volume-balanced partition]
Low dimensions
Max Throughput vs Space Partition
Which space partition is the best?
[Figure, Greece, k=1 ...: data-balanced partition vs volume-balanced partition]
Low dimensions
How does k affect performance?
Low dimensions
Fringe Size vs k
How is the size of the fringe affected?
[Figure, Greece, data-balanced partition ...: panels for k=1, k=10, k=100]
Low dimensions
Latency vs k
How is the latency affected?
[Figure, Greece, data-balanced partition ...: panels for k=1, k=10, k=100]
Low dimensions
Max Throughput vs k
How is the Max. Throughput affected?
[Figure, Greece, data-balanced partition ...: panels for k=1, k=10, k=100]
Low dimensions … efficient routing!




Medium dimensions
                 Workloads

Datasets

• Uniform, volume-balanced partition, k=1
• ColorMoments, data-balanced partition, k=1

Querysets

• Synthetic queries
• For a network of n peers, we issued n/3 queries
Medium dimensions
How is the size of the fringe affected?
[Figure: ColorMoments, data-balanced, k=1]
[Figure: Uniform, volume-balanced, k=1; panels: Mean Fringe Size, Max. Fringe Size]
Medium dimensions
Data Fairness Index
[Figure: ColorMoments, data-balanced, k=1]
[Figure: Uniform, volume-balanced, k=1]
Medium dimensions
Latency
[Figure: ColorMoments, data-balanced, k=1]
[Figure: Uniform, volume-balanced, k=1]
Medium dimensions
                 Latency

Latency is high but close to the optimum!
Medium dimensions
Max. Throughput
[Figure: ColorMoments, data-balanced, k=1]
[Figure: Uniform, volume-balanced, k=1]
Medium dimensions … routing is not efficient,
    but it is near the optimum!

It's still good enough for practical
             applications!
High dimensions
                 Curse of dimensionality




“When the dimensionality increases, the volume of the space
increases so fast that the available data becomes sparse.”
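
One way to see this sparsity, assuming data spread uniformly over a unit hypercube: a sub-cube that should capture a fixed fraction of the data needs an edge length of fraction^(1/d), which approaches the full edge as d grows. The function name below is hypothetical.

```python
def edge_for_fraction(fraction, dims):
    """Edge length of a sub-cube holding `fraction` of uniform data in [0,1]^dims."""
    return fraction ** (1.0 / dims)

# Edge needed to capture 1% of the data in 2, 10 and 100 dimensions.
edges = [edge_for_fraction(0.01, d) for d in (2, 10, 100)]
```

In 2 dimensions a tenth of each axis suffices, while in 100 dimensions the sub-cube must span most of every axis, so a "local" neighborhood is no longer local and pruning regions becomes much harder.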




Conclusions
[Architecture diagram, layers from top to bottom]
• API
• Searching (k-NN) | Trie | Data Ins/Rem
• Query Types | Data Types | Space Partition
• Metric Space
Future Work
• Approximate k-NN searching for high dimensions
• Redundancy
THANK YOU
 QUESTIONS?
