SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
Parallel Combinatorial Computing and Sparse
                  Matrices

         E. Jason Riedy         James Demmel

                Computer Science Division
                    EECS Department
              University of California, Berkeley


   SIAM Conference on Computational Science and
                Engineering, 2005
Outline


    Fundamental question: What performance metrics are right?


   Background

   Algorithms
      Sparse Transpose
      Weighted Bipartite Matching

   Setting Performance Goals

   Ordering Ideas
App: Sparse LU Factorization




                  Characteristics:
                      Large quantities of numerical work.
                      Eats memory and flops.
                      Benefits from parallel work.
                      And needs combinatorial support.
App: Sparse LU Factorization




                  Characteristics:
                      Large quantities of numerical work.
                      Eats memory and flops.
                      Benefits from parallel work.
                      And needs combinatorial support.
App: Sparse LU Factorization



                  Combinatorial support:
                     Fill-reducing orderings, pivot
                     avoidance, data structures.
                     Numerical work is distributed.
                     Supporting algorithms need to be
                     distributed.
                     Memory may be cheap ($100 GB),
                     moving data is costly.
Sparse Transpose


  Data structure manipulation
      Dense transpose moves numbers, sparse moves numbers
      and re-indexes them.
      Sequentially space-efficient “algorithms” exist, but disagree
      with most processors.
          Chains of data-dependent loads
          Unpredictable memory patterns


     If the data is already distributed, an unoptimized parallel
       transpose is better than an optimized sequential one!
Parallel Sparse Transpose

                      Many, many options:
    P4         P3
                           Send to root, transpose, distribute.
                           Transpose, send pieces to destinations.
                           Transpose, then rotate data.
    P1         P2          Replicate the matrix, transpose
                           everywhere.


  Communicates most of matrix twice. Node stores whole matrix.

  Note: We should compare with this implementation, not purely
  sequential.
Parallel Sparse Transpose

                       Many, many options:
    P4         P3
                            Send to root, transpose, distribute.
                            Transpose, send pieces to destinations.
                            Transpose, then rotate data.
    P1         P2           Replicate the matrix, transpose
                            everywhere.


  All-to-all communication. Some parallel work.
Parallel Sparse Transpose

                       Many, many options:
    P4         P3
                           Send to root, transpose, distribute.
                           Transpose, send pieces to destinations.
                           Transpose, then rotate data.
    P1         P2          Replicate the matrix, transpose
                           everywhere.


  Serial communication, but may hide latencies.
Parallel Sparse Transpose

                      Many, many options:
    P4         P3
                          Send to root, transpose, distribute.
                          Transpose, send pieces to destinations.
                          Transpose, then rotate data.
    P1         P2         Replicate the matrix, transpose
                          everywhere.


  Useful in some circumstances.
What Data Is Interesting?




      Time
What Data Is Interesting?




      Time (to solution)
      How much data is communicated.
      Overhead and latency.
      Quantity of data resident on a processor.
Parallel Transpose Performance
                                       Transpose                                                                        Transpose v. root


                                                                      q
                                                                      q                                                                   q
                                                                                                                                          q
                                                                                                                                          q
                                                                                                                                          q
                                                                                                                                          q
                                                                                                                                          q
                                                   q              q
                                                                  q
                1.5                                                   q
                                                   q
                                                   q                                                6
                                                   q

                                                                  q
                                                                                                                                          q              q
                                                                                                                                                         q
                                                   q
                                                   q                                                                                      q
                                                                                                                                          q




                                                                               Re−scaled speed−up
                                                                                                                                          q
                                                   q                                                                                      q
                                                                                                                                          q          q
                                                                                                                                                     q
                                                   q              q                                                                       q
                                                                                                                                          q
                                                   q                                                                                      q
                                                                                                                                          q              q
                                                                                                                                                         q
                                                   q                                                                         q            q
                                                                                                                             q            q
                                        q
     Speed−up




                                                                                                                             q
                                                                                                                             q                           q
                                        q          q                                                                         q            q
                1.0                     q
                                        q
                                                   q
                                                   q
                                                                                                                             q
                                                                                                                             q
                                                                                                                                          q
                                                                                                                                          q
                                        q          q              q                                 4
                                                                  q
                                                                  q                                                                       q
                                                                                                                             q
                                                                                                                             q            q
                                                                                                                                          q
                                                                  q                                                          q
                                                                                                                             q            q          q
                                        q
                                        q                                                                                    q            q
                                                                                                                                          q
                                                                                                                             q
                                                                                                                             q            q
                                                                                                                                          q          q
                                        q                                                                                    q            q
                                                                                                                                          q
                                                   q
                                                   q
                                                   q                                                                         q
                                                                                                                             q
                                                                                                                             q            q
                                                                                                                                          q
                                        q          q                                                                         q            q
                                                                                                                                          q
                                                                                                                                          q          q
                                                                                                                                                     q
                                        q
                                        q          q
                                                   q                  q                                                      q            q
                                                                                                                                          q
                                                   q                                                                                      q
                                                                                                                                          q          q
                                                                                                                                                     q
                                                                  q
                                                                  q   q                                                      q            q          q
                                                                                                                                                     q
                                                                                                                                                     q
                                                   q              q   q                                                      q            q          q
                                                   q                      q                                                               q
                                                                                                                                          q          q
                              q
                              q         q          q
                                                   q              q                                                          q
                                                                                                                             q            q
                                                                                                                                          q          q
                                                                                                                                                     q
                              q         q
                                        q          q
                                                   q              q                                             q            q
                                                                                                                             q
                                                                                                                             q            q          q
                              q
                              q         q
                                        q          q              q
                                                                  q                                             q
                                                                                                                q            q                       q
                                        q                         q
                                                                  q                                             q
                                                                                                                q
                                                                                                                q            q
                                                                                                                             q
                0.5                     q          q              q   q                                         q
                                                                                                                q            q
                                                                                                                             q            q
                                                                                                                                          q          q
                                        q          q              q                                             q            q            q
                              q
                              q
                              q         q
                                        q
                                                                  q
                                                                  q
                                                                  q
                                                                          q                         2           q
                                                                                                                q
                                                                                                                q
                                                                                                                q
                                                                                                                             q
                                                                                                                             q
                                                                                                                                          q
                                                                                                                                          q
                                                                                                                                          q
                                                                                                                                          q
                              q
                              q         q
                                        q          q              q                                             q            q
                                                                                                                             q            q
                                                                                                                                          q          q
                                        q          q
                                                   q                                                            q
                                                                                                                q            q            q          q
                          q   q
                              q         q
                                        q          q                                                            q
                                                                                                                q            q
                                                                                                                             q                       q
                          q   q                    q              q                                             q
                                                                                                                q            q
                                                                                                                             q                       q
                          q
                          q             q                         q                                             q
                                                                                                                q            q
                                                                                                                             q                       q
                              q
                              q         q                         q
                                                                  q                                             q            q
                                                                                                                             q            q          q
                          q   q
                              q         q          q
                                                   q              q                                             q            q            q          q
                          q
                          q             q
                                        q                                                                   q   q
                                                                                                                q            q
                                                                                                                             q            q          q
                      q   q   q
                              q         q          q                                                        q
                                                                                                            q
                                                                                                            q   q
                                                                                                                q            q
                                                                                                                             q            q
                                                                                                                                          q          q
                          q
                          q   q         q
                                        q
                                        q          q
                                                   q              q                                         q
                                                                                                            q   q            q            q
                                                                                                                                          q
                      q
                      q   q
                          q   q         q                                                               q   q
                                                                                                            q   q
                                                                                                                q            q
                                                                                                                             q            q
                                                                                                                                          q
                          q   q
                              q         q          q              q       q                             q
                                                                                                        q   q
                                                                                                            q   q
                                                                                                                q            q            q
                                                                                                                                          q          q
                                                                                                                                                     q
                      q
                      q
                      q   q             q          q                      q                             q
                                                                                                        q   q   q            q
                      q   q   q         q                                                               q
                                                                                                        q   q
                      q                 q          q                                                    q
                                                                  q                                         q
                                                                                                            q
                                                                      q   q
                0.0       q                                                                         0       q




                                  5          10              15           20                                        5              10           15
                                      Number of Processors                                                               Number of Processors




          All-to-all is slower than pure sequential code, but
          distributed.
          Actual speed-up when the data is already distributed.
          Hard to keep constant size / node when performance
          varies by problem.
                                                  Data from CITRIS cluster, Itanium2s with Myrinet.
Weighted Bipartite Matching



                      Not moving data around but finding
                      where it should go.
                      Find the “best” edges in a bipartite
                      graph.
                      Corresponds to picking the “best”
                      diagonal.
                      Used for static pivoting in factorization.
                      Also in travelling salesman problems,
                      etc.
Weighted Bipartite Matching



                      Not moving data around but finding
                      where it should go.
                      Find the “best” edges in a bipartite
                      graph.
                      Corresponds to picking the “best”
                      diagonal.
                      Used for static pivoting in factorization.
                      Also in travelling salesman problems,
                      etc.
Algorithms


  Depth-first search
      Reliable performance, code available (MC 64)
      Requires A and AT .
      Difficult to compute on distributed data.

  Interior point
      Performance varies wildly; many tuning parameters.
      Full generality: Solve larger sparse system.
      Auction algorithms replace solve with iterative bidding.
      Easy to distribute.
Parallel Auction Performance
                                                                                                                     q
                                     Auction                                                                  Auction v. root
                                                                                                                                     q

                                                                                                                                     q




                                                                                                      q



                                                                                              q
              8   q                                                                       8   q
                  q




                                                                      Re−scaled speedup
              6                                                                           6
    Speedup




                                                                                                      q              q
                                                                                                  q
                                                                                              q   q
                                                                                              q       q
                                                                                                                     q
                                                                                                      q
                                                                                                  q   q
                                                                                              q       q
              4       q
                          q
                                                                                          4       q
                                                                                                  q
                      q
                                                                                              q   q
                  q   q                                                                       q   q                                  q
                                                                                                                     q                        q
                                                                                              q
                  q       q                                                                           q
                                                                                              q   q                  q
                  q
                  q   q                                                                       q                      q               q
                  q                      q                                                    q   q
                      q                                                                       q   q
                                                                                                  q                                           q
              2   q
                  q   q                                                                   2   q
                                                                                              q       q
                      q   q                              q                                                                           q
                  q
                  q       q                                                                   q   q   q                              q
                  q       q
                      q                  q               q                                        q
                      q                                                                                                              q
                      q                                                                               q              q
                  q   q   q
                          q                                                                   q   q                                  q
                                         q
                                         q                                                        q
                  q   q                  q               q                                            q              q
                          q
                          q              q               q        q                                   q
                  q                                                                                                  q
                                                                                                                     q
                          q              q               q                                        q   q
                      q   q                                       q                                   q              q               q        q
                          q              q               q
                                                         q        q                                                  q               q        q
                          q
                          q              q
                                         q               q        q                                   q
                                                                                                      q              q
                                                                                                                     q               q
                                                                                                                                     q
                                                         q                                                                           q

                              5                 10           15                                           5                 10           15
                                  Number of Processors                                                        Number of Processors




   Compare with running an auction on the root, a parallel auction
                    achieves slight speed-up.
Proposed Performance Goals



  When is a distributed combinatorial algorithm (or code)
  successful?
      Does not redistribute the data excessively.
      Keeps the data distributed.
      No process sees more than the whole.
      Performance is competitive with the on-root option.


    Pure speed-up is a great goal, but not always reasonable.
Distributed Matrix Ordering



     Finding a permutation of columns and rows to reduce fill.




            NP-hard to solve, difficult to approximate.
Sequential Orderings

  Bottom-up
      Pick columns (and possibly rows) in sequence.
      Heuristic choices:
          Minimum degree, deficiency, approximations
      Maintain symbolic factorization

  Top-down
      Separate the matrix into multiple sections.
          Graph partitioning: A + AT , AT · A, A
      Needs vertex separators: Difficult.

  Top-down Hybrid
      Dissect until small, then order.
Parallel Bottom-up Hybrid Order




                    1. Separate the graph into chunks.
                           Needs an edge separator,
                           and knowledge of the pivot.
                    2. Order each chunk separately.
                           Forms local partial orders.
                    3. Merge the orders.
                           What needs communicated?
Merging Partial Orders



                   Respecting partial orders
                         Local, symbolic factorization done
                         once.
                         Only need to communicate quotient
                         graph.
                             Quotient graph: Implicit edges for
                             Schur complements.
                         No node will communicate more than
                         the whole matrix.
Preliminary Quality Results

   Merging heuristic
       Pairwise merge.
       Pick the head pivot with least worst-case fill (Markowitz
       cost).
   Small (tiny) matrices: performance not reliable.

              Matrix      Method           NNZ increase
              west2021    AMD (A + AT )           1.51×
                          merging                 1.68×
              orani678   AMD                      2.37×
                         merging                  6.11×
    Increasing the numerical work drastically spends any savings
     from computing a distributed order. Need better heuristics?
Summary

     Meeting classical expectations of scaling is difficult.
          Relatively small amounts of computation for much
          communication.
          Problem-dependent performance makes equi-size scaling
          hard.
     But consolidation costs when data is already distributed.

         In a supporting role, don’t sweat the speed-up.
                 Keep the problem distributed.


  Open topics
     Any new ideas for parallel ordering?

Mais conteúdo relacionado

Semelhante a Parallel Combinatorial Computing and Sparse Matrices

boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annual
finance28
 
boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annual
finance28
 
Time series compare
Time series compareTime series compare
Time series compare
lrhutyra
 
Introduction to power laws
Introduction to power lawsIntroduction to power laws
Introduction to power laws
Colin Gillespie
 
Example sweavefunnelplot
Example sweavefunnelplotExample sweavefunnelplot
Example sweavefunnelplot
Tony Hirst
 

Semelhante a Parallel Combinatorial Computing and Sparse Matrices (12)

boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annual
 
boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annual
 
Slides binar-hec
Slides binar-hecSlides binar-hec
Slides binar-hec
 
Slides mcneil
Slides mcneilSlides mcneil
Slides mcneil
 
Living In The Cloud
Living In The CloudLiving In The Cloud
Living In The Cloud
 
Slides alexander-mcneil
Slides alexander-mcneilSlides alexander-mcneil
Slides alexander-mcneil
 
Portuguese Market and On-board Sampling Effort Review
Portuguese Market and On-board Sampling Effort ReviewPortuguese Market and On-board Sampling Effort Review
Portuguese Market and On-board Sampling Effort Review
 
Slides lyon-2011
Slides lyon-2011Slides lyon-2011
Slides lyon-2011
 
Time series compare
Time series compareTime series compare
Time series compare
 
Introduction to power laws
Introduction to power lawsIntroduction to power laws
Introduction to power laws
 
Regression Modelling Overview
Regression Modelling OverviewRegression Modelling Overview
Regression Modelling Overview
 
Example sweavefunnelplot
Example sweavefunnelplotExample sweavefunnelplot
Example sweavefunnelplot
 

Mais de Jason Riedy

Mais de Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Parallel Combinatorial Computing and Sparse Matrices

  • 1. Parallel Combinatorial Computing and Sparse Matrices E. Jason Riedy James Demmel Computer Science Division EECS Department University of California, Berkeley SIAM Conference on Computational Science and Engineering, 2005
  • 2. Outline Fundamental question: What performance metrics are right? Background Algorithms Sparse Transpose Weighted Bipartite Matching Setting Performance Goals Ordering Ideas
  • 3. App: Sparse LU Factorization Characteristics: Large quantities of numerical work. Eats memory and flops. Benefits from parallel work. And needs combinatorial support.
  • 4. App: Sparse LU Factorization Characteristics: Large quantities of numerical work. Eats memory and flops. Benefits from parallel work. And needs combinatorial support.
  • 5. App: Sparse LU Factorization Combinatorial support: Fill-reducing orderings, pivot avoidance, data structures. Numerical work is distributed. Supporting algorithms need to be distributed. Memory may be cheap ($100 GB), moving data is costly.
  • 6. Sparse Transpose Data structure manipulation Dense transpose moves numbers, sparse moves numbers and re-indexes them. Sequentially space-efficient “algorithms” exist, but disagree with most processors. Chains of data-dependent loads Unpredictable memory patterns If the data is already distributed, an unoptimized parallel transpose is better than an optimized sequential one!
  • 7. Parallel Sparse Transpose Many, many options: P4 P3 Send to root, transpose, distribute. Transpose, send pieces to destinations. Transpose, then rotate data. P1 P2 Replicate the matrix, transpose everywhere. Communicates most of matrix twice. Node stores whole matrix. Note: We should compare with this implementation, not purely sequential.
  • 8. Parallel Sparse Transpose Many, many options: P4 P3 Send to root, transpose, distribute. Transpose, send pieces to destinations. Transpose, then rotate data. P1 P2 Replicate the matrix, transpose everywhere. All-to-all communication. Some parallel work.
  • 9. Parallel Sparse Transpose Many, many options: P4 P3 Send to root, transpose, distribute. Transpose, send pieces to destinations. Transpose, then rotate data. P1 P2 Replicate the matrix, transpose everywhere. Serial communication, but may hide latencies.
  • 10. Parallel Sparse Transpose Many, many options: P4 P3 Send to root, transpose, distribute. Transpose, send pieces to destinations. Transpose, then rotate data. P1 P2 Replicate the matrix, transpose everywhere. Useful in some circumstances.
  • 11. What Data Is Interesting? Time
  • 12. What Data Is Interesting? Time (to solution) How much data is communicated. Overhead and latency. Quantity of data resident on a processor.
  • 13. Parallel Transpose Performance Transpose Transpose v. root q q q q q q q q q q q 1.5 q q q 6 q q q q q q q q q Re−scaled speed−up q q q q q q q q q q q q q q q q q q q q q Speed−up q q q q q q q 1.0 q q q q q q q q q q q 4 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0.5 q q q q q q q q q q q q q q q q q q q q q q q q q q 2 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 0.0 q 0 q 5 10 15 20 5 10 15 Number of Processors Number of Processors All-to-all is slower than pure sequential code, but distributed. Actual speed-up when the data is already distributed. Hard to keep constant size / node when performance varies by problem. Data from CITRIS cluster, Itanium2s with Myrinet.
  • 14. Weighted Bipartite Matching Not moving data around but finding where it should go. Find the “best” edges in a bipartite graph. Corresponds to picking the “best” diagonal. Used for static pivoting in factorization. Also in travelling salesman problems, etc.
  • 15. Weighted Bipartite Matching Not moving data around but finding where it should go. Find the “best” edges in a bipartite graph. Corresponds to picking the “best” diagonal. Used for static pivoting in factorization. Also in travelling salesman problems, etc.
  • 16. Algorithms Depth-first search Reliable performance, code available (MC 64) Requires A and AT . Difficult to compute on distributed data. Interior point Performance varies wildly; many tuning parameters. Full generality: Solve larger sparse system. Auction algorithms replace solve with iterative bidding. Easy to distribute.
  • 17. Parallel Auction Performance q Auction Auction v. root q q q q 8 q 8 q q Re−scaled speedup 6 6 Speedup q q q q q q q q q q q q q 4 q q 4 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 2 q q q 2 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 5 10 15 5 10 15 Number of Processors Number of Processors Compare with running an auction on the root, a parallel auction achieves slight speed-up.
  • 18. Proposed Performance Goals When is a distributed combinatorial algorithm (or code) successful? Does not redistribute the data excessively. Keeps the data distributed. No process sees more than the whole. Performance is competitive with the on-root option. Pure speed-up is a great goal, but not always reasonable.
  • 19. Distributed Matrix Ordering Finding a permutation of columns and rows to reduce fill. NP-hard to solve, difficult to approximate.
  • 20. Sequential Orderings Bottom-up Pick columns (and possibly rows) in sequence. Heuristic choices: Minimum degree, deficiency, approximations Maintain symbolic factorization Top-down Separate the matrix into multiple sections. Graph partitioning: A + AT , AT · A, A Needs vertex separators: Difficult. Top-down Hybrid Dissect until small, then order.
  • 21. Parallel Bottom-up Hybrid Order 1. Separate the graph into chunks. Needs an edge separator, and knowledge of the pivot. 2. Order each chunk separately. Forms local partial orders. 3. Merge the orders. What needs communicated?
  • 22. Merging Partial Orders Respecting partial orders Local, symbolic factorization done once. Only need to communicate quotient graph. Quotient graph: Implicit edges for Schur complements. No node will communicate more than the whole matrix.
  • 23. Preliminary Quality Results Merging heuristic Pairwise merge. Pick the head pivot with least worst-case fill (Markowitz cost). Small (tiny) matrices: performance not reliable. Matrix Method NNZ increase west2021 AMD (A + AT ) 1.51× merging 1.68× orani678 AMD 2.37× merging 6.11× Increasing the numerical work drastically spends any savings from computing a distributed order. Need better heuristics?
  • 24. Summary Meeting classical expectations of scaling is difficult. Relatively small amounts of computation for much communication. Problem-dependent performance makes equi-size scaling hard. But consolidation costs when data is already distributed. In a supporting role, don’t sweat the speed-up. Keep the problem distributed. Open topics Any new ideas for parallel ordering?