A Scalable Tridiagonal Solver for GPUs. Team: WenMin Xiao & ChaoQun Li, Institute of Information Science and Technology, Hunan University
Outline
What is a tridiagonal system?
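For reference, a tridiagonal system in its standard form is sketched below. The symbols a_i, b_i, c_i, d_i and x_i are notation assumed here for illustration (they are not taken from the slide), and the `pmatrix` environment assumes the `amsmath` package.

```latex
% Row i reads: a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i, with a_1 = c_n = 0.
\begin{equation}
\begin{pmatrix}
b_1 & c_1 &        &        &         \\
a_2 & b_2 & c_2    &        &         \\
    & a_3 & b_3    & \ddots &         \\
    &     & \ddots & \ddots & c_{n-1} \\
    &     &        & a_n    & b_n
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_n \end{pmatrix}
\end{equation}
```

Only the main diagonal and its two neighbours are non-zero, so the whole system can be stored in four arrays of length n.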
What is it used for?
Two Applications on GPU. Depth-of-field blur (Michael Kass et al., 2006): OpenGL and shader language, cyclic reduction. Shallow water simulation (2007): CUDA, cyclic reduction.
A Classic Serial Algorithm. Phase 1: Forward Reduction. Phase 2: Backward Substitution. Elimination steps: 2(n-1)+1 = 2n-1. Complexity: O(n).
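The two phases above can be sketched in Python as follows. This is a minimal sketch, assuming the classic serial algorithm is the one commonly called the Thomas algorithm (the name used on later slides), that the system is diagonally dominant so no pivoting is needed, and that the array names `a`, `b`, `c`, `d` (sub-diagonal, diagonal, super-diagonal, right-hand side) are illustrative rather than the authors' own.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system serially.

    a: sub-diagonal (a[0] is ignored), b: main diagonal,
    c: super-diagonal (c[n-1] is ignored), d: right-hand side.
    Assumes no zero pivots occur (e.g. a diagonally dominant system).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Phase 1: forward reduction (n-1 elimination steps after the first row).
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Phase 2: backward substitution (last unknown first, then n-1 more).
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For example, with diagonal entries 4, off-diagonal entries 1 and right-hand side [6, 12, 18, 19], the exact solution is [1, 2, 3, 4]. Each loop iteration depends on the previous one, which is why this form is serial.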
Parallel Algorithms: a set of equations mapped to one thread; a single equation mapped to one thread.
Cyclic Reduction: 2-4 threads working. Forward Reduction: 8-unknown system, 4-unknown system, 2-unknown system. Backward Substitution: solve 2 unknowns, solve the remaining 2 unknowns, solve the remaining 4 unknowns. 2*log2(8)-1 = 2*3-1 = 5 steps.
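The reduction and substitution pattern on this slide can be sketched serially in Python. This is an illustrative sketch, not the authors' GPU kernel: the recursive structure, the function name `cyclic_reduction` and the array names are assumptions, and the system is assumed to be diagonally dominant so that no division by zero occurs. Each `for` loop body is independent of the others in the same loop, which is the part a GPU would run with one thread per equation.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction (serial sketch).

    a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side.
    a[0] and c[n-1] are treated as zero.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    if n == 2:
        # Solve the final 2-unknown system directly (Cramer's rule).
        det = b[0] * b[1] - c[0] * a[1]
        return [(d[0] * b[1] - c[0] * d[1]) / det,
                (b[0] * d[1] - a[1] * d[0]) / det]

    a = [0.0] + list(a[1:])   # coefficient of the non-existent x[-1]
    c = list(c[:-1]) + [0.0]  # coefficient of the non-existent x[n]

    # Forward reduction: fold each even-indexed equation into its
    # odd-indexed neighbours, halving the number of unknowns.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        k1 = a[i] / b[i - 1]
        na = -a[i - 1] * k1
        nb = b[i] - c[i - 1] * k1
        nc = 0.0
        nd = d[i] - d[i - 1] * k1
        if i + 1 < n:
            k2 = c[i] / b[i + 1]
            nb -= a[i + 1] * k2
            nc = -c[i + 1] * k2
            nd -= d[i + 1] * k2
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)

    # Solve the half-size system over the odd-indexed unknowns.
    xr = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for j, i in enumerate(range(1, n, 2)):
        x[i] = xr[j]

    # Backward substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

For an 8-unknown system the recursion goes 8, 4, 2 unknowns on the way down and then recovers 2 and 4 unknowns on the way back up, matching the 2 reductions, 1 direct solve and 2 substitutions (5 steps) counted on the slide.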
Parallel Cyclic Reduction (PCR): Forward Reduction, no Backward Substitution. One 8-unknown system, two 4-unknown systems, four 2-unknown systems, solve all unknowns. 4 threads working. log2(8) = 3 steps.
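A serial Python sketch of PCR may help make the "no backward substitution" point concrete. The function name `pcr` and the array names are illustrative assumptions, and the system is assumed to be diagonally dominant. Unlike cyclic reduction, every equation is reduced at every step, so after the last step each equation contains a single unknown. The sketch runs the reduction with strides 1, 2, 4, ..., where the final stride plays the role of solving the 2-unknown systems on the slide.

```python
def pcr(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (serial sketch).

    a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side.
    a[0] and c[n-1] are treated as zero.
    """
    n = len(b)
    a = [0.0] + list(a[1:])
    b = list(b)
    c = list(c[:-1]) + [0.0]
    d = list(d)

    s = 1  # distance to the neighbours each equation is currently coupled to
    while s < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every iteration reads only the old arrays, so on a GPU each
        # equation i could be handled by its own thread.
        for i in range(n):
            lo, hi = i - s, i + s
            if lo >= 0:
                k1 = a[i] / b[lo]
                na[i] = -a[lo] * k1
                nb[i] -= c[lo] * k1
                nd[i] -= d[lo] * k1
            if hi < n:
                k2 = c[i] / b[hi]
                nc[i] = -c[hi] * k2
                nb[i] -= a[hi] * k2
                nd[i] -= d[hi] * k2
        a, b, c, d = na, nb, nc, nd
        s *= 2

    # All off-diagonal couplings are now zero: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

For n = 8 the loop runs with strides 1, 2 and 4, that is log2(8) = 3 steps, at the cost of doing work on all n equations in every step.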
Advantages of Previous Algorithms
Hybrid Algorithm: one 8-unknown system, one PCR step, parallel Thomas.
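One way to read this slide is sketched below in Python, under stated assumptions: a few PCR steps split the system into independent, interleaved subsystems (equations i, i+s, i+2s, ...), and each subsystem is then solved with the Thomas algorithm, one subsystem per thread on a GPU. The function names `pcr_step`, `thomas` and `hybrid_solve` and the `pcr_steps` parameter are hypothetical, and the system is assumed to be diagonally dominant.

```python
def thomas(a, b, c, d):
    """Serial Thomas solve; a[0] and c[n-1] are ignored."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr_step(a, b, c, d, s):
    """One PCR step at stride s: afterwards equation i couples to i-2s and i+2s."""
    n = len(b)
    na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
    for i in range(n):
        lo, hi = i - s, i + s
        if lo >= 0:
            k1 = a[i] / b[lo]
            na[i] = -a[lo] * k1
            nb[i] -= c[lo] * k1
            nd[i] -= d[lo] * k1
        if hi < n:
            k2 = c[i] / b[hi]
            nc[i] = -c[hi] * k2
            nb[i] -= a[hi] * k2
            nd[i] -= d[hi] * k2
    return na, nb, nc, nd


def hybrid_solve(a, b, c, d, pcr_steps=1):
    """Run pcr_steps PCR steps, then Thomas on each interleaved subsystem."""
    n = len(b)
    a = [0.0] + list(a[1:])
    c = list(c[:-1]) + [0.0]
    b, d = list(b), list(d)
    s = 1
    for _ in range(pcr_steps):
        if s >= n:
            break
        a, b, c, d = pcr_step(a, b, c, d, s)
        s *= 2
    x = [0.0] * n
    # The subsystems are independent: one GPU thread could take each r.
    for r in range(min(s, n)):
        idx = list(range(r, n, s))
        xs = thomas([a[i] for i in idx], [b[i] for i in idx],
                    [c[i] for i in idx], [d[i] for i in idx])
        for i, v in zip(idx, xs):
            x[i] = v
    return x
```

With `pcr_steps=1` on an 8-unknown system, the even-indexed and odd-indexed equations become two independent 4-unknown systems, as in the slide's "one PCR step" followed by parallel Thomas. More PCR steps give more, smaller subsystems and therefore more parallelism per system.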
GPU Implementation
Tiled PCR: redundancy of naive tiling of PCR.
Dependency & Parallelism: How to Reduce Redundancy? Solution 1: redundancy still exists!
Dependency & Parallelism (cont.): Fine-grained tiling. Solution 2: without redundancy; sequential computation.
Cache Design: Buffered Sliding Window. Illustration of the buffered sliding window: 1. Intermediate results are cached. 2. Tiles are processed in parallel. 3. Each tile has multiple sub-tiles. 4. Sub-tiles are processed sequentially using the cache.
Components of Buffered Sliding Window
Example
Advantages of TPCR
Thread-level Parallel Thomas Algorithm: 64B aligned segment, 128B aligned segment.
Performance Evaluation: Test Platform
Performance Results. Parameters M and N: number of systems and system size. 8.3x and 49x speedups; 5x and 30x speedups.
Performance Analysis
Summary
Reference
Questions? Thanks.
