Programming with Linux on the PlayStation 3
                              FOSDEM 2008
                         olivier.grisel@ensta.org


               
                   Architecture overview: introducing the Cell BE
               
                   Installing Linux
               
                   SIMD programming in C/C++
               
                   Asynchronous data transfer with DMA




           
Who am I

    Java / Python developer at Nuxeo (FOSS document 
    management server)

    Interested in Artificial Intelligence (and need fast 
    Support Vector Machines)

    Slides to be published at:
    http://oliviergrisel.name




                         
PS3 architecture overview

    CPU: IBM Cell/BE @ 3.2GHz 
    
        218 GFLOPS
    
        Main RAM: 256MB XDR (64b@3.2GHz)

    GPU: Nvidia RSX
    
         1.8 TFLOPS (SP) / 356 GFLOPS programmable 
    
        VRAM: 256MB GDDR3 (2x128b@700MHz)

    System Bus: 2.5 GB/s


                         
The Cell Broadband Engine
             
                 1 PPE core @ 3.2GHz
                 
                     64-bit, 2-way hyperthreaded PowerPC
                 
                     512KB L2 cache
             
                 8 SPE cores @ 3.2GHz (7 enabled on the PS3, 6 available to Linux)
                 
                     128-bit SIMD optimized
                 
                     256KB SRAM local store per SPE



         
PS3 Clusters
          
              Cheap cluster for academic researchers
          
              North Carolina State U. and U. Massachusetts Dartmouth
          
              8+1 cluster with ssh and MPI




       
PS3 GRID Computing

    PS3GRID project
    
        based on BOINC
    
        simulation of 30,000 atoms

    Folding@Home
    
        1 PFLOPS in total, 800 TFLOPS of it from PS3s
    
        for comparison, BlueGene/L: 280 TFLOPS

                            
Linux on the PS3

    Lv1 Hypervisor shipped with the default firmware

    Partition utility in the Sony Game OS menu

    Choose your favorite distro: 




    Install a -powerpc64-smp or -ps3 kernel

    Install gcc-spu + libspe2


                        
Programming the Cell/BE in C

    Use the PPE as the conductor that spreads the numerical work 
    across the SPEs

    Use POSIX threads to run SPE programs in parallel (one thread 
    per SPE context)

    Use SPU intrinsics to issue vector instructions

    Eliminate branches as much as possible in SPE code

    Align your data to 16 bytes (128 bytes gives the best DMA performance)
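A minimal sketch of the conductor pattern listed above, assuming a plain POSIX host rather than a PS3: the PPE role is played by `parallel_dot`, which splits a dot product into slices and starts one thread per slice, and the SPE role is played by an ordinary C function so the code compiles anywhere. All names (`parallel_dot`, `worker`, `slice_t`, `N_WORKERS`) are hypothetical. On a real PS3 each thread would instead create an SPE context with libspe2's `spe_context_create()`, load the SPU program with `spe_program_load()` and call `spe_context_run()`, which blocks until the SPE program stops; that blocking call is why one thread per SPE is needed.

```c
#include <pthread.h>
#include <stddef.h>

/* Six workers: the number of SPEs Linux can use on a PS3. */
#define N_WORKERS 6

typedef struct {
    const float *x, *y;   /* shared input arrays */
    int begin, end;       /* half-open slice [begin, end) for this worker */
    float partial;        /* partial result written by the worker */
} slice_t;

/* Stand-in for an SPE program. On a PS3 the thread body would instead call
   spe_context_run(), which blocks until the SPE program stops. */
static void *worker(void *arg) {
    slice_t *s = arg;
    float acc = 0.0f;
    for (int i = s->begin; i < s->end; i++)
        acc += s->x[i] * s->y[i];
    s->partial = acc;
    return NULL;
}

/* The "conductor": split the work, start one thread per slice, join, combine. */
float parallel_dot(const float *x, const float *y, int n) {
    pthread_t threads[N_WORKERS];
    slice_t slices[N_WORKERS];
    int started[N_WORKERS];
    int chunk = (n + N_WORKERS - 1) / N_WORKERS;
    float total = 0.0f;

    for (int w = 0; w < N_WORKERS; w++) {
        int b = w * chunk, e = (w + 1) * chunk;
        slices[w].x = x;
        slices[w].y = y;
        slices[w].begin = b < n ? b : n;
        slices[w].end = e < n ? e : n;
        started[w] = pthread_create(&threads[w], NULL, worker, &slices[w]) == 0;
        if (!started[w])
            worker(&slices[w]);   /* fall back to running the slice inline */
    }
    for (int w = 0; w < N_WORKERS; w++) {
        if (started[w])
            pthread_join(threads[w], NULL);
        total += slices[w].partial;
    }
    return total;
}
```

The fixed worker count mirrors the PS3; on other Cell hardware the count would come from the machine instead.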


                        
Introduction to SIMD programming

    128-bit registers (SSE2, AltiVec, SPE)
     
         2 x double
     
         4 x float
     
         4 x int

    new vector types

    1 vector float operation == 4 float operations

    logical (and, or, cmp, ...), arithmetic (+, *, abs, ...) and 
    shuffling operations
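To make the vector types concrete, here is a sketch using the GCC/Clang generic vector extension (the `vector_size` attribute) rather than the SPU intrinsics, so it runs on any host; the typedef and function names are hypothetical. Each 16-byte type holds one 128-bit register's worth of lanes, and a single `+` or `*` on it stands for 4 (or 2) scalar operations. Lanes can be read back with `[]`, another GCC/Clang extension.

```c
/* 128-bit vector types via the GCC/Clang vector extension. */
typedef float  v4sf __attribute__((vector_size(16)));  /* 4 x float  */
typedef int    v4si __attribute__((vector_size(16)));  /* 4 x int    */
typedef double v2df __attribute__((vector_size(16)));  /* 2 x double */

/* Each function is one vector operation doing 4 (or 2) scalar operations. */
v4sf add_f32x4(v4sf a, v4sf b) { return a + b; }   /* 4 float adds  */
v4si add_i32x4(v4si a, v4si b) { return a + b; }   /* 4 int adds    */
v2df mul_f64x2(v2df a, v2df b) { return a * b; }   /* 2 double muls */
```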
                          
SIMD programming – the big picture 




              
Not always SIMD-izable




            
SIMD programming with libspe2 and gcc-spu

    #include <spu_intrinsics.h>

    avoid scalar types; use vector types instead:
    
        vec_float4
    
        vec_double2
    
        vec_char16 ...

    d = spu_and(a, b);       /* bitwise AND */
    e = spu_madd(a, b, c);   /* e = a * b + c, lane by lane */

    spu-gcc pure_spe_prog.c -o pure_spe_prog.elf
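For readers without a Cell toolchain, the two intrinsics on this slide can be mimicked with the GCC/Clang vector extension. These are hypothetical stand-ins, not the `spu_intrinsics.h` API: `madd_v4sf` computes lane by lane what `spu_madd(a, b, c)` computes (a * b + c), and `and_v4su` mirrors `spu_and(a, b)`. The float results may differ from real SPU hardware in the last bit, since SPU single-precision arithmetic rounds toward zero.

```c
#include <stdint.h>

typedef float    v4sf __attribute__((vector_size(16)));
typedef uint32_t v4su __attribute__((vector_size(16)));

/* Stand-in for spu_madd(a, b, c): lane-wise a * b + c. */
v4sf madd_v4sf(v4sf a, v4sf b, v4sf c) { return a * b + c; }

/* Stand-in for spu_and(a, b): bitwise AND across all 128 bits. */
v4su and_v4su(v4su a, v4su b) { return a & b; }
```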

                             
Branch elimination

    avoid branching (if / else)
    
        c = spu_sel(a, b, spu_cmpgt(a, d));
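`spu_sel(a, b, mask)` takes the bits of `b` where `mask` is set and the bits of `a` elsewhere, so the line above computes, for each lane, `a > d ? b : a` without a branch. A portable sketch of the same mask-and-select idea with GCC/Clang vector extensions (the function and type names are hypothetical): the comparison yields an all-ones or all-zeros mask per lane, and bitwise AND/OR performs the selection.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Portable analogue of c = spu_sel(a, b, spu_cmpgt(a, d)):
   per lane, take b where a > d, otherwise keep a, with no branch. */
v4sf select_gt(v4sf a, v4sf b, v4sf d) {
    v4si mask = a > d;                 /* -1 (all bits set) where a > d, else 0 */
    v4si ai = (v4si)a, bi = (v4si)b;   /* reinterpret the bits, no conversion */
    return (v4sf)((bi & mask) | (ai & ~mask));
}
```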




                            
A sample SPE program
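#include <spu_intrinsics.h>

/* xp and yp must be 16-byte aligned so they can be read as vec_float4 */
float dot_product(const float* xp, const float* yp, const int size) {
       vec_float4 sum = spu_splats(0.0f);       /* 4 partial sums */
       const vec_float4* xvp = (const vec_float4*) xp;
       const vec_float4* yvp = (const vec_float4*) yp;
       const vec_float4* xvp_end = xvp + size / 4;
       float result;
       int i;
       while(__builtin_expect(xvp < xvp_end, 1)) {
            sum = spu_madd(*xvp, *yvp, sum);    /* sum += x * y, 4 lanes at once */
            xvp++;
            yvp++;
       }
       result = spu_extract(sum, 0) + spu_extract(sum, 1)
              + spu_extract(sum, 2) + spu_extract(sum, 3);
       for (i = size & ~3; i < size; i++)       /* scalar tail when size % 4 != 0 */
            result += xp[i] * yp[i];
       return result;
}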

                                       
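DMA with the SPUs' Memory Flow Controllers

    #include <spu_mfcio.h>

    /* main memory -> local store */
    mfc_get(&local_data, main_mem_data_ea, 
    sizeof(local_data), DMA_TAG, 0, 0);

    /* local store -> main memory */
    mfc_put(&local_data, main_mem_data_ea, 
    sizeof(local_data), DMA_TAG, 0, 0);

    /* get with barrier: ordered after all earlier commands with the same tag */
    mfc_getb(&local_data, main_mem_data_ea, 
    sizeof(local_data), DMA_TAG, 0, 0);

    /* wait for every command in the tag group to complete */
    mfc_write_tag_mask(1 << DMA_TAG);
    spu_mfcstat(MFC_TAG_UPDATE_ALL);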
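MFC transfers must respect size and alignment rules, so a wrong size or a misaligned buffer is worth catching before the command is queued. The helper below is a hypothetical pre-flight check (not part of `spu_mfcio.h`) that encodes the rules as we read them in IBM's Cell BE documentation; 16-byte alignment is the minimum for larger transfers, while 128-byte alignment is what gives the best performance.

```c
#include <stdint.h>

/* Hypothetical pre-flight check for an MFC DMA request:
   - size is 1, 2, 4 or 8 bytes, or a multiple of 16 bytes up to 16 KB;
   - for sizes of 16 bytes or more, both addresses are 16-byte aligned;
   - for smaller sizes, both addresses are naturally aligned and share
     the same 4 low-order bits (same offset within a quadword). */
int dma_args_ok(uint32_t ls_addr, uint64_t ea, uint32_t size) {
    if (size == 0 || size > 16384)        /* 0 rejected: a conservative choice */
        return 0;
    if (size >= 16)
        return size % 16 == 0 && ls_addr % 16 == 0 && ea % 16 == 0;
    if (size != 1 && size != 2 && size != 4 && size != 8)
        return 0;
    return ls_addr % size == 0 && ea % size == 0
        && (ls_addr & 15u) == (ea & 15u);
}
```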
                      
Double-buffering – the problem




            
Double-buffering – the big picture




             
Double-buffering with MFC

    1. SPU queues MFC GET to fill buffer #1

    2. SPU queues MFC GET to fill buffer #2

    3. SPU waits for buffer #1 to finish filling

    4. SPU processes buffer #1

    5. SPU queues MFC PUT back content of buffer #1

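    6. SPU queues MFC GETB to refill buffer #1 (same tag as the PUT, so the barrier keeps the refill from overwriting data still being written back)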

    7. SPU waits for buffer #2 to finish filling

    8. SPU processes buffer #2 (...)
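The eight steps above map onto a simple loop. This is a host-side sketch under stated assumptions: `memcpy` stands in for the MFC, so every transfer completes immediately and the code only shows the ordering of gets, waits, processing and puts. The helpers `fetch`, `fetch_barrier`, `store`, `wait_for`, `process` and the constant `CHUNK` are hypothetical; the comments name the `spu_mfcio.h` call each helper would become on an SPU, with the buffer index used as the DMA tag.

```c
#include <string.h>

#define CHUNK 4   /* hypothetical chunk size, in floats */

/* Host stand-ins for the MFC. On an SPU, fetch() would be mfc_get(),
   fetch_barrier() mfc_getb(), store() mfc_put(), each tagged with the
   buffer index, and wait_for(tag) would be
   mfc_write_tag_mask(1 << tag); mfc_read_tag_status_all(); */
static void fetch(float *ls, const float *ea, int tag) { (void)tag; memcpy(ls, ea, CHUNK * sizeof *ls); }
static void fetch_barrier(float *ls, const float *ea, int tag) { fetch(ls, ea, tag); }
static void store(float *ea, const float *ls, int tag) { (void)tag; memcpy(ea, ls, CHUNK * sizeof *ls); }
static void wait_for(int tag) { (void)tag; /* memcpy has already completed */ }

static void process(float *buf) {
    for (int i = 0; i < CHUNK; i++)
        buf[i] *= 2.0f;
}

/* Doubles data[0 .. n_chunks * CHUNK) in place, alternating two local buffers. */
void double_buffered(float *data, int n_chunks) {
    float buf[2][CHUNK];   /* on an SPU: __attribute__((aligned(128))) */
    int cur = 0;
    if (n_chunks <= 0)
        return;
    fetch(buf[0], data, 0);                                     /* step 1 */
    for (int k = 0; k < n_chunks; k++) {
        int next = cur ^ 1;
        if (k + 1 < n_chunks) {
            if (k == 0)
                fetch(buf[next], data + CHUNK, next);           /* step 2 */
            else   /* refill: ordered after the put from this buffer (step 6) */
                fetch_barrier(buf[next], data + (k + 1) * CHUNK, next);
        }
        wait_for(cur);                                          /* steps 3, 7 */
        process(buf[cur]);                                      /* steps 4, 8 */
        store(data + k * CHUNK, buf[cur], cur);                 /* step 5 */
        cur = next;
    }
    wait_for(0);   /* drain outstanding puts before returning */
    wait_for(1);
}
```

Using the buffer index as the tag is what lets the barrier do its job: the refill of a buffer shares its tag with the put that drains it, so on real hardware `mfc_getb` cannot overwrite data that is still being written back.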

                        
Some resources

    Cell BE Programming Tutorial (ibm.com, 190 pages)

    IBM developerWorks short programming tutorials
    
         Search for articles by Jonathan Bartlett

    Barcelona Supercomputing Center (software)
    
        http://www.bsc.es/projects/deepcomputing/linuxoncell/

    PS3 programming workshops (videos)
    
        http://www.cc.gatech.edu/~bader/CellProgramming.html

    #ps3dev on freenode
                            
Thanks, credits, licensing

    Most diagrams are from Geoff Levand's (Sony Corp.) excellent 
    GFDL-licensed tutorial
    
        http://www.kernel.org/pub/linux/kernel/people/geoff/cell

    Pictures and trademarks belong to their respective 
    owners (Sony, IBM, universities, Folding@Home, 
    PS3GRID, ...)

    All remaining work is licensed under the GFDL


                           
7 differences




       
