PyCUDA:
Harnessing the power of GPU with Python
Talk Structure




                    1. Why a GPU?
                    2. How does it work?
                    3. How do I program it?
                    4. Can I use Python?

PyCon 4 – Florence 2010 – Fabrizio Milo
WHY A GPU ?


APPLICATIONS & DEMOS


How does it work?




[Figure: CPU block diagram with Control, four ALUs, Cache, and DRAM]

[Figure: GPU block diagram with DRAM]

[Figure: CPU and GPU side by side, each with its own DRAM]

CUDA
Compute Unified Device Architecture

A Parallel Computing Architecture for NVIDIA GPUs

DirectX Compute

CUDA
  Execution Model
  Device Model

EXECUTION MODEL

Thread: smallest unit of logic
A Block: a group of threads
A Grid: a group of blocks

One Block can have many threads
One Grid can have many blocks

The hardware

     DEVICE MODEL

Scalar Processor
Many Scalar Processors
+ Register File
+ Shared Memory
Multiprocessor
Device

Real Example: 10-Series Architecture

- 240 Scalar Processor (SP) cores execute kernel threads
- 30 Streaming Multiprocessors (SMs), each containing
    - 8 scalar processors
    - 1 double precision unit
    - Shared memory

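The two figures on the 10-series slide agree with each other: 30 multiprocessors with 8 scalar processors each give the 240 cores quoted. A minimal check, with variable names chosen only for illustration:

```python
# Numbers taken from the 10-series slide; the variable names are illustrative.
sm_count = 30    # Streaming Multiprocessors on the device
sp_per_sm = 8    # scalar processors inside each SM

total_sp = sm_count * sp_per_sm
print(total_sp)  # 240
```
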
Software        Hardware

Thread          Scalar Processor
Thread Block    Multiprocessor
Grid            Device

Global Memory




[Figure: Host - Device. Host: CPU and RAM; device: Global Memory]

[Figure: Host – Multi Device. Host: CPU and RAM]

Kernel

__global__ void multiply_them( float *dest, float *a, float *b )
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}

                                          Thread
                                          Block

Kernel

__global__ void kernel( … )
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    …
}

                                          Grid

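The grid-level index can be checked on the CPU. The sketch below emulates `blockIdx.x * blockDim.x + threadIdx.x` with two nested loops; the function name and the launch sizes are illustrative assumptions, and on a real device the threads run in parallel rather than in loop order.

```python
def global_indices(num_blocks, threads_per_block):
    """Emulate idx = blockIdx.x * blockDim.x + threadIdx.x on the CPU."""
    indices = []
    for block_idx in range(num_blocks):               # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):   # plays the role of threadIdx.x
            indices.append(block_idx * threads_per_block + thread_idx)
    return indices


idx = global_indices(num_blocks=3, threads_per_block=4)
print(idx)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Every thread in the grid gets a distinct index, so each one can handle a different array element.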
How do I program it?

[Figure: build flow. Main Logic -> GCC -> .bin (CPU); Kernel -> NVCC -> .cubin (GPU)]

[Figure: the same build flow, with .bin and .cubin shown together on the CPU side]

Allocate Memory

cudaMalloc( pointer, size )

Copy to device

cudaMemcpy( dest, src, size, direction )

Kernel Launch

Kernel<<< # blocks, # threads >>>( *params )

Get Back the Results

cudaMemcpy( dest, src, size, direction )

Error Handling

if (cudaMalloc( pointer, size ) != cudaSuccess) {
    handle_error();
}

And soon it becomes …

if (cudaMalloc( pointer, size ) != cudaSuccess) {
    handle_error();
}

if (cudaMemcpy( dest, src, size, direction ) != cudaSuccess) {
    handle_error();
}

Kernel<<< # blocks, # threads >>>( *params );
if (cudaGetLastError() != cudaSuccess) {
    handle_error();
}

if (cudaMemcpy( dest, src, size, direction ) != cudaSuccess) {
    handle_error();
}

Python + CUDA & Andreas Klöckner = PyCUDA

PyCuda Philosophy

- Provide Complete Access
- Automatically Manage Resources
- Check and Report Errors
- Cross Platform
- Allow Interactive Use
- NumPy Integration

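"Check and Report Errors" is the answer to the hand-written status checks of the C workflow: the PyCUDA documentation describes CUDA errors being translated into Python exceptions. The sketch below only illustrates that idea in plain Python; `DeviceError`, `check`, and `fake_malloc` are hypothetical names, not part of PyCUDA.

```python
class DeviceError(RuntimeError):
    """Hypothetical exception type standing in for a GPU runtime error."""


SUCCESS = 0  # illustrative status code, playing the role of cudaSuccess


def check(status, what):
    """Raise an exception instead of returning a status the caller must test."""
    if status != SUCCESS:
        raise DeviceError(f"{what} failed with status {status}")


def fake_malloc(size):
    """Stand-in for an allocation call: fails for non-positive sizes."""
    return SUCCESS if size > 0 else 2


try:
    check(fake_malloc(1024), "malloc")  # succeeds silently
    check(fake_malloc(0), "malloc")     # raises DeviceError
except DeviceError as exc:
    message = str(exc)
    print(message)  # malloc failed with status 2
```

With the checks folded into the wrapper, the calling code needs a single `try` block rather than one `if` per call.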
NUMPY - ARRAY

import numpy

my_array = numpy.array([1,] * 100)    # 100 ones, indices 0 .. 99

my_array[3] = 0                       # now 1 1 1 0 1 1 ...

PyCuda: Workflow




Memory Allocation

gpu_mem = cuda.mem_alloc( size_bytes )

Memory Copy

cuda.memcpy_htod( gpu_mem, cpu_mem )

Kernel

mod = SourceModule("""
__global__ void multiply_them( float *dest, float *a, float *b )
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}""")

Kernel Launch

multiply_them = mod.get_function("multiply_them")
multiply_them( *args, block=(30, 64, 1) )

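Put together, the allocate / copy / launch / copy-back steps might look like the sketch below. It is an illustration under stated assumptions, not the speaker's demo: the extra `n` parameter, the bounds check, the `launch_config` helper, and the block size of 256 are all additions made here, and `cuda.In` / `cuda.Out` are used in place of explicit `mem_alloc` and `memcpy_htod` calls. The PyCUDA imports are deferred into the function so the helper can be tried on a machine without a GPU.

```python
def launch_config(n, threads_per_block=256):
    """Return (block, grid) tuples that cover n elements in one dimension."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return (threads_per_block, 1, 1), (blocks, 1)


# Variant of the slide's kernel: the global index and the n guard are additions.
KERNEL_SOURCE = """
__global__ void multiply_them(float *dest, float *a, float *b, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dest[i] = a[i] * b[i];
}
"""


def multiply_on_gpu(a, b):
    """Multiply two float32 NumPy arrays element-wise on the GPU."""
    # Imported here so this module also loads where PyCUDA is not installed.
    import numpy
    import pycuda.autoinit  # noqa: F401  (sets up a context on the default device)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    multiply_them = SourceModule(KERNEL_SOURCE).get_function("multiply_them")
    dest = numpy.empty_like(a)
    block, grid = launch_config(a.size)
    # cuda.In / cuda.Out allocate device memory and copy in the given direction.
    multiply_them(cuda.Out(dest), cuda.In(a), cuda.In(b), numpy.int32(a.size),
                  block=block, grid=grid)
    return dest


print(launch_config(1000))  # ((256, 1, 1), (4, 1))
```

On a machine with a CUDA device, `multiply_on_gpu(a, b)` with two `float32` arrays of equal length should return the same values as `a * b` in NumPy.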
Hello Gpu

     DEMO


PyCuda: GpuArray

   gpu_array = gpuarray.to_gpu(numpy_array)

   numpy_array = gpu_array.get()

     +, -, *, /, fill, sin, exp, rand, basic
     indexing, norm, inner product …

PyCuda: GpuArray: ElementWise

import numpy.linalg as la
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

lin_comb = ElementwiseKernel(
    "float a, float *x, float b, float *y, float *z",
    "z[i] = a*x[i] + b*y[i]"
)

# a_gpu and b_gpu are existing float32 GPU arrays of the same shape
c_gpu = gpuarray.empty_like(a_gpu)
lin_comb(5, a_gpu, 6, b_gpu, c_gpu)

assert la.norm((c_gpu - (5*a_gpu + 6*b_gpu)).get()) < 1e-5

Meta-Programming

__kernel_template__ = """
__global__ void kernel( args )
{
    for (int i = 0; i < {{ iterations }}; i++) {
        {{ operations }}
    }
}"""

  See for example jinja2

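Generating kernel source comes down to filling a text template before it is compiled. The slide points to jinja2; to stay dependency-free, the sketch below swaps in the standard library's `string.Template`, so the placeholders are written `$iterations` and `$operations` rather than `{{ ... }}`. The kernel signature and the operation string are assumptions made for illustration.

```python
from string import Template

# Hypothetical kernel template; $iterations and $operations are filled in below.
KERNEL_TEMPLATE = Template("""
__global__ void kernel(float *data)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < $iterations; i++) {
        $operations
    }
}
""")

# Render the CUDA source as a plain Python string.
source = KERNEL_TEMPLATE.substitute(
    iterations=4,
    operations="data[idx] = data[idx] * 2.0f;",
)
print(source)
```

The rendered string could then be handed to `SourceModule`, so loop counts and operations are chosen at run time from Python.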
Meta-Programming




         Generate Source !




Performances ?




mandelbrot

     DEMO


PyCuda: Documentation




PyCuda

WebSite:
http://mathema.tician.de/software/pycuda

License:
X Consortium License
  (no warranty, free for all use)

Dependencies:
  Python 2.4+, numpy, Boost

In the Future …




    OPENCL

THANK YOU & HAVE FUN !


?

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

PyCUDA: How to harness the power of graphics cards in Python applications

  • 1. PyCUDA: Harnessing the power of GPU with Python
  • 2. Talk Structure: 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python? (PyCon 4 – Florence 2010 – Fabrizio Milo)
  • 3. Talk Structure: 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 4. WHY A GPU?
  • 5. APPLICATIONS & DEMOS
  • 6. Why GPU?
  • 7. Talk Structure: 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 8. How does it work?
  • 9. CPU (diagram): Control, four ALUs, Cache, DRAM
  • 10. GPU (diagram): DRAM
  • 11. CPU and GPU side by side (diagram): CPU with Control, four ALUs, Cache and DRAM; GPU with DRAM
  • 12. CUDA
  • 13. Compute Unified Device Architecture
  • 14. CUDA: A Parallel Computing Architecture for NVIDIA GPUs. Direct X Compute
  • 15. CUDA: Execution Model and Device Model
  • 16. EXECUTION MODEL
  • 17. Thread: Smallest unit of logic
  • 18. A Block: A Group of Threads
  • 19. A Grid: A Group of Blocks
  • 20. One Block can have many threads
  • 21. One Grid can have many blocks
  • 22. DEVICE MODEL: The hardware
  • 23. Scalar Processor
  • 24. Scalar Processor
  • 25. Many Scalar Processors
  • 26. + Register File
  • 27. + Shared Memory
  • 28. Multiprocessor
  • 29. Device
  • 30. Real Example: 10-Series Architecture. 240 Scalar Processor (SP) cores execute kernel threads. 30 Streaming Multiprocessors (SMs), each containing: 8 scalar processors, 1 double-precision unit, shared memory
  • 31. Software → Hardware: Thread → Scalar Processor; Thread Block → Multiprocessor; Grid → Device
  • 32. Global Memory
  • 33. Global Memory
  • 34. Host - Device (diagram): RAM, CPU, Global Memory
  • 35. Host – Multi Device (diagram): RAM, CPU
  • 36. 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 37. Software → Hardware: Thread → Scalar Processor; Thread Block → Multiprocessor; Grid → Device
  • 38. Kernel (Thread): __global__ void multiply_them( float *dest, float *a, float *b ) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }
  • 39. Kernel (Thread): __global__ void multiply_them( float *dest, float *a, float *b ) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }
  • 40. Kernel (Block): __global__ void multiply_them( float *dest, float *a, float *b ) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; }
  • 41. Kernel (Grid): __global__ void kernel( … ) { const int idx = blockIdx.x * blockDim.x + threadIdx.x; … }
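The grid-level index formula on the slide above can be tried out without a GPU. The following is a minimal sketch in plain Python that emulates a 1-D launch sequentially; `launch_1d` and the Python version of `multiply_them` are hypothetical helpers written for illustration, not part of CUDA or PyCUDA. It assumes a block size of 4 and adds a bounds check, since the last block may contain threads with no element to work on.

```python
def launch_1d(kernel, n_blocks, threads_per_block, *args):
    # Emulation only: a GPU runs these threads in parallel, here they run in sequence.
    for block_idx in range(n_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def multiply_them(block_idx, block_dim, thread_idx, dest, a, b):
    # Same formula as the slide: blockIdx.x * blockDim.x + threadIdx.x
    idx = block_idx * block_dim + thread_idx
    if idx < len(dest):  # guard: the last block may have spare threads
        dest[idx] = a[idx] * b[idx]


n = 10
threads_per_block = 4  # assumed block size, chosen only for the example
n_blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
a = [float(i) for i in range(n)]
b = [2.0] * n
dest = [0.0] * n
launch_1d(multiply_them, n_blocks, threads_per_block, dest, a, b)
```

With 10 elements and 4 threads per block, three blocks are needed and the final two threads do nothing; rounding the block count up and guarding on the index is the usual way to handle sizes that are not a multiple of the block size.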
  • 42. How do I program it? Main Logic → GCC → .bin → CPU; Kernel → NVCC → .cubin → GPU
  • 43. How do I program it? (build-flow diagram, continued): Main Logic, Kernel; GCC, NVCC; .bin, .cubin; CPU, GPU
  • 44. Host - Device (diagram): RAM, CPU, Global Memory
  • 45. RAM, CPU, Global Memory (diagram)
  • 46. Allocate Memory: cudaMalloc( pointer, size )
  • 47. Copy to device: cudaMalloc( pointer, size ); cudaMemcpy( dest, src, size, direction )
  • 48. Kernel Launch: cudaMalloc( pointer, size ); cudaMemcpy( dest, src, size, direction ); Kernel<<< # blocks, # threads >>>( *params )
  • 49. Get Back the Results: cudaMalloc( pointer, size ); cudaMemcpy( dest, src, size, direction ); Kernel<<< # blocks, # threads >>>( *params ); cudaMemcpy( dest, src, size, direction )
  • 50. Error Handling: if (cudaMalloc( pointer, size ) != cudaSuccess) { handle_error(); }
  • 51. And soon it becomes … if (cudaMalloc( pointer, size ) != cudaSuccess) { handle_error(); } if (cudaMemcpy( dest, src, size, direction ) == cudaSuccess) {} Kernel<<< # blocks, # threads >>>( *params ); if (cudaGetLastError() != cudaSuccess) { handle_error(); } if (cudaMemcpy( dest, src, size, direction ) != cudaSuccess) {}
  • 52. And soon it becomes … (the error-checking code from the previous slide, repeated and overlaid many times across the slide)
  • 54. 1. Why a GPU? 2. How does it work? 3. How do I program it? 4. Can I use Python?
  • 55. + & ANDREAS KLÖCKNER = PYCUDA
  • 56. PyCuda Philosophy: Provide Complete Access
  • 57. PyCuda Philosophy: Automatically Manage Resources
  • 58. PyCuda Philosophy: Check and Report Errors
  • 59. PyCuda Philosophy: Cross Platform
  • 60. PyCuda Philosophy: Allow Interactive Use
  • 61. PyCuda Philosophy: NumPy Integration
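The "Check and Report Errors" point contrasts with the C slides, where every call's status code has to be compared by hand. Below is a minimal sketch of the general idea in plain Python: a wrapper that turns a C-style status code into an exception. Everything here (`DriverError`, `fake_malloc`, `checked`, the status values and the 1024-byte capacity) is hypothetical and made up for illustration; it is not PyCUDA's actual implementation.

```python
class DriverError(RuntimeError):
    """Hypothetical exception raised when a wrapped call reports failure."""


SUCCESS = 0               # made-up status codes for the sketch
ERROR_OUT_OF_MEMORY = 2


def fake_malloc(size_bytes, capacity=1024):
    # Stand-in for a C-style call that returns (status, handle).
    if size_bytes > capacity:
        return ERROR_OUT_OF_MEMORY, None
    return SUCCESS, object()


def checked(call, *args):
    # Check the status once, centrally, and raise instead of returning a code.
    status, result = call(*args)
    if status != SUCCESS:
        raise DriverError(f"{call.__name__} failed with status {status}")
    return result


handle = checked(fake_malloc, 256)  # succeeds, no manual check needed

try:
    checked(fake_malloc, 4096)      # exceeds the fake capacity
    failed = False
except DriverError as exc:
    failed = True
    message = str(exc)
```

Because the check lives in one wrapper, calling code contains no `if (... != success)` branches, and a failure that nobody handles still surfaces as a traceback rather than passing silently.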
  • 62. NUMPY - ARRAY
  • 63. import numpy; my_array = numpy.array([1,] * 100) (an array of 100 ones, indexed 0 to 99)
  • 64. import numpy; my_array = numpy.array([1,] * 100); my_array[3] = 0 (element 3 is now 0)
  • 65. PyCuda: Workflow
  • 66. PyCuda: Workflow
  • 67. PyCuda: Workflow
  • 68. Memory Allocation: cuda.mem_alloc( size_bytes )
  • 69. Memory Copy: gpu_mem = cuda.mem_alloc( size_bytes ); cuda.memcpy_htod( gpu_mem, cpu_mem )
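The allocation call above takes a size in bytes, not a number of elements. As a small sketch of where `size_bytes` comes from, the example below uses the standard-library `array` module as a stand-in for a host buffer of 100 single-precision floats; with NumPy the same figure is available as the array's `nbytes` attribute. The variable names follow the slide, and the choice of 100 elements is an assumption taken from the NumPy slides.

```python
from array import array

n = 100
cpu_mem = array("f", [1.0] * n)  # 'f' is a C float, 4 bytes per element
size_bytes = cpu_mem.itemsize * len(cpu_mem)  # element size times element count

# The raw buffer that a host-to-device copy would transfer has the same length.
raw = cpu_mem.tobytes()
```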
  • 70. Kernel: gpu_mem = cuda.mem_alloc( size_bytes ); cuda.memcpy_htod( gpu_mem, cpu_mem ); SourceModule(""" __global__ void multiply_them( float *dest, float *a, float *b ) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; } """)
  • 71. Kernel Launch: mod = SourceModule(""" __global__ void multiply_them( float *dest, float *a, float *b ) { const int i = threadIdx.x; dest[i] = a[i] * b[i]; } """); multiply_them = mod.get_function("multiply_them"); multiply_them( *args, block=(30, 64, 1) )
  • 75. Hello Gpu DEMO
  • 76. GPUARRAY
  • 77. gpuarray
  • 78. PyCuda: GpuArray: gpuarray.to_gpu(numpy_array); numpy_array = gpuarray.get()
  • 79. PyCuda: GpuArray: gpuarray.to_gpu(numpy_array); numpy_array = gpuarray.get(); +, -, *, /, fill, sin, exp, rand, basic indexing, norm, inner product …
  • 80. PyCuda: GpuArray: ElementWise: from pycuda.elementwise import ElementwiseKernel
  • 81. PyCuda: GpuArray: ElementWise: from pycuda.elementwise import ElementwiseKernel; lincomb = ElementwiseKernel( "float a, float *x, float b, float *y, float *z", "z[i] = a*x[i] + b*y[i]" )
  • 82. PyCuda: GpuArray: ElementWise: from pycuda.elementwise import ElementwiseKernel; lincomb = ElementwiseKernel( "float a, float *x, float b, float *y, float *z", "z[i] = a*x[i] + b*y[i]" ); c_gpu = gpuarray.empty_like(a_gpu); lincomb(5, a_gpu, 6, b_gpu, c_gpu); assert la.norm((c_gpu - (5*a_gpu + 6*b_gpu)).get()) < 1e-5
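The elementwise idea is that one expression, written in terms of an index `i`, is applied to every element. The sketch below emulates that on the CPU in plain Python so the semantics can be checked without a GPU. The `elementwise` helper is hypothetical and is not how PyCUDA works internally (PyCUDA generates and compiles CUDA C); it only assumes that arrays are passed as Python lists and that the first list argument determines the length.

```python
def elementwise(arg_names, operation):
    # Compile the per-element statement once, then run it for every index i.
    code = compile(operation, "<elementwise>", "exec")

    def run(*args):
        env = dict(zip(arg_names, args))
        # Assumption for this sketch: the first list argument sets the size.
        n = next(len(v) for v in args if isinstance(v, list))
        for i in range(n):  # a GPU would run these iterations in parallel
            env["i"] = i
            exec(code, {}, env)

    return run


# Same operation string as on the slide, with Python lists in place of GPU arrays.
lincomb = elementwise(["a", "x", "b", "y", "z"], "z[i] = a*x[i] + b*y[i]")

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
z = [0.0] * 3
lincomb(5, x, 6, y, z)
```

Each iteration writes only `z[i]` and reads only element `i` of the inputs, which is why this kind of operation maps cleanly onto one GPU thread per element.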
  • 83. Meta-Programming: __kernel_template__ = """ __global__ void kernel( args ) { for (int i=0; i<{{ iterations }}; i++){ {{operations}} } } """ See for example jinja2
  • 84. Meta-Programming
  • 85. Meta-Programming: Generate Source!
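Generating kernel source means filling a text template with values known only at run time, then handing the resulting string to the compiler. The slides point to jinja2 for this; the sketch below substitutes the standard library's `string.Template` so that it runs without extra packages, which means the placeholders are written `$name` rather than `{{ name }}`. The kernel signature, the loop body and the iteration count are assumptions chosen for illustration.

```python
from string import Template

# CUDA C source with two placeholders to be filled in from Python.
kernel_template = Template("""
__global__ void kernel(float *data)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < $iterations; i++) {
        $operations
    }
}
""")

# Render the template; the result is an ordinary string of CUDA C.
source = kernel_template.substitute(
    iterations=4,
    operations="data[idx] = data[idx] * 2.0f;",
)
```

The rendered `source` string is what would be passed on for compilation; since the loop bound becomes a literal constant in the generated code, the compiler is free to unroll the loop.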
  • 86. Performance?
  • 87. mandelbrot DEMO
  • 88. PyCuda: Documentation
  • 89. PyCuda. WebSite: http://mathema.tician.de/software/pycuda License: X Consortium License (no warranty, free for all use). Dependencies: Python 2.4+, numpy, Boost
  • 90. In the Future … OPENCL
  • 91. THANK YOU & HAVE FUN!
  • 92. ?