Getting started with
   Concurrency
  ...using Multiprocessing and Threading

         PyWorks, Atlanta 2008
            Jesse Noller
Who am I?
•   Just another guy.
•   Wrote PEP 371- “Addition of the multiprocessing package”
•   Did a lot of the integration, now the primary point of
    contact.
    •   This means I talk a lot on the internet.
•   Test engineering is my focus - many of the systems I test are
    distributed and/or concurrent.
    •   This stuff is not my full time job.
What is concurrency?


•   Simultaneous execution

•   Potentially interacting tasks

•   Uses multi-core hardware

•   Includes Parallelism.
What is a thread?

•   Share the memory and state of the parent.
•   Are “light weight”
•   Each gets its own stack
•   Do not use Inter-Process Communication or messaging.

•   POSIX “Threads” - pthreads.
What are they good for?

•   Adding throughput and reducing latency within most
    applications.
    •   Throughput: adding threads allows you to process
        more information faster.
    •   Latency: adding threads to make the application react
        faster, such as GUI actions.

•   Algorithms which rely on shared data/state
What is a process?

•   An independent process-of-control.
•   Processes are “share nothing”.
•   Must use some form of Inter-Process
    Communication to communicate/coordinate.

•   Processes are “big”.
Uses for Processes.

•   When you don’t need to share lots of state and want a large
    amount of throughput.
•   Shared-Nothing-or-Little is “safer” than shared-everything.
•   Processes automatically run on multiple cores.
•   Easier to turn into a distributed application.
The Difference

•   Threads are implicitly “share everything” - this makes the
    programmer have to protect (lock) anything which will be
    shared between threads.
•   Processes are “share nothing” - programmers must explicitly
    share any data/state - this means that the programmer is
    forced to think about what is being shared.
•   Explicit is better than Implicit
Python Threads


•   Python has threads, they are real, OS/Kernel level POSIX (p)
    threads.
    •   When you use threading.Thread, you get a pthread.
2.6 changes

•   camelCase method names are now foo_bar() style, e.g.:
    active_count, current_thread, is_alive, etc.
•   Attributes of threads and processes have been turned into
    properties.
    •   E.g.: daemon is now Thread.daemon = <bool>
•   For threading: these changes are optional. The old methods
    still exist.
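
A minimal sketch of how the renamed spellings line up in 2.6 for threading (the old camelCase forms still work):

import threading

def work():
    pass

t = threading.Thread(target=work)
t.daemon = True                    # property - was t.setDaemon(True)
t.start()
print t.is_alive()                 # was t.isAlive()
print threading.active_count()     # was threading.activeCount()
t.join()
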
...


Python does not use “green threads”. It has real threads. OS
Ones. Stop saying it doesn’t.
...


But: Python only allows a single thread to be executing within
the interpreter at once. This restriction is enforced by the
GIL.
The GIL


•   GIL: “Global Interpreter Lock” - this is a lock which must be
    acquired for a thread to enter the interpreter’s space.
•   Only one thread may be executing within the Python
    interpreter at once.
Yeah but...
•   No, it is not a bug.
•   It is an implementation detail of the CPython interpreter.
•   Interpreter maintenance is easier.
•   Creation of new C extension modules is easier.
•   (Mostly) sidestepped if the app is I/O (file, socket) bound.
•   A threaded app which makes heavy use of sockets won’t see
    a huge GIL penalty: it is still there, though.
•   Not going away right now.
The other guys

•   Jython: No GIL, allows “free” threading by using the
    underlying Java threading system.
•   IronPython: No GIL - uses the underlying CLR, threads all
    run concurrently.
•   Stackless: Has a GIL. But has micro threads.
•   PyPy: Has a GIL... for now (dun dun dun)
Enter Multiprocessing
What is multiprocessing?

•   Follows the threading API closely, but uses Processes and
    inter-process communication under the hood.
•   Also offers distributed-computing facilities.
•   Allows the side-stepping of the GIL for CPU bound
    applications.
•   Allows for data/memory sharing.
•   CPython only.
Why include it?

•   As covered in PEP 371, we wanted to have something which was
    fast and “freed” many users from the GIL restrictions.
•   Wanted to add to the concurrency toolbox for Python as a
    whole.
     •   It is not the “final” answer. Nor is it “feature complete”.
•   Oh, and it beats the threading module in speed.*
                          * lies, damned lies and benchmarks
How much faster?


•   It depends on the problem.
•   For example: for number crunching, it’s significantly faster than
    adding threads.
•   Also faster in wide-finder/map-reduce situations
•   Process creation can be sluggish: create the workers up front.
Example: Crunching Primes


•       Yes, I picked something embarrassingly parallel.
•       Sum all of the primes in a range of integers starting from
        1,000,000 and going to 5,000,000.
•       Run on an 8 Core Mac Pro with 8 GB of RAM with Python 2.6,
        completely idle, except for iTunes.
    •     The single threaded version took so long I needed music.
# Single threaded version
import math

def isprime(n):
    """Returns True if n is prime and False otherwise"""
    if not isinstance(n, int):
        raise TypeError("argument passed to is_prime is not of 'int' type")
    if n < 2:
        return False
    if n == 2:
        return True
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True

def sum_primes(n):
    """Calculates sum of all primes below given integer n"""
    return sum([x for x in xrange(2, n) if isprime(x)])

if __name__ == "__main__":
    for i in xrange(100000, 5000000, 100000):
        print sum_primes(i)
# Multi Threaded version
from threading import Thread
from Queue import Queue, Empty
...
def do_work(q):
    while True:
        try:
            x = q.get(block=False)
            print sum_primes(x)
        except Empty:
            break

if __name__ == "__main__":
    work_queue = Queue()
    for i in xrange(100000, 5000000, 100000):
        work_queue.put(i)

    threads = [Thread(target=do_work, args=(work_queue,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
# Multiprocessing version
from multiprocessing import Process, Queue
from Queue import Empty
...

if __name__ == "__main__":
    work_queue = Queue()
    for i in xrange(100000, 5000000, 100000):
        work_queue.put(i)

    processes = [Process(target=do_work, args=(work_queue,)) for i in range(8)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Results

•   All results are in wall-clock time.
•   Single Threaded: 41 minutes, 57 seconds
•   Multi Threaded (8 threads):    106 minutes, 29 seconds
•   MultiProcessing (8 Processes): 6 minutes, 22 seconds
•   This is a trivial example. More benchmarks/data were
    included in the PEP.
The catch.
•   Objects that are shared between processes must be
    serializable via pickle (see the sketch below).
    •   40,921 objects/sec versus 24,989 objects/sec.
•   Processes are “heavy-weight”.
•   Processes can be slow to start (on Windows).
•   Supported on Linux, Solaris, Windows, OS X - but not *BSD,
    and possibly others.
•   If you are creating and destroying lots of workers, processes
    carry a significant cost compared to threads.
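
A minimal sketch of what “must be pickle-able” means in practice: named functions and plain data survive a pickle round-trip, while things like lambdas (and GUI objects, open sockets, etc.) do not, so they cannot cross a process boundary.

import pickle

def task(n):
    return n * n

print len(pickle.dumps(task))        # named functions pickle fine (by reference)

try:
    pickle.dumps(lambda n: n * n)    # lambdas do not - they can't be handed
except pickle.PicklingError as e:    # to a worker Process or put on a Queue
    print "can't share this:", e
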
API Time
It starts with a Process

•   Exactly like threading:
    •    Thread(target=func, args=(args,)).start()
    •    Process(target=func, args=(args,)).start()
•   You can subclass multiprocessing.Process exactly as you
    would with threading.Thread.
from threading import Thread
threads = [Thread(target=do_work, args=(q,)) for i in range(8)]

from multiprocessing import Process
processes = [Process(target=do_work, args=(q,)) for i in range(8)]
# Multiprocessing version
from multiprocessing import Process

class MyProcess(Process):
    def __init__(self):
        Process.__init__(self)
    def run(self):
        a, b = 0, 1
        for i in range(100000):
            a, b = b, a + b

if __name__ == "__main__":
    p = MyProcess()
    p.start()
    print p.pid
    p.join()
    print p.exitcode

# Threading version
from threading import Thread

class MyThread(Thread):
    def __init__(self):
        Thread.__init__(self)
    def run(self):
        a, b = 0, 1
        for i in range(100000):
            a, b = b, a + b

if __name__ == "__main__":
    t = MyThread()
    t.start()
    t.join()
Queues


•   multiprocessing includes 2 Queue implementations - Queue
    and JoinableQueue.
•   Queue is modeled after Queue.Queue but uses pipes
    underneath to transmit the data.
•   JoinableQueue is the same as Queue except it adds a .join()
    method and .task_done(), à la Queue.Queue in Python 2.5
    (a small sketch follows).
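
A rough JoinableQueue sketch showing the .task_done()/.join() handshake (the worker and queue names here are just illustrative):

from multiprocessing import Process, JoinableQueue

def worker(q):
    while True:
        item = q.get()
        print "squared:", item * item
        q.task_done()                # mark this work item as finished

if __name__ == "__main__":
    q = JoinableQueue()
    p = Process(target=worker, args=(q,))
    p.daemon = True                  # worker exits when the parent does
    p.start()
    for i in range(10):
        q.put(i)
    q.join()                         # blocks until every put() has a matching task_done()
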
Queue.warning

•   The first is that if you call .terminate or kill a process which is
    currently accessing a queue: that queue may become corrupted.
•   The second is that any Queue that a Process has put data on
    must be drained prior to joining the processes which have put
    data there: otherwise, you’ll get a deadlock.
     •   Avoid this by calling Queue.cancel_join_thread() in
         the child process.
     •   Or just eat everything on the results pipe before calling
         join (e.g. work_queue, results_queue).
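
A sketch of the second warning in practice - the parent eats everything off the results queue before joining the workers (worker and queue names here are illustrative):

from multiprocessing import Process, Queue
from Queue import Empty

def worker(work_q, results_q):
    while True:
        try:
            results_q.put(work_q.get(block=False) * 2)
        except Empty:
            break

if __name__ == "__main__":
    work_q, results_q = Queue(), Queue()
    for i in range(100):
        work_q.put(i)
    procs = [Process(target=worker, args=(work_q, results_q)) for i in range(4)]
    for p in procs:
        p.start()
    results = [results_q.get() for i in range(100)]   # drain the results first...
    for p in procs:
        p.join()                                      # ...then joining is safe
    print sum(results)
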
Pipes and Locks

•   Multiprocessing supports communication primitives.
    •   multiprocessing.Pipe(), which returns a pair of Connection
        objects which represent the ends of the pipe.
    •   The data sent on the connection must be pickle-able.
•   Multiprocessing has clones of all of the threading module’s
    Lock/RLock, Event, Condition and Semaphore objects.
    •   Most of these support timeout arguments, too!
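
A minimal Pipe() sketch - each side holds one Connection end, and anything pickle-able can be sent through it:

from multiprocessing import Process, Pipe

def child(conn):
    conn.send(["any", "pickle-able", "payload", 42])
    print "child got back:", conn.recv()
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()          # duplex by default
    p = Process(target=child, args=(child_end,))
    p.start()
    print "parent got:", parent_end.recv()
    parent_end.send("thanks")
    p.join()
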
Shared Memory
•   Multiprocessing has a sharedctypes module.
•   This module allows you to create a ctypes object in shared
    memory and share it with other processes.
•   The sharedctypes module offers some safety through the use/
    allocation of locks which prevent simultaneous accessing/
    modification of the shared objects.

from multiprocessing import Process
from multiprocessing.sharedctypes import Value
from ctypes import c_int

def modify(x):
    x.value += 1

if __name__ == "__main__":
    x = Value(c_int, 7)
    p = Process(target=modify, args=(x,))
    p.start()
    p.join()

    print x.value
Pools


•   One of the big “ugh” moments using threading is when you
    have a simple problem you simply want to pass to a pool of
    workers to hammer out.
•   Fact: there are more thread pool implementations out there
    than stray cats in my neighborhood.
Process Pools!
•   Multiprocessing has the Pool object. This supports the up-front
    creation of a number of processes and a number of methods of
    passing work to the workers.
•   Pool.apply() - this is a clone of the builtin apply() function.
     •   Pool.apply_async() - which can call a callback for you
         when the result is available.
•   Pool.map() - again, a parallel clone of the built in function.
     •   Pool.map_async() method, which can also get a callback to
         ring up when the results are done.
•   Fact: Functional programming people love this!
Pools raise insurance rates!

  from multiprocessing import Pool

  def f(x):
      return x*x

  if __name__ == '__main__':
      pool = Pool(processes=2)
      result = pool.apply_async(f, (10,))
      print result.get()


The output is 100; note that the value returned by apply_async() is an
AsyncResult (a Pool.map/map_async sketch follows below).
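
A rough Pool.map()/map_async() sketch to go with the apply_async() example above (the callback name is just illustrative):

from multiprocessing import Pool

def f(x):
    return x * x

def on_done(results):
    print "callback got:", results

if __name__ == '__main__':
    pool = Pool(processes=2)
    print pool.map(f, range(10))                      # parallel clone of map()
    r = pool.map_async(f, range(10), callback=on_done)
    r.wait()                                          # callback fires when all results are ready
    pool.close()
    pool.join()
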
Managers

•   Managers are a network and process-based way of sharing data
    between processes (and machines).
•   The primary manager type is the BaseManager - this is the
    basic Manager object, and it can easily be subclassed to share
    data remotely.
•   A proxy object is the type returned when accessing a
    shared object - this is a reference to the actual object being
    exported by the manager.
Sharing a queue (server)
# Manager Server
from Queue import Empty
from multiprocessing import managers, Queue

_queue = Queue()
def get_queue():
    return _queue

class QueueManager(managers.BaseManager): pass

QueueManager.register('get_queue', callable=get_queue)

m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
_queue.put("What's up remote process")

s = m.get_server()
s.serve_forever()
Sharing a queue (client)

# Manager Client
from multiprocessing import managers

class QueueManager(managers.BaseManager): pass

QueueManager.register('get_queue')

m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
m.connect()
remote_queue = m.get_queue()
print remote_queue.get()
Gotchas

•   Processes which feed into a multiprocessing.Queue will block
    waiting for all the objects they put there to be removed.
•   Data must be pickle-able: this means some objects (for
    instance, GUI ones) can not be shared.
•   Arguments to Proxies (managers) methods must be pickle-able
    as well.
•   While it supports locking/semaphores: using those means
    you’re sharing something you may not need to be sharing.
In Closing

•   Multiple processes are not mutually exclusive with using
    Threads
•   Multiprocessing offers a simple and known API
     •   This lowers the barrier of entry significantly
     •   Sidesteps the GIL
•   In addition to “just processes” multiprocessing offers the start
    of grid-computing utilities
Questions?
