Python threads: Dive into GIL!
PyCon India 2011
Pune, Sept 16-18
Vishal Kanaujia & Chetan Giridhar
Python: A multithreading example
Setting up the context!!
“Hmm… My threads should be twice as fast on dual cores!”

Python v2.7      Execution Time
Single Core          74 s
Dual Core           116 s

[Bar chart: Execution Time (s), Single Core vs. Dual Core]

About 57 % longer execution time on dual core: a performance dip!!
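The kind of experiment behind these numbers can be sketched as follows. This is not the authors' benchmark, only a minimal illustration written for a current Python 3: the `countdown` workload, the helper names and the value of `N` are all assumptions.

```python
import threading
import time

def countdown(n):
    """CPU-bound busy loop: pure Python byte-code, no I/O."""
    while n > 0:
        n -= 1
    return n

def run_sequential(n, workers=2):
    """Run the workload `workers` times, one after another; return wall time."""
    start = time.perf_counter()
    for _ in range(workers):
        countdown(n)
    return time.perf_counter() - start

def run_threaded(n, workers=2):
    """Run the same workload in `workers` threads at once; return wall time."""
    threads = [threading.Thread(target=countdown, args=(n,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    N = 2_000_000  # hypothetical workload size; tune for your machine
    print("sequential: %.2f s" % run_sequential(N))
    print("threaded:   %.2f s" % run_threaded(N))
```

On a GIL-based CPython the threaded run is not expected to be faster than the sequential one; the exact ratio depends on the interpreter version and the number of cores.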
Getting into the problem space!! (Python v2.7: GIL)
Python Threads
• Real system threads (POSIX/Windows
  threads)
• The Python VM has no thread-management
  logic of its own
  – No thread priorities, no pre-emption
• The native operating system supervises thread
  scheduling
• The Python interpreter just does the per-thread
  bookkeeping.
What’s wrong with Py threads?

• Each ‘running’ thread requires exclusive
  access to data structures in Python interpreter
• Global interpreter lock (GIL) provides the
  synchronization (bookkeeping)
• GIL is necessary, mainly because CPython's
  memory management is not thread-safe.
GIL: Code Details
• A thread-creation request in Python is just a
  pthread_create() call (on POSIX systems)
• The function Py_Initialize() creates the GIL
• The GIL is simply a synchronization primitive, and can
  be implemented with a semaphore/mutex.
• A “runnable” thread acquires this lock and starts
  execution
GIL Management
• How does Python manage the GIL?
  – The Python interpreter regularly performs a
    check on the running thread
  – The check accomplishes thread switching and signal handling
• What is a “check”?
    A counter of ticks; ticks decrement as a thread runs
    A tick maps (loosely) to Python VM byte-code instructions
    The check interval can be set with sys.setcheckinterval(interval)
    The check interval dictates the CPU time-slice available to a thread
GIL Management: Implementation
• Involves two global variables:
        • PyAPI_DATA(volatile int) _Py_Ticker;
        • PyAPI_DATA(int) _Py_CheckInterval;


• As soon as ticks reach zero:
   •   Refill the ticker
   •   active thread releases the GIL
   •   Signals sleeping threads to wake up
   •   Everyone competes for GIL
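The ticker logic above can be imitated with a toy model in plain Python. This is only a sketch: `simulate_ticks` and its arguments are hypothetical names, and the round-robin hand-over is an assumption made to keep the model deterministic. In the real interpreter all woken threads compete for the GIL and any of them (including the one that just released it) may win.

```python
def simulate_ticks(work, check_interval=100):
    """Toy model of tick-based switching between CPU-bound threads.

    `work` maps a thread name to the number of byte-code "ticks" it needs.
    Returns the order in which threads held the simulated GIL and the
    number of times the ticker reached zero.
    """
    remaining = dict(work)
    order = list(remaining)          # round-robin hand-over (an assumption)
    schedule, checks = [], 0
    turn = 0
    while any(remaining[name] > 0 for name in order):
        name = order[turn % len(order)]
        turn += 1
        if remaining[name] == 0:
            continue                 # finished threads never ask for the GIL
        ticker = check_interval      # refill the ticker
        while ticker > 0 and remaining[name] > 0:
            remaining[name] -= 1     # one tick of work
            ticker -= 1
        schedule.append(name)
        if ticker == 0:
            checks += 1              # ticker hit zero: release GIL, signal waiters
    return schedule, checks

if __name__ == "__main__":
    print(simulate_ticks({"thread1": 250, "thread2": 100}))
```

A smaller `check_interval` yields more hand-overs for the same amount of work, which is the overhead discussed on the following slides.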
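Two CPU bound threads on single core machine
[Timeline figure: thread1 and thread2 plotted against time, with check1 and check2 marked on the time axis]
• thread1: Running until check1; GIL released; signals thread2; Suspended, waiting for the GIL
• thread2: Suspended, waiting for the GIL; Wakeup; GIL acquired; Running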
GIL impact
• There is considerable time lag with
   – Communication (Signaling)
   – Thread wake-up
   – GIL acquisition
• GIL management is independent of host/native OS
  scheduling
• Result
   – Significant overhead
   – A thread waits if the GIL is unavailable
   – Threads run sequentially, rather than concurrently
• Try Ctrl + C. Does it stop execution?
Curious case of multicore system
[Figure: thread1 on core0 and thread2 on core1, plotted against time]
     • Conflicting goals of OS scheduler and Python interpreter
     • Host OS can schedule threads concurrently on multi-core
     • GIL battle
The ‘Priority inversion’
• In a mixed [CPU, I/O]-bound application, the I/O-bound
  thread may starve!
• “Cache-hotness” may influence who becomes the new GIL
  owner; usually it is the most recent owner!
• The effect: the CPU-bound thread is preferred over the I/O-bound thread
• Python thus exhibits a priority inversion on multi-core
  systems.
Understanding the new story!! (Python v3.2: GIL)
New GIL: Python v3.2
• Regular “checks” are discontinued
• There is a new time-out mechanism.
      • Default time-out = 5 ms
      • Configurable through sys.setswitchinterval()
• On every time-out, the current GIL holder is forced to
  release the GIL
• It then signals the other waiting threads
• It then waits for a signal from the new GIL owner
  (acknowledgement).
• A sleeping thread wakes up, acquires the GIL, and
  signals the last owner.
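The time-out can be inspected and changed from Python code. The sketch below uses only `sys.getswitchinterval()` and `sys.setswitchinterval()` (available since Python 3.2); the helper `run_with_switch_interval` is a hypothetical name introduced for illustration.

```python
import sys

def run_with_switch_interval(seconds, fn):
    """Call fn() with a temporary switch interval, then restore the old one."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return fn()
    finally:
        sys.setswitchinterval(previous)

if __name__ == "__main__":
    print("current interval: %.3f s" % sys.getswitchinterval())
    print("inside helper:    %.3f s"
          % run_with_switch_interval(0.001, sys.getswitchinterval))
```

Restoring the previous value in a `finally` block keeps the change local to the code being measured, since the interval is a process-wide setting.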
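Curious Case of Multicore: Python v3.2
[Timeline figure: two CPU threads, one on core0 and one on core1, plotted against time]
• CPU thread on core0: Running until the time-out; GIL released; Wait; Suspended, waiting for the GIL
• CPU thread on core1: Suspended, waiting for the GIL; Wakeup; GIL acquired; Running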
Positive impact with new GIL
• Better GIL arbitration
     • Ensures that a thread holds the GIL for only about 5 ms at a time while others are waiting
• Less context switching and fewer signals
• Multicore perspective: GIL battle eliminated!
• More responsive threads (fair scheduling)
• All iz well☺
I/O threads in Python
• An interesting optimization by the interpreter
  – I/O calls are assumed to be blocking
• Python I/O extensively exercises this
  optimization in file and socket ops (e.g. read,
  write, send, recv calls)
  ./Python3.2.1/Include/ceval.h
      Py_BEGIN_ALLOW_THREADS
          Do some blocking I/O operation ...
      Py_END_ALLOW_THREADS
• An I/O thread always releases the GIL
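The effect of releasing the GIL around a blocking call can be observed from pure Python. In this sketch `time.sleep()` stands in for blocking I/O (an assumption chosen to keep the example self-contained; CPython releases the GIL while a thread is blocked in it), and the helper names are hypothetical.

```python
import threading
import time

def blocking_call(seconds):
    # time.sleep() stands in for a blocking read/recv: the GIL is
    # released while the calling thread is blocked inside it.
    time.sleep(seconds)

def run_blocking_threads(count=4, seconds=0.1):
    """Start `count` threads that each block for `seconds`; return wall time."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print("4 threads blocking 0.1 s each: %.2f s" % run_blocking_threads())
```

Because the blocked threads do not hold the GIL, the total wall time stays close to a single blocking period instead of the sum of all of them.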
Convoy effect: Fallout of I/O optimization
• When an I/O thread releases the GIL, another
  ‘runnable’ CPU-bound thread can acquire it
  (remember we are on multiple cores).
• This leaves the I/O thread waiting for another
  time-out (default: 5 ms)!
• Once the CPU thread releases the GIL, the I/O thread
  acquires it and releases it again
• This cycle goes on => performance suffers
Convoy “in” effect
    Convoy effect: observed in an application comprising I/O-bound
    and CPU-bound threads
[Timeline figure: I/O thread on core0 and CPU thread on core1, plotted against time]
• I/O thread (core0): Running; GIL released; Wait; Suspended; GIL acquired; Running; GIL released; Wait
• CPU thread (core1): Suspended, waiting for the GIL; GIL acquired; Running until the time-out; Wait; Suspended
Performance measurements!
• Curious to know how the convoy effect translates
  into performance numbers?
• We performed the following tests with Python 3.2:
     • An application with one CPU thread and one I/O thread
     • Executed on a dual-core machine
     • The CPU thread runs for only a few seconds
       (<10 s)!

       I/O thread with CPU thread   I/O thread without CPU thread
              97 seconds                     23 seconds
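A measurement of this shape can be sketched as below. It is not the authors' test program: the pipe round-trips, the spinning CPU thread and all names (`io_worker`, `cpu_worker`, `measure`) are assumptions, and the absolute numbers will differ from the table above.

```python
import os
import threading
import time

def io_worker(rounds):
    """Do `rounds` tiny write/read round-trips on a pipe; return wall time."""
    read_fd, write_fd = os.pipe()
    start = time.perf_counter()
    try:
        for _ in range(rounds):
            os.write(write_fd, b"x")   # each call releases and re-takes the GIL
            os.read(read_fd, 1)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return time.perf_counter() - start

def cpu_worker(stop):
    """Spin on pure Python byte-code until asked to stop."""
    n = 0
    while not stop.is_set():
        n += 1

def measure(rounds=50):
    """Return (I/O time alone, I/O time next to a CPU-bound thread)."""
    alone = io_worker(rounds)
    stop = threading.Event()
    cpu = threading.Thread(target=cpu_worker, args=(stop,))
    cpu.start()
    try:
        contended = io_worker(rounds)
    finally:
        stop.set()
        cpu.join()
    return alone, contended

if __name__ == "__main__":
    alone, contended = measure()
    print("I/O alone:           %.4f s" % alone)
    print("I/O with CPU thread: %.4f s" % contended)
```

Every system call in `io_worker` gives up the GIL, so with the CPU thread running the I/O thread may have to wait up to one switch interval to get it back after each call.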
Comparing: Python 2.7 & Python 3.2
Execution Time     Python v2.7     Python v3.2
Single Core            74 s            55 s
Dual Core             116 s            65 s

[Bar charts: Execution Time on single core (v2.7 vs. v3.2); Execution Time on dual core (v2.7 vs. v3.2); Execution Time for Python v3.2 (Single Core vs. Dual Core)]

Performance dip still observed on dual cores
Getting into the problem space!! (Python v2.7 / v3.2: GIL)
Solution space!!
GIL free world: Jython
• Jython is free of GIL ☺
• It can fully exploit multiple cores, as per our
  experiments
• Experiments with Jython 2.5
  – Run with two CPU threads in tandem

              Jython 2.5   Execution time
              Single core       44 s
              Dual core         25 s

• The experiment shows a performance improvement on a
  multi-core system
Avoiding GIL impact with multiprocessing
• multiprocessing: a process-based “threading” interface
• The “multiprocessing” module spawns a new Python interpreter
  instance for each process.
• Each process is independent, and the GIL is irrelevant;
    – Utilizes multiple cores better than threads.
    – Shares its API with the “threading” module.

  Python v2.7        Single Core   Dual Core
  threading             76 s         114 s
  multiprocessing       72 s          43 s

[Bar chart: execution time, threading vs. multiprocessing, on Single Core and Dual Core]

Cool! 40 % improvement in Execution Time on dual core!! ☺
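Because the two modules share an API, the same driver can time either one. The sketch below is an illustration written for a current Python 3, not the authors' benchmark; `countdown`, `run_workers` and the value of `N` are assumptions.

```python
import multiprocessing
import threading
import time

def countdown(n):
    """CPU-bound busy loop used as the workload."""
    while n > 0:
        n -= 1
    return n

def run_workers(worker_cls, n, workers=2):
    """Start `workers` Thread or Process objects and return the wall time."""
    jobs = [worker_cls(target=countdown, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    N = 2_000_000  # hypothetical workload size; tune for your machine
    print("threading:       %.2f s" % run_workers(threading.Thread, N))
    print("multiprocessing: %.2f s" % run_workers(multiprocessing.Process, N))
```

The `if __name__ == "__main__":` guard matters here: on platforms where child processes re-import the main module, it prevents them from starting workers of their own.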
Conclusion
• Multi-core systems are becoming ubiquitous
• Python applications should exploit this
  abundant power
• CPython inherently suffers the GIL limitation
• An intelligent awareness of Python interpreter
  behavior is helpful in developing multi-
  threaded applications
• Understand and use ☺
Questions



  Thank you for your time and attention ☺



• Please share your feedback/ comments/ suggestions to us at:
• cjgiridhar@gmail.com ,        http://technobeans.com
• vishalkanaujia@gmail.com,     http://freethreads.wordpress.com
References
• Understanding the Python GIL, http://dabeaz.com/talks.html
• GlobalInterpreterLock,
  http://wiki.python.org/moin/GlobalInterpreterLock
• Thread State and the Global Interpreter Lock,
  http://docs.python.org/c-api/init.html#threads
• Python v3.2.2 and v2.7.2 documentation,
  http://docs.python.org/
• Concurrency and Python,
  http://drdobbs.com/open-source/206103078?pgno=3
Backup slides
Python: GIL
• A thread needs the GIL before updating Python objects or
  calling C/Python API functions
• Concurrency is emulated with regular ‘checks’ to switch
  threads
• This applies only to CPU-bound threads
• A blocking I/O operation implies relinquishing the GIL
   – ./Python2.7.5/Include/ceval.h
       Py_BEGIN_ALLOW_THREADS
           Do some blocking I/O operation ...
       Py_END_ALLOW_THREADS
• Python file I/O extensively exercises this optimization
GIL: Internals
• The function Py_Initialize() creates the GIL
• A thread-creation request in Python is just a
  pthread_create() call
• ../Python/ceval.c
• static PyThread_type_lock interpreter_lock = 0;
  /* This is the GIL */
• thread_PyThread_start_new_thread: called
  for each user-defined thread.
• It calls PyEval_InitThreads() ->
  PyThread_acquire_lock() {}
GIL: in action
• Each CPU-bound thread requires the GIL
• The tick count determines how long the GIL is held
• new_threadstate() -> tick_counter
• We keep a list of Python threads, and each
  thread-state has its own tick_counter value
• As soon as the tick count decrements to zero, the
  thread releases the GIL.
GIL: Details
thread_PyThread_start_new_thread() ->
void PyEval_InitThreads(void)
{
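  if (interpreter_lock)
     return;                                     /* GIL already exists */
  interpreter_lock = PyThread_allocate_lock();   /* create the GIL */
  PyThread_acquire_lock(interpreter_lock, 1);    /* 1 = block until acquired */
  main_thread = PyThread_get_thread_ident();     /* remember the main thread */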
}
Convoy effect: Python v2?
• The convoy effect holds true for Python v2 also
• The smaller ‘check’ interval saves the day!
  – I/O threads don’t have to wait as long (5 ms)
    for CPU threads to release the GIL
  – In Python v3.2, choose the setswitchinterval() value wisely
• The effect is not so visible in Python v2.x
Stackless Python
• A different way of creating threads:
  Microthreads!
• No improvement from multi-core perspective
• Round-robin scheduling for “tasklets”
• Sequential execution

Mais conteúdo relacionado

Semelhante a PyCon India 2011: Python Threads: Dive into GIL!

SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiTakuya ASADA
 
A Better Python for the JVM
A Better Python for the JVMA Better Python for the JVM
A Better Python for the JVMTobias Lindaaker
 
淺談 Live patching technology
淺談 Live patching technology淺談 Live patching technology
淺談 Live patching technologySZ Lin
 
Super Fast Gevent Introduction
Super Fast Gevent IntroductionSuper Fast Gevent Introduction
Super Fast Gevent IntroductionWalter Liu
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differencesJean-Philippe BEMPEL
 
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...OPNFV
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingJeff Larkin
 
Overview of python misec - 2-2012
Overview of python   misec - 2-2012Overview of python   misec - 2-2012
Overview of python misec - 2-2012Tazdrumm3r
 
Understanding how concurrency work in os
Understanding how concurrency work in osUnderstanding how concurrency work in os
Understanding how concurrency work in osGenchiLu1
 
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...Ihor Banadiga
 
Github Enterprise じゃなくてもいいじゃん
Github Enterprise じゃなくてもいいじゃんGithub Enterprise じゃなくてもいいじゃん
Github Enterprise じゃなくてもいいじゃんTakafumi ONAKA
 
SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]Takuya ASADA
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Peter Hlavaty
 
Continuous Delivery the Hard Way with Kubernetes
Continuous Delivery the Hard Way with Kubernetes Continuous Delivery the Hard Way with Kubernetes
Continuous Delivery the Hard Way with Kubernetes Weaveworks
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 
Approaches for Power Management Verification of SOC
Approaches for Power Management Verification of SOC Approaches for Power Management Verification of SOC
Approaches for Power Management Verification of SOC DVClub
 
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- MulticoreLec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- MulticoreHsien-Hsin Sean Lee, Ph.D.
 

Semelhante a PyCon India 2011: Python Threads: Dive into GIL! (20)

SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
 
A Better Python for the JVM
A Better Python for the JVMA Better Python for the JVM
A Better Python for the JVM
 
淺談 Live patching technology
淺談 Live patching technology淺談 Live patching technology
淺談 Live patching technology
 
Optimizing Java Notes
Optimizing Java NotesOptimizing Java Notes
Optimizing Java Notes
 
Super Fast Gevent Introduction
Super Fast Gevent IntroductionSuper Fast Gevent Introduction
Super Fast Gevent Introduction
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differences
 
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU Computing
 
Overview of python misec - 2-2012
Overview of python   misec - 2-2012Overview of python   misec - 2-2012
Overview of python misec - 2-2012
 
Understanding how concurrency work in os
Understanding how concurrency work in osUnderstanding how concurrency work in os
Understanding how concurrency work in os
 
Preempt_rt realtime patch
Preempt_rt realtime patchPreempt_rt realtime patch
Preempt_rt realtime patch
 
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
Ansible for Configuration Management for Lohika DevOps training 2018 @ Lohika...
 
Github Enterprise じゃなくてもいいじゃん
Github Enterprise じゃなくてもいいじゃんGithub Enterprise じゃなくてもいいじゃん
Github Enterprise じゃなくてもいいじゃん
 
SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]SMP Implementation for OpenBSD/sgi [Japanese Edition]
SMP Implementation for OpenBSD/sgi [Japanese Edition]
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!
 
Continuous Delivery the Hard Way with Kubernetes
Continuous Delivery the Hard Way with Kubernetes Continuous Delivery the Hard Way with Kubernetes
Continuous Delivery the Hard Way with Kubernetes
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Approaches for Power Management Verification of SOC
Approaches for Power Management Verification of SOC Approaches for Power Management Verification of SOC
Approaches for Power Management Verification of SOC
 
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- MulticoreLec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore
 
The pocl Kernel Compiler
The pocl Kernel CompilerThe pocl Kernel Compiler
The pocl Kernel Compiler
 

Mais de Chetan Giridhar

Rapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websitesRapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websitesChetan Giridhar
 
Async programming and python
Async programming and pythonAsync programming and python
Async programming and pythonChetan Giridhar
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonChetan Giridhar
 
Fuse'ing python for rapid development of storage efficient FS
Fuse'ing python for rapid development of storage efficient FSFuse'ing python for rapid development of storage efficient FS
Fuse'ing python for rapid development of storage efficient FSChetan Giridhar
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python Chetan Giridhar
 
Testers in product development code review phase
Testers in product development   code review phaseTesters in product development   code review phase
Testers in product development code review phaseChetan Giridhar
 
Design patterns in python v0.1
Design patterns in python v0.1Design patterns in python v0.1
Design patterns in python v0.1Chetan Giridhar
 
Tutorial on-python-programming
Tutorial on-python-programmingTutorial on-python-programming
Tutorial on-python-programmingChetan Giridhar
 

Mais de Chetan Giridhar (8)

Rapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websitesRapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websites
 
Async programming and python
Async programming and pythonAsync programming and python
Async programming and python
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in python
 
Fuse'ing python for rapid development of storage efficient FS
Fuse'ing python for rapid development of storage efficient FSFuse'ing python for rapid development of storage efficient FS
Fuse'ing python for rapid development of storage efficient FS
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python
 
Testers in product development code review phase
Testers in product development   code review phaseTesters in product development   code review phase
Testers in product development code review phase
 
Design patterns in python v0.1
Design patterns in python v0.1Design patterns in python v0.1
Design patterns in python v0.1
 
Tutorial on-python-programming
Tutorial on-python-programmingTutorial on-python-programming
Tutorial on-python-programming
 

Último

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Último (20)

Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
PyCon India 2011: Python Threads: Dive into GIL!

  • 1. Python threads: Dive into GIL! PyCon India 2011, Pune, Sept 16-18. Vishal Kanaujia & Chetan Giridhar
  • 3. Setting up the context!! "Hmm… my threads should be twice as fast on dual cores!" Execution time, Python v2.7: single core 74 s; dual core 116 s. Execution time is 57% longer on dual core!!
  • 4. Getting into the problem space!! Python v2.7 GIL
  • 5. Python Threads • Real system threads (POSIX/Windows threads) • The Python VM has no intelligence for thread management – No thread priorities, no pre-emption • The native operating system supervises thread scheduling • The Python interpreter just does the per-thread bookkeeping.
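The kind of measurement on this slide can be approximated with a small benchmark. This is a minimal sketch, not the authors' test program (the slides do not show it): the names `countdown`, `run_sequential`, and `run_threaded` and the loop count are assumptions chosen for illustration. On a CPython build with the GIL, the threaded run is not expected to beat the sequential one for CPU-bound work; the actual numbers depend on the interpreter version and the machine.

```python
import threading
import time


def countdown(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def run_sequential(n):
    """Run the loop twice, one after the other, in the calling thread."""
    start = time.perf_counter()
    countdown(n)
    countdown(n)
    return time.perf_counter() - start


def run_threaded(n):
    """Run the loop in two threads and wait for both to finish."""
    t1 = threading.Thread(target=countdown, args=(n,))
    t2 = threading.Thread(target=countdown, args=(n,))
    start = time.perf_counter()
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 2_000_000  # arbitrary; raise it for more stable timings
    print(f"sequential: {run_sequential(N):.2f} s")
    print(f"two threads: {run_threaded(N):.2f} s")
```

Both functions perform the same total work, so any difference between the two timings comes from thread management and GIL contention rather than from the loop itself.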
  • 6. What’s wrong with Py threads? • Each ‘running’ thread requires exclusive access to data structures in Python interpreter • Global interpreter lock (GIL) provides the synchronization (bookkeeping) • GIL is necessary, mainly because CPython's memory management is not thread-safe.
  • 7. GIL: Code Details • A thread create request in Python is just a pthread_create() call • The function Py_Initialize() creates the GIL • The GIL is simply a synchronization primitive, and can be implemented with a semaphore/mutex. • A “runnable” thread acquires this lock and starts execution
  • 8. GIL Management • How does Python manage the GIL? – The Python interpreter regularly performs a check on the running thread – The check accomplishes thread switching and signal handling • What is a “check”? A counter of ticks; ticks decrement as a thread runs. A tick maps to Python VM byte-code instructions. The check interval can be set with sys.setcheckinterval(interval). The check interval dictates the CPU time-slice available to a thread.
  • 9. GIL Management: Implementation • Involves two global variables: • PyAPI_DATA(volatile int) _Py_Ticker; • PyAPI_DATA(int) _Py_CheckInterval; • As soon as ticks reach zero: • The ticker is refilled • The active thread releases the GIL • It signals sleeping threads to wake up • Everyone competes for the GIL
  • 10. Two CPU-bound threads on a single-core machine [Timeline figure: thread1 running, GIL released at a check, signal to thread2, thread1 suspended waiting for the GIL; thread2 suspended waiting for the GIL, wakeup, GIL acquired, running. Time axis marked check1, check2.]
  • 11. GIL impact • There is considerable time lag with – Communication (signaling) – Thread wake-up – GIL acquisition • GIL management is independent of host/native OS scheduling • Result – Significant overhead – A thread waits if the GIL is unavailable – Threads run sequentially, rather than concurrently • Try Ctrl + C. Does it stop execution?
  • 12. Curious case of multicore system [Figure: thread1 on core0 and thread2 on core1 along a time axis] • Conflicting goals of OS scheduler and Python interpreter • Host OS can schedule threads concurrently on multi-core • GIL battle
  • 13. The ‘Priority inversion’ • In a mixed [CPU, I/O]-bound application, the I/O-bound thread may starve! • “Cache-hotness” may influence the new GIL owner; usually it is the recent owner! • This prefers the CPU thread over the I/O thread • Python presents a priority inversion on multi-core systems.
  • 14. Understanding the new story!! Python v3.2 GIL
  • 15. New GIL: Python v3.2 • Regular “checks” are discontinued • There is a new time-out mechanism. • Default time-out = 5 ms • Configurable through sys.setswitchinterval() • On every time-out, the current GIL holder is forced to release the GIL • It then signals the other waiting threads • It waits for a signal from the new GIL owner (acknowledgement). • A sleeping thread wakes up, acquires the GIL, and signals the last owner.
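The time-out mentioned on this slide can be read and changed from Python code with `sys.getswitchinterval()` and `sys.setswitchinterval()`, both of which take or return seconds. The helper below is a hedged sketch: `with_switch_interval` is a hypothetical name introduced here for illustration, and it simply restores the previous value after the call.

```python
import sys


def with_switch_interval(seconds, func, *args):
    """Call func(*args) with a temporary switch interval, then restore it.

    Hypothetical helper for illustration; not part of the standard library.
    """
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        # Always restore the interpreter-wide setting.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("current interval (s):", sys.getswitchinterval())
    # Read the interval back while a 1 ms setting is in effect.
    print("inside helper (s):", with_switch_interval(0.001, sys.getswitchinterval))
    print("restored interval (s):", sys.getswitchinterval())
```

The setting is interpreter-wide, so a shorter interval makes all threads hand over the GIL more often, at the cost of more switching.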
  • 16. Curious Case of Multicore: Python v3.2 [Timeline figure: CPU thread on core0 running, waiting, GIL released at time-out, then suspended waiting for the GIL; CPU thread on core1 suspended waiting for the GIL, wakeup, GIL acquired, running.]
  • 17. Positive impact with new GIL • Better GIL arbitration • Ensures that a thread runs only for 5ms • Less context switching and fewer signals • Multicore perspective: GIL battle eliminated! • More responsive threads (fair scheduling) • All iz well☺
  • 18. I/O threads in Python • An interesting optimization by the interpreter – I/O calls are assumed to be blocking • Python I/O extensively exercises this optimization with file and socket operations (e.g. read, write, send, recv calls) • ./Python3.2.1/Include/ceval.h: Py_BEGIN_ALLOW_THREADS; do some blocking I/O operation ...; Py_END_ALLOW_THREADS • An I/O thread always releases the GIL
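The effect of releasing the GIL around a blocking call can be observed from Python without real I/O. The sketch below uses `time.sleep()` as a stand-in for a blocking operation, since it also releases the GIL while waiting; that substitution, the function name `timed_sleepers`, and the durations are assumptions made for illustration. Several threads that each block for the same duration should finish in roughly that duration, not in the sum of all of them.

```python
import threading
import time


def timed_sleepers(n_threads, seconds):
    """Start n_threads that each block for `seconds`; return total wall time."""
    threads = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_sleepers(4, 0.2)
    # Sequential blocking would take about 0.8 s; overlapping waits take far less.
    print(f"4 threads blocking 0.2 s each took {elapsed:.2f} s")
```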
  • 19. Convoy effect: Fallout of I/O optimization • When an I/O thread releases the GIL, another ‘runnable’ CPU bound thread can acquire it (remember we are on multiple cores). • It leaves the I/O thread waiting for another time-out (default: 5ms)! • Once CPU thread releases GIL, I/O thread acquires and releases it again • This cycle goes on => performance suffers
  • 20. Convoy “in” effect • Convoy effect: observed in an application comprising I/O-bound and CPU-bound threads [Timeline figure: I/O thread on core0 running, GIL released, waiting, suspended, GIL acquired, running, GIL released, waiting; CPU thread on core1 suspended waiting for the GIL, GIL acquired, running, waiting, time-out, suspended.]
  • 21. Performance measurements! • Curious to know how the convoy effect translates into performance numbers • We performed the following tests with Python 3.2: • An application with a CPU thread and an I/O thread • Executed on a dual-core machine • The CPU thread runs for only a few seconds (<10 s)! • I/O thread with CPU thread: 97 seconds; I/O thread without CPU thread: 23 seconds
  • 22. Comparing: Python 2.7 & Python 3.2 • Python v2.7 execution time: single core 74 s; dual core 116 s • Python v3.2 execution time: single core 55 s; dual core 65 s [Bar charts: execution time on single core (v2.7 vs v3.2); execution time on dual core (v2.7 vs v3.2); execution time for Python v3.2 (single core vs dual core).] • Performance dip still observed on dual cores
  • 23. Getting into the problem space!! Python v2.7 / v3.2 GIL. Solution space!!
  • 24. GIL free world: Jython • Jython is free of the GIL ☺ • It can fully exploit multiple cores, as per our experiments • Experiments with Jython 2.5 – Run with two CPU threads in tandem • Jython 2.5 execution time: single core 44 s; dual core 25 s • The experiment shows a performance improvement on a multi-core system
  • 25. Avoiding GIL impact with multiprocessing • multiprocessing: process-based “threading” interface • The “multiprocessing” module spawns a new Python interpreter instance for each process. • Each process is independent, and the GIL is irrelevant – Utilizes multiple cores better than threads. – Shares its API with the “threading” module. • Python v2.7, threading: single core 76 s; dual core 114 s • Python v2.7, multiprocessing: single core 72 s; dual core 43 s [Bar chart: threading vs multiprocessing on single core and dual core.] • Cool! 40% improvement in execution time on dual core!! ☺
  • 26. Conclusion • Multi-core systems are becoming ubiquitous • Python applications should exploit this abundant power • CPython inherently suffers from the GIL limitation • An intelligent awareness of Python interpreter behavior is helpful in developing multi-threaded applications • Understand and use ☺
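The process-based approach on this slide can be sketched with `multiprocessing.Process`, which mirrors the `threading.Thread` interface. This is an illustrative sketch under assumptions, not the authors' benchmark: `countdown` and `run_in_processes` are hypothetical names and the loop count is arbitrary. Each worker runs in its own interpreter process with its own GIL, so on a multi-core machine the two loops can run in parallel.

```python
import multiprocessing
import time


def countdown(n):
    """CPU-bound loop; returns the final counter value (0 for n >= 0)."""
    while n > 0:
        n -= 1
    return n


def run_in_processes(n, workers=2):
    """Run countdown(n) in `workers` separate processes; return wall time."""
    procs = [
        multiprocessing.Process(target=countdown, args=(n,))
        for _ in range(workers)
    ]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The guard keeps child processes from re-running this section
    # on platforms that start workers by importing the main module.
    print(f"two processes: {run_in_processes(2_000_000):.2f} s")
```

Process start-up and inter-process communication have their own cost, so the benefit shows up mainly when each worker has a substantial amount of CPU-bound work.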
  • 27. Questions Thank you for your time and attention ☺ • Please share your feedback/ comments/ suggestions to us at: • cjgiridhar@gmail.com , http://technobeans.com • vishalkanaujia@gmail.com, http://freethreads.wordpress.com
  • 28. References • Understanding the Python GIL, http://dabeaz.com/talks.html • GlobalInterpreterLock, http://wiki.python.org/moin/GlobalInterpreterLock • Thread State and the Global Interpreter Lock, http://docs.python.org/c-api/init.html#threads • Python v3.2.2 and v2.7.2 documentation, http://docs.python.org/ • Concurrency and Python, http://drdobbs.com/open-source/206103078?pgno=3
  • 30. Python: GIL • A thread needs the GIL before updating Python objects or calling C/Python API functions • Concurrency is emulated with regular ‘checks’ to switch threads • Applicable only to CPU-bound threads • A blocking I/O operation implies relinquishing the GIL – ./Python2.7.5/Include/ceval.h: Py_BEGIN_ALLOW_THREADS; do some blocking I/O operation ...; Py_END_ALLOW_THREADS • Python file I/O extensively exercises this optimization
  • 31. GIL: Internals • The function Py_Initialize() creates the GIL • A thread create request in Python is just a pthread_create() call • ../Python/ceval.c • static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */ • thread_PyThread_start_new_thread: we call it for "each" user-defined thread. • It calls PyEval_InitThreads() -> PyThread_acquire_lock() {}
  • 32. GIL: in action • Each CPU-bound thread requires the GIL • The tick count determines the duration of a GIL hold • new_threadstate() -> tick_counter • We keep a list of Python threads, and each thread-state has its tick_counter value • As soon as the tick count decrements to zero, the thread releases the GIL.
  • 33. GIL: Details thread_PyThread_start_new_thread() ->
        void PyEval_InitThreads(void) {
            if (interpreter_lock)
                return;
            interpreter_lock = PyThread_allocate_lock();
            PyThread_acquire_lock(interpreter_lock, 1);
            main_thread = PyThread_get_thread_ident();
        }
  • 34. Convoy effect: Python v2? • The convoy effect holds true for Python v2 also • The smaller ‘check’ interval saves the day! – I/O threads don’t have to wait for a longer time (5 ms) for CPU threads to finish – Should choose the setswitchinterval() value wisely • The effect is not so visible in Python v2.0
  • 35. Stackless Python • A different way of creating threads: Microthreads! • No improvement from multi-core perspective • Round-robin scheduling for “tasklets” • Sequential execution