Threads Implementations




CS 167             VI–1   Copyright © 2006 Thomas W. Doeppner. All rights reserved.




Outline

         • Threads implementations
           – the one-level model
           – the variable-weight processes model
           – the two-level model (one kernel thread)
           – the two-level model (multiple kernel threads)
           – the scheduler-activations model
           – performance




Implementing Threads

                    • Components
                      – chores: the work that is to be done
                      – processors: the active agents
                      – threads: the execution context








  Before we discuss how threads are implemented, let’s introduce some terms that will be
helpful in discussing implementations. It’s often convenient to separate the notion of the
work that is being done (e.g., the computation that is to be performed) from the notion of
some active agent who is doing the work. So we call the former a chore, and the latter a
processor. Examples of the former include computing the value of π, looking up an entry in a
database, and redrawing a window.
  To model systems in sufficient detail to discuss performance, we use a further
abstraction—contexts. When a processor executes code, its registers contain certain values,
some of which define such things as the stack. These values must be loaded into a processor
so that the processor can execute the code associated with handling a chore; conversely, this
information must be saved if the processor is to be switched to executing the code associated
with some other chore. We define a thread (or thread of control) to be this information; in
other words, it is the execution context.




Scheduling

                     • Chores on threads
                       – event loops
                     • Threads on processors
                       – time-division multiplexing
                           - explicit thread switching
                           - time slicing








  An important aspect of multithreading is scheduling threads on processors. One might
think that the most important aspect is the schedule: when is a particular thread chosen for
execution by a processor? This is certainly not unimportant, but what is perhaps more
crucial is where the scheduling takes place and what sort of context is being scheduled.
  The simplest system would be to handle a single chore in the context of a single thread
being executed by a single processor. This would be a rather limited system: it would
perform that chore and nothing else. A simple multiprocessor system would consist of
multiple chores each handled in the context of a separate thread, each being executed by a
separate processor.
  More realistic systems handle many chores, both sequentially and concurrently. This
requires some sort of multiplexing mechanism. One might handle multiple chores with a
single thread: this is the approach used in event-handling systems, in which a thread is
used to support an event loop. In response to an event, the thread is assigned an associated
chore (and the processor is directed to handle that chore).
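  To make the event-loop idea concrete, here is a minimal sketch in C. The event stream and the handler names are invented for illustration (they echo the chores mentioned earlier); a real loop would block in select(), poll(), or a GUI toolkit waiting for the next event.

#include <stdio.h>

typedef void (*chore_t)(void);                 /* a chore's handler */

static void redraw_window(void) { puts("redrawing window"); }
static void lookup_entry(void)  { puts("looking up a database entry"); }

int main(void) {
    chore_t handlers[] = { redraw_window, lookup_entry };
    int events[] = { 0, 1, 0 };                /* stand-in event stream */
    int n = sizeof events / sizeof events[0];

    /* the event loop: a single thread handles every chore in turn */
    for (int i = 0; i < n; i++)
        handlers[events[i]]();                 /* run the associated chore */
    return 0;
}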
  Time-division multiplexing allows chores to be handled concurrently and can be done by
dividing up a processor’s time among a number of threads. The mechanism for doing this
might be time-slicing: assigning the processor to a thread for a certain amount of time before
assigning it to another thread, or explicit: code in the chore releases the processor so that it
may switch to another thread. In either case, a scheduler is employed to determine which
threads should be assigned processors.




Multiplexing Processors


(Diagram: two running threads, several runnable threads, and blocked threads waiting on the keyboard and the disk.)








  To be a bit more precise about scheduling, let’s define some more (standard) terms.
Threads are in either a blocked state or a runnable state: in the former they cannot be
assigned a processor, in the latter they can. A scheduler determines which runnable threads
should be assigned processors. Runnable threads that have been assigned processors are
called running threads.




One-Level Model


(Diagram: one-level threads spanning the user and kernel levels, scheduled directly onto the processors.)








  In most systems there are actually two components of the execution context: the user
context and the kernel context. The former is for use when an activity is executing user code;
the latter is for use when the activity is executing kernel code (on behalf of the chore). How
these contexts are manipulated is one of the more crucial aspects of a threads
implementation.
  The conceptually simplest approach is what is known as the one-level model: each thread
consists of both contexts. Thus a thread is scheduled to an activity and the activity can
switch back and forth between the two types of contexts. A single scheduler in the kernel can
handle all the multiplexing duties. The threading implementation in Windows is (mostly)
done this way.




Variable-Weight Processes

                     • Variant of one-level model
                     • Portions of parent process selectively copied
                       into or shared with child process
                     • Children created using clone system call








  Unlike most other Unix systems, which make a distinction between processes and threads,
allowing multithreaded processes, Linux maintains the one-thread-per-process approach.
However, so that we can have multiple threads sharing an address space, Linux supports the
clone system call, a variant of fork, via which a new process can be created that shares
resources (in particular, its address space) with the parent. The result is a variant of the one-
level model.
  This approach is not unique to Linux. It’s used in SGI’s IRIX and was first discussed in
early ’89, when it was known as variable-weight processes. (See “Variable-Weight Processes
with Flexible Shared Resources,” by Z. Aral, J. Bloom, T. Doeppner, I. Gertner, A.
Langerman, G. Schaffer, Proceedings of Winter 1989 USENIX Association Meeting.)




Cloning
(Diagram: parent and child processes; each resource is either copied or shared — signal info; files: the file-descriptor table; FS: root, cwd, umask; virtual memory.)




  As implemented in Linux, a process may be created with the clone system call (in addition
to using the fork system call). One can specify, for each of the resources shown in the slide,
whether a copy is made for the child or the child shares the resource with the parent. Only
two cases are generally used: everything is copied (equivalent to fork) or everything is shared
(creating what we ordinarily call a thread, though the “thread” has a separate process ID).
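  A minimal sketch of the “share everything” case follows, using the glibc clone wrapper. The flag set matches the resources in the slide; the stack size is arbitrary, error handling is abbreviated, and SIGCHLD is included so the parent can reap the child with waitpid.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int child_fn(void *arg) {
    /* shares the parent's address space, files, FS info, and handlers */
    printf("child: pid %d, arg \"%s\"\n", (int)getpid(), (char *)arg);
    return 0;
}

int main(void) {
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);        /* the child needs its own stack */
    if (stack == NULL) { perror("malloc"); exit(1); }

    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
    pid_t pid = clone(child_fn, stack + stack_size,  /* stacks grow down */
                      flags, "hello");
    if (pid == -1) { perror("clone"); exit(1); }

    waitpid(pid, NULL, 0);                   /* reap the thread-like child */
    free(stack);
    return 0;
}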




Linux Threads (pre 2.6)


(Diagram: the initial thread and the other threads communicate with the manager thread via a pipe.)








   Building a POSIX-threads implementation on top of Linux’s variable-weight processes
requires some work. What’s discussed here is the approach used prior to Linux 2.6. Some
information about the threads implementation of 2.6 can be found at
http://people.redhat.com/drepper/nptl-design.pdf.
   Each thread is, of course, a process; all threads of the same computation share the same
address space, open files, and signal handlers. One might expect that the implementation of
pthread_create would be a simple call to clone. This, unfortunately, wouldn’t allow an easy
implementation of operations such as pthread_join: a Unix process may wait only for its
children to terminate; a POSIX thread can join with any other joinable thread. Furthermore,
if a Unix process terminates, its children are inherited by the init process (process number
1). So that pthread_join can be implemented without undue complexity, a special manager
thread (actually a process) is the parent/creator of all threads other than the initial thread.
This manager thread handles thread (process) termination via the wait4 system call and thus
provides a means for implementing pthread_join. So, when any thread invokes
pthread_create or pthread_join, it sends a request to the manager via a pipe and waits for a
response. The manager handles the request and wakes up the caller when appropriate.
   The state of a mutex is represented by a bit. If there are no competitors for locking a
mutex, a thread simply sets the bit with a compare-and-swap instruction (allowing atomic
testing and setting of the mutex’s state bit). If a thread must wait for a mutex to be unlocked,
it blocks using a sigsuspend system call, after queuing itself to a queue headed by the
mutex. A thread unlocking a mutex wakes up the first waiting thread by sending it a Unix
signal (via the kill system call). The wait queue for condition variables is implemented in a
similar fashion.
   On multiprocessors, for mutexes that are neither recursive nor error-checking, waiting is
implemented with an adaptive strategy: under the assumption that mutexes are typically not
held for a long period of time, a thread attempting to lock a locked mutex “spins” on it for up
to a short period of time, i.e., it repeatedly tests the state of the mutex in hopes that it will be
unlocked. If the mutex does not become available after the maximum number of tests, then
the thread finally blocks by queuing itself and calling sigsuspend.
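  The adaptive strategy can be sketched as follows. This is not the LinuxThreads source: the spin bound is arbitrary, and the blocking step is simplified to sched_yield() rather than the queue-plus-sigsuspend mechanism described above.

#include <sched.h>
#include <stdatomic.h>

#define MAX_SPINS 100                            /* arbitrary spin bound */

typedef struct { atomic_int locked; } adaptive_mutex_t;  /* 0 = free, 1 = held */

void mutex_lock(adaptive_mutex_t *m) {
    for (;;) {
        for (int i = 0; i < MAX_SPINS; i++) {
            int expected = 0;
            /* atomic test-and-set of the state bit; the uncontended
               case costs no system call */
            if (atomic_compare_exchange_strong(&m->locked, &expected, 1))
                return;
        }
        /* give up spinning: LinuxThreads would queue itself on the
           mutex and block in sigsuspend() here */
        sched_yield();
    }
}

void mutex_unlock(adaptive_mutex_t *m) {
    /* LinuxThreads would also signal (via kill) the first queued waiter */
    atomic_store(&m->locked, 0);
}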



Two-Level Model: One Kernel Thread



(Diagram: many user threads multiplexed on a single kernel thread per process, which the kernel schedules onto the processors.)








  Another approach, the two-level model, is to represent the two contexts as separate types
of threads: user threads and kernel threads. Kernel threads become “virtual activities” upon
which user threads are scheduled. Thus two schedulers are used: kernel threads are
multiplexed on activities by a kernel scheduler; user threads are multiplexed on kernel
threads by a user-level scheduler. An extreme case of this model is to use only a single
kernel thread per process (perhaps because this is all the operating system supports). The
Unix implementation of the Netscape web browser was based on this model (recent Solaris
versions use the native Solaris implementation of threads), as were early Unix threads
implementations. There are two obvious disadvantages of this approach, both resulting from
the restriction of a single kernel thread per process: only one activity can be used at a time
(thus a single process cannot take advantage of a multiprocessor) and if the kernel thread is
blocked (e.g., as part of an I/O operation), no user thread can run.




Two-Level Model: Multiple Kernel Threads



(Diagram: user threads multiplexed on several kernel threads per process, which the kernel schedules onto the processors.)








  A more elaborate use of the two-level model is to allow multiple kernel threads per process.
This deals with both of the disadvantages described above and is the basis of the Solaris
implementation of threading. It has some performance issues; in addition, the notion of
multiplexing user threads onto kernel threads is very different from that of multiplexing
threads onto activities: there is no direct control over when a chore is actually run by an
activity. From an application’s perspective, it is sometimes desirable to have direct control
over which chores are currently being run.




Scheduler Activations




(Diagram: user threads above the user/kernel boundary; the kernel hands processors to the process via upcalls.)








  A third approach, known historically as the scheduler-activations model, is to have threads
represent user contexts, with kernel contexts supplied when needed (i.e., not as a kernel
thread, as in the two-level model). User threads are multiplexed on activities by a user-level
scheduler, which communicates to the kernel the number of activities needed (i.e., the
number of ready user threads). The kernel multiplexes entire processes on activities—it
determines how many activities to give each process. This model, which is the basis for the
Digital Unix (now Tru64 UNIX) threading package, certainly gives direct control to the user
application over which chores are being run.
  To make some sense of this, let’s work through an example. A process starts up,
containing a single user execution context (and user thread) and a kernel execution context
(and kernel thread). Following the dictates of its scheduling policy, the kernel scheduler
assigns a processor to the process. If the kernel thread blocks, the process implicitly
relinquishes the processor to the kernel scheduler, and gets it back once it unblocks.
  Suppose that the user program creates a new thread (and its associated user execution
context). If actual parallelism is desired, code in the user-level library notifies the kernel that
two processors are desired. When a processor becomes available, the kernel creates a new
kernel execution context; using the newly available processor running in the new kernel
execution context, it places an upcall (going from system code to user code, unlike a system
call, which goes from user code to system code) to the user-level library, effectively giving it
the processor. The library code then assigns this processor to the new thread and user
execution context.
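  There is no portable API for scheduler activations, but the upcall interface implied by this example can be sketched as a set of entry points that the kernel invokes in the user-level library. The names below are hypothetical (loosely following Anderson et al.’s scheduler-activations paper); each upcall arrives already running on a processor, which the library’s scheduler then assigns to a user thread.

typedef int thread_id;

void upcall_processor_added(void) {
    /* a requested processor has arrived: dispatch a ready user thread */
}

void upcall_thread_blocked(thread_id t) {
    (void)t;
    /* t blocked in the kernel and lost its processor; the processor
       delivering this upcall may be given to another user thread */
}

void upcall_thread_unblocked(thread_id t, thread_id preempted) {
    (void)t; (void)preempted;
    /* t is runnable again; the kernel borrowed the processor from
       'preempted' to deliver the news */
}

void upcall_processor_preempted(void) {
    /* the kernel took a processor back, e.g., at the end of a time slice */
}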




Scheduler Activations (continued)




(Diagram: the scheduler-activations example, continued.)








  The user application might then create another thread. It might also ask for another
processor, but this machine only has two. However, let’s say that one of its other threads
(thread 1) blocks on a page fault. The kernel, getting the processor back, creates a new
kernel execution context and places another upcall to our process, telling it two things:
       • The thread using kernel execution context 1 has blocked, and thus it has lost its
       processor (processor 1).
        • Here is processor 1, can you use it?
  In our case the process will assign the processor to thread 3. But soon the page being
waited for by thread 1 becomes available. The kernel should notify the process of this event,
but, of course, it requires a processor to do this. So it uses one of the processors already
assigned to the process, the one the process has assigned to thread 2. The process is now
notified of the following two events:
       • The thread using kernel execution context 1 has unblocked (i.e., it would be
       running, if only it had a processor).
        • I’m telling you this using processor 2, which I’ve taken from the thread that was
        using kernel execution context 2.
  The library now must decide what to do with the processor that has been handed to it. It
could give the processor back to thread 2, leaving thread 1 unblocked but not running in the
kernel; it could continue the suspension of thread 2 and give the processor to thread 1; or it
could decide that both threads 1 and 2 should be running now and thus suspend thread 3,
give its processor to thread 1, and give thread 2 its processor back.




Scheduler Activations (still continued)




(Diagram: the scheduler-activations example, concluded.)








  At some point the kernel is going to decide that the process has had one or both processors
long enough (e.g., a time slice has expired). So it yanks one of the processors away and,
using the other processor, makes an upcall conveying the following news:
       • I’ve taken processor 1.
        • I’m telling you this using processor 2.
  The library learns that it now has only one processor, but with this knowledge it can assign
the processor to the most deserving thread.




Performance

                     • One-level model
                       – operations on threads are expensive (require
                         system calls)
                       – example: mutual exclusion in Windows
                          - critical section implemented partly in user
                            mode
                              • success case in user code
                          - mutex implemented completely in kernel
                          - success case is 20 times faster for critical
                            section than for mutex








   The one-level model is the most straightforward—unlike the others, there is but a single
scheduler. This scheduler resides in the kernel; hence the bulk of the data structures and
code required to represent and manipulate threads is in the kernel (though not all, as we
discuss below). Thus many thread operations, such as synchronization, thread creation and
destruction, involve calls to kernel code—system calls. Since in most architectures such calls
(from user code) are significantly more expensive than calls to user code, threading
implementations based on the one-level model are prone to high operation costs.
   To illustrate the performance penalty incurred when performing a system call, we
measured the costs of performing an operation both in user space and in the kernel in
Windows NT 4.0. The operation, waiting for an object to become unlocked and then locking
it, is performed frequently by numerous applications and has a highly optimized
implementation, especially for the case we exercised, in which the object is not previously
locked. NT provides two constructs for doing this—the critical section and the mutex. The
former is implemented partly in user code and partly in kernel code. If the object in question
(represented by the critical section) is not locked, the critical section operates strictly in user
mode. The mutex is implemented entirely in kernel code: regardless of its state, operations
on it involve system calls. Our measurements show that requests to lock, then unlock a
mutex take twenty times longer than to lock and unlock a critical section when the mutex
and critical section are not already locked. (Operations on the two take the same amount of
time when the mutex and critical section are locked by another thread.)
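  In code, the two constructs we measured look like this (a sketch of the uncontended case; error handling omitted). The Enter/Leave pair stays in user mode when the critical section is free, while WaitForSingleObject and ReleaseMutex enter the kernel regardless of the mutex’s state.

#include <windows.h>

int main(void) {
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);
    EnterCriticalSection(&cs);           /* user-mode fast path when free */
    LeaveCriticalSection(&cs);
    DeleteCriticalSection(&cs);

    HANDLE mtx = CreateMutex(NULL, FALSE, NULL);
    WaitForSingleObject(mtx, INFINITE);  /* system call regardless of state */
    ReleaseMutex(mtx);
    CloseHandle(mtx);
    return 0;
}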




Performance (continued)

                     • Two-level model (good news)
                       – many operations on threads are done strictly
                         in user code: no system calls








   The two-level model makes it possible to eliminate some of the overhead of the one-level
model. Since user threads are multiplexed on kernel threads, the thread that is directly
manipulated by application code is the user thread, implemented entirely in user-level code.
As long as operations on user threads do not involve operations on kernel threads, all
execution takes place at user level and thus the cost of calling kernel code is avoided. The
trick, of course, is to avoid operations on kernel threads.
   The user-level library maintains a ready list of runnable user threads. When a running
user thread must block for synchronization, it is put on a wait queue and its kernel thread
switches to run the user thread at the head of the ready list. If the list is empty, the thread
executes a system call to cause it to block. When one user thread unblocks another, the
latter is moved to the end of the ready list. If kernel threads are available (and thus the ready
list was empty), a system call is required to wake up the kernel thread so that it can run the
unblocked user thread. Thus operations on user threads induce (expensive) operations on
kernel threads if there is a surplus of kernel threads.
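  The user-thread switch itself involves no system call. A self-contained sketch using the (old, but illustrative) ucontext routines shows two user threads handing a single kernel thread back and forth entirely in user code; a real library would pick the next thread from its ready list rather than hard-coding the successor.

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, t1_ctx, t2_ctx;

static void thread1(void) {
    puts("user thread 1");
    swapcontext(&t1_ctx, &t2_ctx);   /* switch threads: pure user code */
    puts("user thread 1 again");
    swapcontext(&t1_ctx, &main_ctx);
}

static void thread2(void) {
    puts("user thread 2");
    swapcontext(&t2_ctx, &t1_ctx);
}

int main(void) {
    char s1[16384], s2[16384];       /* each user thread gets a stack */

    getcontext(&t1_ctx);
    t1_ctx.uc_stack.ss_sp = s1;
    t1_ctx.uc_stack.ss_size = sizeof s1;
    t1_ctx.uc_link = &main_ctx;
    makecontext(&t1_ctx, thread1, 0);

    getcontext(&t2_ctx);
    t2_ctx.uc_stack.ss_sp = s2;
    t2_ctx.uc_stack.ss_size = sizeof s2;
    t2_ctx.uc_link = &main_ctx;
    makecontext(&t2_ctx, thread2, 0);

    swapcontext(&main_ctx, &t1_ctx); /* run the user threads */
    return 0;
}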




Performance (still continued)

                     • Two-level model (not-so-good news)
                       – if not enough kernel threads, deadlock is
                         possible
                            - Solaris automatically creates a new kernel
                              thread if all are blocked








  Runtime decisions must be made about kernel threads. How many should there be? When
should they be created? With the single-kernel-thread version of the model, these questions
are answered trivially; coping with them in the general model is an important aspect of the
implementation.
  One concern is deadlock. For example, suppose a process has two chores being handled by
two user threads but just one kernel thread. The kernel thread has been assigned to one of
the user threads and is blocked. There is code that could be executed in the other user
thread that would unblock the first, but since no kernel thread is available, this code will
never be executed—both user threads (and the kernel thread) are blocked forever. (This
scenario could happen on a Unix system, for example, if the two user threads are
communicating via a pipe and the first is blocked on a read, waiting for the other to do a
write.)
  In Solaris, this problem is prevented with the aid of the operating system. If the OS detects
that all of a process’s kernel threads are blocked, it notifies the user-level threads code,
which creates a new kernel thread if there are user threads that are runnable.




Performance (yet still continued)

                    • Two-level model (more bad news)
                      – loss of parallelism if not enough kernel threads
                          - use pthread_setconcurrency in Solaris
                      – excessive overhead if too many kernel threads








  What if there are not enough kernel threads? As discussed on the previous page, the
Solaris kernel ensures that there are enough kernel threads to prevent deadlock. However,
we might have a situation in which there are two processors and two kernel threads. One
kernel thread is blocked (its user thread is waiting on I/O); the other kernel thread is
running (its user thread is in a compute loop). If there is another runnable user thread, it
won’t be able to run until a kernel thread becomes available, even though there is an
available processor. (A new one won’t automatically be created, since this is done only when
all of a process’s kernel threads are blocked.) One overcomes this problem in Solaris by
using the pthread_setconcurrency routine to set a lower bound on the number of kernel
threads used by a process.
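  For example (a sketch; the level is a hint that two-level implementations such as Solaris treat as a lower bound on the number of kernel threads, and that one-level implementations may simply ignore):

#include <pthread.h>

int main(void) {
    /* ask for at least three kernel threads so three compute-bound
       user threads can run in parallel */
    pthread_setconcurrency(3);
    /* ... create and run user threads as usual ... */
    return 0;
}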
  What if there are too many kernel threads? One result would be that at times there might
be more ready kernel threads than activities and thus the threads’ execution would be time-
sliced. Unless there are vastly too many threads, this would cause no noticeable problems.
However, there are more subtle issues. For example, suppose two user threads are each
handling a separate chore, with synchronization constructs being used to alternate the
threads’ executions to ensure that they are not executed simultaneously. If we use just a
single kernel thread, the synchronization is handled entirely in user space: first one user
thread runs, then joins a wait queue; the kernel thread runs the other user thread, which
soon releases the first user thread and joins a wait queue itself, and so forth. The kernel
thread alternates running the two user threads and execution never enters the kernel.




Performance (notes continued)

                     • Two-level model (more bad news)
                       – loss of parallelism if not enough kernel threads
                           - use pthread_setconcurrency in Solaris
                       – excessive overhead if too many kernel threads








  Now suppose we add another kernel thread. When one user thread is released from
waiting, since a kernel thread is available, that thread is woken up (via a system call) to run
the waking user thread. When the first user thread subsequently blocks, its kernel thread
has nothing to do and must perform a system call to block in the kernel. Thus each user
thread runs on a separate kernel thread and system calls are required to repeatedly block
and release the kernel threads.
  We performed exactly this experiment on Solaris 2.6. Two user threads alternated their
execution one million times, using semaphores for synchronization. The total running time
was 24.6 seconds when one kernel thread was used, but was 68.5 seconds when two kernel
threads were used—a slowdown of almost a factor of three.
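  The experiment can be sketched as follows (timing code omitted; the semaphore names and structure are ours, not the original benchmark source). On a two-level implementation with one kernel thread, the waits and posts are handled at user level; with two kernel threads, each switch costs system calls to block and wake them.

#include <pthread.h>
#include <semaphore.h>

enum { ITERS = 1000000 };
static sem_t ping, pong;

static void *a(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) { sem_wait(&ping); sem_post(&pong); }
    return NULL;
}

static void *b(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) { sem_wait(&pong); sem_post(&ping); }
    return NULL;
}

int main(void) {
    pthread_t ta, tb;
    sem_init(&ping, 0, 1);           /* thread a runs first */
    sem_init(&pong, 0, 0);
    pthread_create(&ta, NULL, a, NULL);
    pthread_create(&tb, NULL, b, NULL);
    pthread_join(ta, NULL);          /* the two threads strictly alternate */
    pthread_join(tb, NULL);
    return 0;
}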




Performance (final word)

                    • Scheduler activations model
                       – no problems with too few or too many kernel
                         threads
                           - (it doesn’t have any)








  The scheduler-activations model, since it has no kernel threads at all, clearly has no
problems resulting from having either too many or too few of them. Since threads are
represented entirely in user space, operations on them are relatively cheap. The kernel,
knowing exactly how many ready threads each process has, ensures that no activity is
needlessly idle.





Mais conteúdo relacionado

Mais procurados

Unix operating system
Unix operating systemUnix operating system
Unix operating systemABhay Panchal
 
Transcendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-finalTranscendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-finalThe Linux Foundation
 
The Linux Scheduler: a Decade of Wasted Cores
The Linux Scheduler: a Decade of Wasted CoresThe Linux Scheduler: a Decade of Wasted Cores
The Linux Scheduler: a Decade of Wasted Coresyeokm1
 
tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3
tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3
tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3WE-IT TUTORIALS
 
Visual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & VirtualisationVisual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & Virtualisationwangyuanyi
 
Btrfs: Design, Implementation and the Current Status
Btrfs: Design, Implementation and the Current StatusBtrfs: Design, Implementation and the Current Status
Btrfs: Design, Implementation and the Current StatusLukáš Czerner
 
Linux for embedded_systems
Linux for embedded_systemsLinux for embedded_systems
Linux for embedded_systemsVandana Salve
 
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7Nicolas Desachy
 
June 07 MS3
June 07 MS3June 07 MS3
June 07 MS3Samimvez
 
Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISAPatrick Bellasi
 

Mais procurados (20)

Unix operating system
Unix operating systemUnix operating system
Unix operating system
 
Transcendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-finalTranscendent memoryupdate xensummit2010-final
Transcendent memoryupdate xensummit2010-final
 
Unix v6 Internals
Unix v6 InternalsUnix v6 Internals
Unix v6 Internals
 
Case study windows
Case study windowsCase study windows
Case study windows
 
The Linux Scheduler: a Decade of Wasted Cores
The Linux Scheduler: a Decade of Wasted CoresThe Linux Scheduler: a Decade of Wasted Cores
The Linux Scheduler: a Decade of Wasted Cores
 
Embedded Virtualization applied in Mobile Devices
Embedded Virtualization applied in Mobile DevicesEmbedded Virtualization applied in Mobile Devices
Embedded Virtualization applied in Mobile Devices
 
tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3
tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3
tybsc it sem 5 Linux administration notes of unit 1,2,3,4,5,6 version 3
 
Oct2009
Oct2009Oct2009
Oct2009
 
Visual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & VirtualisationVisual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & Virtualisation
 
Btrfs: Design, Implementation and the Current Status
Btrfs: Design, Implementation and the Current StatusBtrfs: Design, Implementation and the Current Status
Btrfs: Design, Implementation and the Current Status
 
Linux for embedded_systems
Linux for embedded_systemsLinux for embedded_systems
Linux for embedded_systems
 
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
Informix User Group France - 30/11/2010 - Fonctionalités IDS 11.7
 
June 07 MS3
June 07 MS3June 07 MS3
June 07 MS3
 
Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISA
 
os
osos
os
 
2337610
23376102337610
2337610
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
linux kernel overview 2013
linux kernel overview 2013linux kernel overview 2013
linux kernel overview 2013
 
Chapter 1: Introduction to Unix / Linux Kernel
Chapter 1: Introduction to Unix / Linux KernelChapter 1: Introduction to Unix / Linux Kernel
Chapter 1: Introduction to Unix / Linux Kernel
 
Introduction to Linux
Introduction to LinuxIntroduction to Linux
Introduction to Linux
 

Semelhante a 06threadsimp

Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2mona_hakmy
 
Operating System 4
Operating System 4Operating System 4
Operating System 4tech2click
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Akhil Nadh PC
 
Linux Internals - Kernel/Core
Linux Internals - Kernel/CoreLinux Internals - Kernel/Core
Linux Internals - Kernel/CoreShay Cohen
 
Linux Performance Tunning Kernel
Linux Performance Tunning KernelLinux Performance Tunning Kernel
Linux Performance Tunning KernelShay Cohen
 
Processes and Threads in Windows Vista
Processes and Threads in Windows VistaProcesses and Threads in Windows Vista
Processes and Threads in Windows VistaTrinh Phuc Tho
 
Introduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & ContainersIntroduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & ContainersVaibhav Sharma
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Neeraj Shrimali
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07timcrack
 
Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernelMahendra M
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 

Semelhante a 06threadsimp (20)

02unixintro
02unixintro02unixintro
02unixintro
 
Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2
 
Operating System 4
Operating System 4Operating System 4
Operating System 4
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]
 
Linux Internals - Kernel/Core
Linux Internals - Kernel/CoreLinux Internals - Kernel/Core
Linux Internals - Kernel/Core
 
Chapter 6 os
Chapter 6 osChapter 6 os
Chapter 6 os
 
Linux Performance Tunning Kernel
Linux Performance Tunning KernelLinux Performance Tunning Kernel
Linux Performance Tunning Kernel
 
Os
OsOs
Os
 
Processes and Threads in Windows Vista
Processes and Threads in Windows VistaProcesses and Threads in Windows Vista
Processes and Threads in Windows Vista
 
Introduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & ContainersIntroduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & Containers
 
Bglrsession4
Bglrsession4Bglrsession4
Bglrsession4
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup.
 
Thread
ThreadThread
Thread
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07
 
4.Process.ppt
4.Process.ppt4.Process.ppt
4.Process.ppt
 
Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernel
 
Operating system.pptx
Operating system.pptxOperating system.pptx
Operating system.pptx
 
Walking around linux kernel
Walking around linux kernelWalking around linux kernel
Walking around linux kernel
 
Topic 4- processes.pptx
Topic 4- processes.pptxTopic 4- processes.pptx
Topic 4- processes.pptx
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 

Último

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Último (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

06threadsimp

  • 1. Threads Implementations CS 167 VI–1 Copyright © 2006 Thomas W. Doeppner. All rights reserved. VI–1
  • 2. Outline • Threads implementations – the one-level model – the variable-weight processes model – the two-level model (one kernel thread) – the two-level model (multiple kernel threads) – the scheduler-activations model – performance CS 167 VI–2 Copyright © 2006 Thomas W. Doeppner. All rights reserved. VI–2
  • 3. Implementing Threads • Components – chores: the work that is to be done – processors: the active agents – threads: the execution context CS 167 VI–3 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Before we discuss how threads are implemented, let’s introduce some terms that will be helpful in discussing implementations. It’s often convenient to separate the notion of the work that is being done (e.g., the computation that is to be performed) from the notion of some active agent who is doing the work. So we call the former a chore, and the latter a processor. Examples of the former include computing the value of π, looking up an entry in a database, and redrawing a window. To model systems in sufficient detail to discuss performance, we use a further abstraction—contexts. When a processor executes code, its registers contain certain values, some of which define such things as the stack. These values must be loaded into a processor so that the processor can execute the code associated with handling a chore; conversely, this information must be saved if the processor is to be switched to executing the code associated with some other chore. We define a thread (or thread of control) to be this information; in other words, it is the execution context. VI–3
  • 4. Scheduling • Chores on threads – event loops • Threads on processors – time-division multiplexing - explicit thread switching - time slicing CS 167 VI–4 Copyright © 2006 Thomas W. Doeppner. All rights reserved. An important aspect of multithreading is scheduling threads on processors. One might think that the most important aspect is the schedule: when is a particular thread chosen for execution by a processor? This is certainly not unimportant, but what is perhaps more crucial is where the scheduling takes place and what sort of context is being scheduled. The simplest system would be to handle a single chore in the context of a single thread being executed by a single processor. This would be a rather limited system: it would perform that chore and nothing else. A simple multiprocessor system would consist of multiple chores each handled in the context of a separate thread, each being executed by a separate processor. More realistic systems handle many chores, both sequentially and concurrently. This requires some sort of multiplexing mechanism. One might handle multiple chores with a single thread: this is the approach used in event-handling systems, in which a thread is used to support an event loop. In response to an event, the thread is assigned an associated chore (and the processor is directed to handle that chore). Time-division multiplexing allows chores to be handled concurrently and can be done by dividing up a processor’s time among a number of threads. The mechanism for doing this might be time-slicing: assigning the processor to a thread for a certain amount of time before assigning it to another thread, or explicit: code in the chore releases the processor so that it may switch to another thread. In either case, a scheduler is employed to determine which threads should be assigned processors. VI–4
  • 5. Multiplexing Processors Blocked Runnable Keyboard Running Runnable Blocked Running Disk Runnable CS 167 VI–5 Copyright © 2006 Thomas W. Doeppner. All rights reserved. To be a bit more precise about scheduling, let’s define some more (standard) terms. Threads are in either a blocked state or a runnable state: in the former they cannot be assigned a processor, in the latter they can. A scheduler determines which runnable threads should be assigned processors. Runnable threads that have been assigned activities are called running threads. VI–5
  • 6. One-Level Model User Kernel Processors CS 167 VI–6 Copyright © 2006 Thomas W. Doeppner. All rights reserved. In most systems there are actually two components of the execution context: the user context and the kernel context. The former is for use when an activity is executing user code; the latter is for use when the activity is executing kernel code (on behalf of the chore). How these contexts are manipulated is one of the more crucial aspects of a threads implementation. The conceptually simplest approach is what is known as the one-level model: each thread consists of both contexts. Thus a thread is scheduled to an activity and the activity can switch back and forth between the two types of contexts. A single scheduler in the kernel can handle all the multiplexing duties. The threading implementation in Windows is (mostly) done this way. VI–6
  • 7. Variable-Weight Processes • Variant of one-level model • Portions of parent process selectively copied into or shared with child process • Children created using clone system call CS 167 VI–7 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Unlike most other Unix systems, which make a distinction between processes and threads, allowing multithreaded processes, Linux maintains the one-thread-per-process approach. However, so that we can have multiple threads sharing an address space, Linux supports the clone system call, a variant of fork, via which a new process can be created that shares resources (in particular, its address space) with the parent. The result is a variant of the one- level model. This approach is not unique to Linux. It’s used in SGI’s IRIX and was first discussed in early ’89, when it was known as variable-weight processes. (See “Variable-Weight Processes with Flexible Shared Resources,” by Z. Aral, J. Bloom, T. Doeppner, I. Gertner, A. Langerman, G. Schaffer, Proceedings of Winter 1989 USENIX Association Meeting.) VI–7
  • 8. Cloning Signal Info Parent Child Files: file-descriptor table FS: root, cwd, umask Virtual Memory CS 167 VI–8 Copyright © 2006 Thomas W. Doeppner. All rights reserved. As implemented in Linux, a process may be created with the clone system call (in addition to using the fork system call). One can specify, for each of the resources shown in the slide, whether a copy is made for the child or the child shares the resource with the parent. Only two cases are generally used: everything is copied (equivalent to fork) or everything is shared (creating what we ordinarily call a thread, though the “thread” has a separate process ID). VI–8
  • 9. Linux Threads (pre 2.6) Initial Thread Manager Thread Other Pipe Thread Other Thread Other Thread CS 167 VI–9 Copyright © 2006 Thomas W. Doeppner. All rights reserved. Building a POSIX-threads implementation on top of Linux’s variable-weight processes requires some work. What’s discussed here is the approach used prior to Linux 2.6. Some information about the threads implementation of 2.6 can be found at http://people.redhat.com/drepper/nptl-design.pdf. Each thread is, of course, a process; all threads of the same computation share the same address space, open files, and signal handlers. One might expect that the implementation of pthread_create would be a simple call to clone. This, unfortunately, wouldn’t allow an easy implementation of operations such as pthread_join: a Unix process may wait only for its children to terminate; a POSIX thread can join with any other joinable thread. Furthermore, if a Unix process terminates, its children are inherited by the init process (process number 1). So that pthread_join can be implemented without undue complexity, a special manager thread (actually a process) is the parent/creator of all threads other than the initial thread. This manager thread handles thread (process) termination via the wait4 system call and thus provides a means for implementing pthread_join. So, when any thread invokes pthread_create or pthread_join, it sends a request to the manager via a pipe and waits for a response. The manager handles the request and wakes up the caller when appropriate. The state of a mutex is represented by a bit. If there are no competitors for locking a mutex, a thread simply sets the bit with a compare-and-swap instruction (allowing atomic testing and setting of the mutex’s state bit). If a thread must wait for a mutex to be unlocked, it blocks using a sigsuspend system call, after queuing itself to a queue headed by the mutex. A thread unlocking a mutex wakes up the first waiting thread by sending it a Unix signal (via the kill system call). The wait queue for condition variables is implemented in a similar fashion. On multiprocessors, for mutexes that are neither recursive nor error-checking, waiting is implemented with an adaptive strategy: under the assumption that mutexes are typically not held for a long period of time, a thread attempting to lock a locked mutex “spins” on it for up to a short period of time, i.e., it repeatedly tests the state of the mutex in hopes that it will be unlocked. If the mutex does not become available after the maximum number of tests, then the thread finally blocks by queuing itself and calling sigsuspend. VI–9
Two-Level Model: One Kernel Thread

[Diagram: user threads multiplexed on a single kernel thread per process; kernel threads scheduled on processors]

Another approach, the two-level model, is to represent the two contexts as separate types of threads: user threads and kernel threads. Kernel threads become "virtual activities" upon which user threads are scheduled. Thus two schedulers are used: kernel threads are multiplexed on activities by a kernel scheduler; user threads are multiplexed on kernel threads by a user-level scheduler.

An extreme case of this model is to use only a single kernel thread per process (perhaps because this is all the operating system supports). The Unix implementation of the Netscape web browser was based on this model (recent Solaris versions use the native Solaris implementation of threads), as were early Unix threads implementations.

There are two obvious disadvantages of this approach, both resulting from the restriction of a single kernel thread per process: only one activity can be used at a time (thus a single process cannot take advantage of a multiprocessor), and if the kernel thread is blocked (e.g., as part of an I/O operation), no user thread can run.
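The mechanism underneath any user-level scheduler is a user-level context switch. The sketch below, using the POSIX ucontext routines, shows two "user threads" alternating on one kernel thread; the thread functions and stack sizes are illustrative. (Real packages use leaner hand-written switch code; glibc's swapcontext saves the signal mask and so does make a system call.)

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, t1_ctx, t2_ctx;
static char stack1[64 * 1024], stack2[64 * 1024];

static void thread1(void) {
    for (int i = 0; i < 3; i++) {
        printf("user thread 1, step %d\n", i);
        swapcontext(&t1_ctx, &t2_ctx);   /* "yield" to thread 2 */
    }
    swapcontext(&t1_ctx, &main_ctx);
}

static void thread2(void) {
    for (int i = 0; i < 3; i++) {
        printf("user thread 2, step %d\n", i);
        swapcontext(&t2_ctx, &t1_ctx);   /* "yield" back to thread 1 */
    }
}

int main(void) {
    /* Build an execution context (stack plus entry point) for each thread. */
    getcontext(&t1_ctx);
    t1_ctx.uc_stack.ss_sp = stack1;
    t1_ctx.uc_stack.ss_size = sizeof stack1;
    t1_ctx.uc_link = &main_ctx;
    makecontext(&t1_ctx, thread1, 0);

    getcontext(&t2_ctx);
    t2_ctx.uc_stack.ss_sp = stack2;
    t2_ctx.uc_stack.ss_size = sizeof stack2;
    t2_ctx.uc_link = &main_ctx;
    makecontext(&t2_ctx, thread2, 0);

    swapcontext(&main_ctx, &t1_ctx);     /* hand the kernel thread to thread 1 */
    printf("back in main\n");
    return 0;
}
```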
Two-Level Model: Multiple Kernel Threads

[Diagram: user threads multiplexed on multiple kernel threads per process; kernel threads scheduled on processors]

A more elaborate use of the two-level model is to allow multiple kernel threads per process. This deals with both of the disadvantages described above and is the basis of the Solaris implementation of threading. It has some performance issues; in addition, the notion of multiplexing user threads onto kernel threads is very different from that of multiplexing threads onto activities: there is no direct control over when a chore is actually run by an activity. Yet, from an application's perspective, direct control over which chores are currently being run is sometimes exactly what is needed.
Scheduler Activations

[Diagram: user threads multiplexed directly on activities, with kernel execution contexts supplied as needed]

A third approach, known historically as the scheduler-activations model, is that threads represent user contexts, with kernel contexts supplied when needed (i.e., not as a kernel thread, as in the two-level model). User threads are multiplexed on activities by a user-level scheduler, which communicates to the kernel the number of activities needed (i.e., the number of ready user threads). The kernel multiplexes entire processes on activities: it determines how many activities to give each process. This model, which is the basis for the Digital Unix (now Tru64 Unix) threading package, certainly gives the user application direct control over which chores are being run.

To make some sense of this, let's work through an example. A process starts up, containing a single user execution context (and user thread) and a kernel execution context (and kernel thread). Following the dictates of its scheduling policy, the kernel scheduler assigns a processor to the process. If the kernel thread blocks, the process implicitly relinquishes the processor to the kernel scheduler, and gets it back once it unblocks.

Suppose that the user program creates a new thread (and its associated user execution context). If actual parallelism is desired, code in the user-level library notifies the kernel that two processors are desired. When a processor becomes available, the kernel creates a new kernel execution context; using the newly available processor running in the new kernel execution context, it places an upcall (going from system code to user code, unlike a system call, which goes from user code to system code) to the user-level library, effectively giving it the processor. The library code then assigns this processor to the new thread and user execution context.
Scheduler Activations (continued)

[Diagram: user threads and kernel execution contexts during the page-fault scenario]

The user application might then create another thread. It might also ask for another processor, but this machine has only two. However, let's say that one of its other threads (thread 1) blocks on a page fault. The kernel, getting the processor back, creates a new kernel execution context and places another upcall to our process, telling it two things:

• The thread using kernel execution context 1 has blocked, and thus it has lost its processor (processor 1).
• Here is processor 1; can you use it?

In our case the process will assign the processor to thread 3. But soon the page being waited for by thread 1 becomes available. The kernel should notify the process of this event, but, of course, it requires a processor to do so. So it uses one of the processors already assigned to the process, the one the process has assigned to thread 2. The process is now notified of the following two events:

• The thread using kernel execution context 1 has unblocked (i.e., it would be running, if only it had a processor).
• I'm telling you this using processor 2, which I've taken from the thread that was using kernel execution context 2.

The library now must decide what to do with the processor that has been handed to it. It could give it back to thread 2, leaving thread 1 unblocked but not running in the kernel; it could continue the suspension of thread 2 and give the processor to thread 1; or it could decide that both threads 1 and 2 should be running now and thus suspend thread 3, give its processor to thread 1, and give thread 2 its processor back.
Scheduler Activations (still continued)

[Diagram: the kernel reclaiming a processor from the process]

At some point the kernel is going to decide that the process has had one or both processors long enough (e.g., a time slice has expired). So it yanks one of the processors away and, using the other processor, makes an upcall conveying the following news:

• I've taken processor 1.
• I'm telling you this using processor 2.

The library learns that it now has only one processor, but with this knowledge it can assign the processor to the most deserving thread.
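The skeleton below summarizes the upcalls in the three preceding slides as C entry points. Every name here is hypothetical; no real scheduler-activations API is being quoted. It only illustrates which events the kernel reports and what the user-level scheduler decides on each one.

```c
typedef int thread_id;

/* Assumed user-level scheduler helpers (not shown). */
extern thread_id ready_queue_pop(void);
extern void ready_queue_push(thread_id t);
extern void run_on_current_processor(thread_id t);  /* dispatches; never returns */

/* Upcall: the kernel has granted us a processor we asked for. */
void upcall_processor_granted(void) {
    run_on_current_processor(ready_queue_pop());
}

/* Upcall: thread t blocked in the kernel (e.g., page fault); this upcall
   arrives on the processor t was using, which is now ours to reassign. */
void upcall_thread_blocked(thread_id t) {
    (void)t;   /* t stays blocked in the kernel; nothing to queue yet */
    run_on_current_processor(ready_queue_pop());
}

/* Upcall: thread t unblocked.  The kernel delivers this on a processor taken
   from thread "preempted", so the library chooses among t, preempted, and
   everything already on the ready queue. */
void upcall_thread_unblocked(thread_id t, thread_id preempted) {
    ready_queue_push(t);
    ready_queue_push(preempted);
    run_on_current_processor(ready_queue_pop());
}

/* Upcall: the kernel revoked a processor from thread "preempted" (e.g., a
   time slice expired); delivered on one of our remaining processors, taken
   from thread "interrupted". */
void upcall_processor_revoked(thread_id preempted, thread_id interrupted) {
    ready_queue_push(preempted);
    ready_queue_push(interrupted);
    run_on_current_processor(ready_queue_pop());
}
```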
Performance

• One-level model
  – operations on threads are expensive (require system calls)
  – example: mutual exclusion in Windows
    - critical section implemented partly in user mode
      • success case in user code
    - mutex implemented completely in kernel
    - success case is 20 times faster for critical section than for mutex

The one-level model is the most straightforward: unlike the others, there is but a single scheduler. This scheduler resides in the kernel; hence the bulk of the data structures and code required to represent and manipulate threads is in the kernel (though not all, as we discuss below). Thus many thread operations, such as synchronization, thread creation, and thread destruction, involve calls to kernel code, i.e., system calls. Since in most architectures such calls (from user code) are significantly more expensive than calls to user code, threading implementations based on the one-level model are prone to high operation costs.

To illustrate the performance penalty incurred when performing a system call, we measured the cost of performing an operation both in user space and in the kernel in Windows NT 4.0. The operation, waiting for an object to become unlocked and then locking it, is performed frequently by numerous applications and has a highly optimized implementation, especially for the case we exercised, in which the object is not previously locked. NT provides two constructs for doing this: the critical section and the mutex. The former is implemented partly in user code and partly in kernel code; if the object in question (represented by the critical section) is not locked, the critical section operates strictly in user mode. The mutex is implemented entirely in kernel code: regardless of its state, operations on it involve system calls. Our measurements show that requests to lock, then unlock a mutex take twenty times longer than to lock and unlock a critical section when the mutex and critical section are not already locked. (Operations on the two take the same amount of time when the mutex and critical section are locked by another thread.)
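For concreteness, here is a sketch of the two Win32 constructs being compared. The timing harness is omitted; the point is only that the critical section's uncontended path stays in user mode, while every mutex operation enters the kernel.

```c
#include <windows.h>

CRITICAL_SECTION cs;   /* partly user mode: uncontended case is user-only */
HANDLE mutex;          /* entirely kernel: always a system call */

void demo(void) {
    InitializeCriticalSection(&cs);
    mutex = CreateMutex(NULL, FALSE, NULL);

    /* Uncontended case: handled with user-mode atomic operations,
       no system call. */
    EnterCriticalSection(&cs);
    LeaveCriticalSection(&cs);

    /* Always a system call, even when uncontended; this is the path
       measured to be about twenty times slower. */
    WaitForSingleObject(mutex, INFINITE);
    ReleaseMutex(mutex);

    CloseHandle(mutex);
    DeleteCriticalSection(&cs);
}
```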
Performance (continued)

• Two-level model (good news)
  – many operations on threads are done strictly in user code: no system calls

The two-level model makes it possible to eliminate some of the overhead of the one-level model. Since user threads are multiplexed on kernel threads, the thread that is directly manipulated by application code is the user thread, implemented entirely in user-level code. As long as operations on user threads do not involve operations on kernel threads, all execution takes place at user level and thus the cost of calling kernel code is avoided. The trick, of course, is to avoid operations on kernel threads.

The user-level library maintains a ready list of runnable user threads. When a running user thread must block for synchronization, it is put on a wait queue and its kernel thread switches to run the user thread at the head of the ready list; if the list is empty, the kernel thread executes a system call to block in the kernel. When one user thread unblocks another, the latter is moved to the end of the ready list. If a kernel thread is sitting idle (because the ready list was empty), a system call is required to wake it up so that it can run the unblocked user thread. Thus operations on user threads induce (expensive) operations on kernel threads if there is a surplus of kernel threads. A sketch of this logic follows.
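This sketch restates the block/unblock logic just described; it is not any particular library's source, and the helper functions are assumed, not real APIs.

```c
typedef struct user_thread user_thread;

/* Assumed helpers (not shown). */
extern user_thread *ready_list_pop(void);
extern void ready_list_append(user_thread *t);
extern void wait_queue_append(void *wq, user_thread *t);
extern void switch_to(user_thread *t);          /* user-level context switch */
extern void block_in_kernel(void);              /* system call */
extern int  idle_kernel_thread_exists(void);
extern void wake_idle_kernel_thread(void);      /* system call */

/* The running user thread must block (e.g., on a mutex). */
void ut_block(void *wait_queue, user_thread *self) {
    wait_queue_append(wait_queue, self);
    user_thread *next = ready_list_pop();
    if (next != NULL)
        switch_to(next);        /* stays entirely in user mode */
    else
        block_in_kernel();      /* no runnable user thread: system call */
}

/* Some user thread makes t runnable again. */
void ut_unblock(user_thread *t) {
    ready_list_append(t);
    if (idle_kernel_thread_exists())
        wake_idle_kernel_thread();   /* system call needed only when a
                                        surplus kernel thread is idle */
}
```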
Performance (still continued)

• Two-level model (not-so-good news)
  – if not enough kernel threads, deadlock is possible
    - Solaris automatically creates a new kernel thread if all are blocked

Runtime decisions must be made about kernel threads. How many should there be? When should they be created? With the single-kernel-thread version of the model, these questions are answered trivially; coping with them in the general model is a very important aspect of the design.

One concern is deadlock. For example, suppose a process has two chores being handled by two user threads but just one kernel thread. The kernel thread has been assigned to one of the user threads and is blocked. There is code that could be executed in the other user thread that would unblock the first, but since no kernel thread is available, this code will never be executed: both user threads (and the kernel thread) are blocked forever. (This scenario could happen on a Unix system, for example, if the two user threads are communicating via a pipe and the first is blocked on a read, waiting for the other to do a write.)

In Solaris, this problem is prevented with the aid of the operating system. If the OS detects that all of a process's kernel threads are blocked, it notifies the user-level threads code, which creates a new kernel thread if there are runnable user threads.
Performance (yet still continued)

• Two-level model (more bad news)
  – loss of parallelism if not enough kernel threads
    - use pthread_setconcurrency in Solaris
  – excessive overhead if too many kernel threads

What if there are not enough kernel threads? As discussed on the previous page, the Solaris kernel ensures that there are enough kernel threads to prevent deadlock. However, we might have a situation in which there are two processors and two kernel threads. One kernel thread is blocked (its user thread is waiting on I/O); the other kernel thread is running (its user thread is in a compute loop). If there is another runnable user thread, it won't be able to run until a kernel thread becomes available, even though there is an available processor. (A new kernel thread won't automatically be created, since that is done only when all of a process's kernel threads are blocked.) One overcomes this problem in Solaris by using the pthread_setconcurrency routine to set a lower bound on the number of kernel threads used by a process (see the sketch after these notes).

What if there are too many kernel threads? One result would be that at times there might be more ready kernel threads than activities, and thus the threads' execution would be time-sliced. Unless there are vastly too many threads, this would cause no noticeable problems. However, there are more subtle issues. For example, suppose two user threads are each handling a separate chore, with synchronization constructs being used to alternate the threads' executions to ensure that they are not executed simultaneously. If we use just a single kernel thread, the synchronization is handled entirely in user space: first one user thread runs, then joins a wait queue; the kernel thread runs the other user thread, which soon releases the first user thread and joins a wait queue itself, and so forth. The kernel thread alternates running the two user threads, and execution never enters the kernel.
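A minimal sketch of the fix; the requested level of 4 is an arbitrary example. On Solaris's two-level implementation this sets a lower bound on the number of kernel threads; one-level implementations accept the call but may treat it as a no-op.

```c
#include <pthread.h>

void ensure_parallelism(void) {
    /* Ask the threads library for at least four kernel threads so that
       runnable user threads aren't starved of available processors. */
    pthread_setconcurrency(4);
}
```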
Performance (notes continued)

• Two-level model (more bad news)
  – loss of parallelism if not enough kernel threads
    - use pthread_setconcurrency in Solaris
  – excessive overhead if too many kernel threads

Now suppose we add another kernel thread. When one user thread is released from waiting, since a kernel thread is available, that kernel thread is woken up (via a system call) to run the waking user thread. When the first user thread subsequently blocks, its kernel thread has nothing to do and must perform a system call to block in the kernel. Thus each user thread runs on a separate kernel thread, and system calls are required to repeatedly block and release the kernel threads.

We performed exactly this experiment on Solaris 2.6. Two user threads alternated their execution one million times, using semaphores for synchronization. The total running time was 24.6 seconds when one kernel thread was used, but 68.5 seconds when two kernel threads were used: a slowdown of almost a factor of three.
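The sketch below reconstructs the shape of that experiment with POSIX threads and semaphores (the original benchmark code isn't available, so details such as names are ours). On a two-level implementation, running it with one kernel thread keeps every hand-off in user mode; with two kernel threads, each hand-off requires system calls to block and wake kernel threads.

```c
#include <pthread.h>
#include <semaphore.h>

#define N 1000000   /* one million alternations, as in the experiment */

static sem_t sem_a, sem_b;

static void *ping(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++) { sem_wait(&sem_a); sem_post(&sem_b); }
    return NULL;
}

static void *pong(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++) { sem_wait(&sem_b); sem_post(&sem_a); }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&sem_a, 0, 1);   /* ping runs first */
    sem_init(&sem_b, 0, 0);
    pthread_create(&t1, NULL, ping, NULL);
    pthread_create(&t2, NULL, pong, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```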
Performance (final word)

• Scheduler-activations model
  – no problems with too few or too many kernel threads (it doesn't have any)

The scheduler-activations model, since it has no kernel threads at all, clearly has no problems resulting from having either too many or too few of them. Since threads are represented entirely in user space, operations on them are relatively cheap. The kernel, knowing exactly how many ready threads each process has, ensures that no activity is needlessly idle.