2. Slide 2
Prerequisites
• Understanding of Linux Device Drivers
• Basic understanding of Linux synchronization mechanisms like
semaphores, mutexes and spin locks
4. Slide 4
Driver Parallelism
• Parallelism or concurrency arises when a system tries to do more than one
thing at once
– Concurrency is when two tasks can start, run, and complete in
overlapping time periods. It doesn't necessarily mean they'll ever
both be running at the same instant.
– Parallelism is when tasks literally run at the same time
• The goal of parallelism/concurrency is to improve system performance
• The side effect is that it can also lead to race conditions
• The following slides highlight the sources of
parallelism/concurrency, how to improve performance, and how to avoid race
conditions in Linux Device Drivers
http://www.fasterj.com/cartoon/cartoon106.shtml
5. Slide 5
Kernel Preemption
• CONFIG_PREEMPT
– This kernel config option reduces the latency of the kernel by making all kernel
code (that is not executing in a critical section) preemptible.
– This allows reaction to interactive events by permitting a low-priority process to
be preempted involuntarily even if it is executing in kernel mode
– After an asynchronous event such as an interrupt handler completes, if a
higher-priority process is ready to run, the current process is replaced.
– Useful for embedded systems with latency requirements in the milliseconds
range.
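For reference, a minimal sketch of the relevant .config fragment (symbol names as found in mainline kernels; exactly one preemption model is selected):

```
# .config fragment: pick exactly one preemption model
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
```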
6. Slide 6
SMP Architecture
• Evolution of multiprocessor architectures
– The late 1960s saw a need for more CPU processing power for scientific
and compute-intensive applications.
– Two or more CPUs were combined to form a single computer
• SMP (Symmetric Multiprocessing) is one such multiprocessor
architecture
• AMP and clusters are others
• Basic idea: more tasks in parallel per unit time
7. Slide 7
SMP Architecture
[Figure: four CPUs, each with a private cache, connected over a shared bus
to common Memory and I/O]
Fig 1 : Logical view of SMP
In an actual hardware implementation, the cache will not be
directly connected to the bus.
8. Slide 8
SMP Architecture Contd
• For the 4-CPU SMP system shown in the diagram, all CPUs would be symmetric, i.e.
of the same architecture, frequency, etc.
• CPU, memory and I/O are tightly coupled via a high-speed interconnect bus,
allowing any unit connected to the bus to communicate with any other unit
• A single globally accessible memory is used by all CPUs; there is no local RAM
in the CPUs, so data changes are visible to all CPUs
• Access to the global shared memory is symmetric (equal); its contents are fully
shared, and all CPUs use the same address whenever referring to the same
piece of data
• I/O access is also symmetric, i.e. any CPU can initiate I/O
9. Slide 9
SMP Architecture Contd
• Interrupts are distributed across CPUs by the PIC
• Access to the bus and memory has to be arbitrated so that no two CPUs step
on each other, and all have guaranteed fair access
• The maximum number of CPUs that can be used depends on bus bandwidth
• Only one instance of the OS (Operating System) is loaded in main
memory
• Kernel data structures are accessed concurrently, hence the kernel needs to be
SMP aware
11. Slide 11
SMP Intricacies: Cache Coherency
• In most implementations the CPU stores data in a cache to improve system
performance.
• Consider two threads running on two different CPUs in an SMP
system. Both use a global variable “Data”. If one of them modifies it to 1, the
change is reflected only in its own cache. The values in main memory and in the
other CPU’s cache are stale, and if those values are read by the other CPU, the
results could be unpredictable. Hence the need to maintain consistency, or
coherency, of caches.
• This problem is typically solved by hardware cache-coherency protocols,
which include snooping and write-update/write-invalidate schemes
12. Slide 12
SMP Intricacies: Atomic
operations
• Consider two threads trying to obtain the same semaphore simultaneously. Both
read a value of 0, think it is available, and set it to 1.
• Such issues are solved by using the atomic instructions provided by each
architecture
• Special instructions provide atomic test-and-set operations: for example, the
load-linked and store-conditional instructions in MIPS and load-exclusive/
store-exclusive in ARM
13. Slide 13
USB Subsystem Analysis
[Figure: Simplified view of the USB subsystem]
Linux Host side: the USB Print App and USB Mass Storage App sit above the
USB Print Class Driver and USB Mass Storage Class Driver, which use the
USB Core, the EHCI Driver and the USB Host Controller.
Linux Device side: the USB Print App sits above the Print gadget Driver and
Mass storage gadget Driver, which use the UDC Driver and the USB Device
Controller.
14. Slide 14
USB Subsystem Analysis:
No preempt
• Assume the Linux host has initiated a large transfer for USB mass storage.
• The in-kernel transfer would not be preempted until the available data is
exhausted.
• A high-priority, small amount of data for Print would get scheduled only after
the mass storage transfer is complete.
• This affects the end-user experience
15. Slide 15
USB Subsystem Analysis:
Preempt Enabled
• Assume the same scenario with kernel preemption enabled.
• The in-kernel mass-storage transfer can be preempted and replaced by the
Print data transfer, for example after processing a keyboard or timer
interrupt
• This opens another parallel path into both the USB core and the EHCI driver,
since the mass storage transfer is not complete and the Print transfer has started.
• The Print transfer could re-open the same device, access the same data
structures for initiating a transfer, and could even disconnect the device.
16. Slide 16
USB Subsystem Analysis:
Preempt Enabled
• Hence the driver design needs to determine all parallel paths and the points at
which it is safe to be preempted, while at the same time enabling parallelism.
• For example, it could be safe to preempt once a URB request is queued,
but it might not be safe to preempt while DMA is in progress, since the DMA
configuration registers could be overwritten.
17. Slide 17
USB Subsystem Analysis: SMP
• Assume the previous scenario on an SMP system
• In this case the scheduler need not preempt the running mass storage transfer,
but can schedule the print transfer on another CPU.
• This too opens a new parallel path into the drivers, and both would be executing
at the same instant in time.
• Hence, if parallelism is taken care of in the drivers, they are to a large extent
SMP safe.
• On SMP systems, interrupt handler and driver code could run concurrently on
different CPUs.
• Hence the need to protect data shared with interrupt handlers using spin locks
18. Slide 18
Driver Scenarios
static LIST_HEAD(ts_list);
int process_ts_entries ()
{
    struct ts_entry *ts, *tmp;
    local_irq_disable();
    /* _safe variant required: we delete entries while iterating */
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list element */
        list_del(&ts->node);
    }
    local_irq_enable();
    return 0;
}
irqreturn_t ts_isr (int irq, void *dev_id)
{
    struct ts_entry *entry = ...; /* supplied by the real driver */
    /* Process interrupt */
    list_add_tail(&entry->node, &ts_list);
    return IRQ_HANDLED;
}
local_irq_disable() protects against the local interrupt handler and
preemption only
spin_lock_irqsave() needs to be added in both the driver code and the ISR
to make this SMP safe
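A kernel-style sketch (not runnable standalone) of that SMP-safe variant; struct ts_entry, the node field and ts_lock are assumed names following the slide's example:

```c
/* Sketch: spin_lock_irqsave() in both paths protects the list from
 * the local interrupt handler AND from the ISR running concurrently
 * on another CPU of an SMP system. */
static LIST_HEAD(ts_list);
static DEFINE_SPINLOCK(ts_lock);

int process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;
    unsigned long flags;

    spin_lock_irqsave(&ts_lock, flags);
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list element */
        list_del(&ts->node);
    }
    spin_unlock_irqrestore(&ts_lock, flags);
    return 0;
}

irqreturn_t ts_isr(int irq, void *dev_id)
{
    struct ts_entry *entry = ...; /* supplied by the real driver */
    unsigned long flags;

    spin_lock_irqsave(&ts_lock, flags);
    list_add_tail(&entry->node, &ts_list);
    spin_unlock_irqrestore(&ts_lock, flags);
    return IRQ_HANDLED;
}
```

The critical sections here are short and never sleep, which is the precondition for using a spin lock rather than a mutex.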
19. Slide 19
Driver Scenarios: Cont
Locking using a mutex/semaphore doesn't disable preemption,
but guarantees that the data structure is not corrupted on
preemption
Both SMP safe and preempt safe
static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);
int process_ts_entries ()
{
    struct ts_entry *ts, *tmp;
    if (mutex_lock_interruptible(&ts_lock))
        return -ERESTARTSYS;
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
    return 0;
}
int process_rest_entries()
{
    struct ts_entry *ts;
    if (mutex_lock_interruptible(&ts_lock))
        return -ERESTARTSYS;
    list_for_each_entry(ts, &ts_list, node) {
        /* Process remaining elements */
    }
    mutex_unlock(&ts_lock);
    return 0;
}
20. Slide 20
Driver Scenarios: Cont
Functions process_ts_entries() and
process_rest_entries() could deadlock: if one is preempted
after taking its first lock, the other can take the remaining
lock, and each then blocks waiting for the other
Locks need to be obtained in the same order to avoid
deadlock
static LIST_HEAD(ts_list);
static LIST_HEAD(tc_list);
static DEFINE_MUTEX(ts_lock);
static DEFINE_MUTEX(tc_lock);
int process_ts_entries ()
{
    mutex_lock_interruptible(&ts_lock);
    /* Some processing */
    mutex_lock_interruptible(&tc_lock); /* ts_lock, then tc_lock */
}
int process_rest_entries()
{
    mutex_lock_interruptible(&tc_lock);
    /* Some processing */
    mutex_lock_interruptible(&ts_lock); /* opposite order: deadlock */
}
21. Slide 21
Driver Scenarios: Cont
In some cases it might be better to access resources from a single
function, rather than have locks spread throughout the code
static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);
int process_ts_entries ()
{
    struct ts_entry *ts, *tmp;
    if (mutex_lock_interruptible(&ts_lock))
        return -ERESTARTSYS;
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
    return 0;
}
int caller_one()  /* illustrative caller: no locking needed here */
{
    return process_ts_entries();
}
int caller_two()  /* all locking stays inside process_ts_entries() */
{
    return process_ts_entries();
}
22. Slide 22
Driver Scenarios
• Don’t use one big lock for everything; it reduces concurrency
• Too fine-grained locking increases overhead
• Need to balance both aspects
• Reader-writer locks
– Useful if data structures are read more often than they are updated
– Allow multiple read locks to be obtained simultaneously
– Allow a single write lock to be obtained, and also prevent any read lock from
being obtained while the write lock is held
– Available for both spin locks and semaphores
• Stack variables/structures don't need locking, since each thread of execution
has its own stack and thus its own instance
23. Slide 23
Summary
• Concurrency/parallelism needs to be one of the criteria during the driver design
phase
• Analysis is required to determine the parallel paths and the protection for critical
sections
• Drivers which handle concurrency using appropriate locking techniques not only
avoid race conditions but also improve performance
• Unit testing could be used to exercise some of the parallel paths in a driver
– Two different applications which enable parallel paths into the same driver.
– Two instances of the same application.