Dead Lock Analysis of spin_lock() in Linux Kernel (english)
1. Outline
• Spin lock and semaphore in the Linux kernel
  - Introduction and differences.
  - Deadlock example with spin_lock().
• What is context
  - What "context" means.
  - Control flow of a procedure call and of an interrupt handler.
• Log analysis
• Conclusion
  - How to prevent spin_lock() deadlocks.
2. Spin lock & Semaphore
• Semaphore:
  - When its initial value is 1, it can serve as a mutex that protects a critical section from concurrent access, just like a spin lock.
  - Unlike a spin lock, a thread that fails to get the lock goes to sleep while it waits.
• Spin lock:
  - A thread that fails to get the lock does not go to sleep; it keeps looping, retrying until it gets the lock.
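The busy-waiting behaviour of a spin lock can be sketched in userspace with C11 atomics. This is a minimal model under stated assumptions, not the kernel implementation: the type model_spinlock and every model_* function are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a spin lock; NOT the kernel's spinlock_t. */
typedef struct {
    atomic_bool locked;
} model_spinlock;

void model_spin_init(model_spinlock *l)
{
    atomic_init(&l->locked, false);
}

/* One acquisition attempt: returns true if the caller now owns the lock. */
bool model_spin_trylock(model_spinlock *l)
{
    /* atomic_exchange returns the previous value; false means it was free. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Busy-wait until the lock is free. The caller never goes to sleep. */
void model_spin_lock(model_spinlock *l)
{
    while (!model_spin_trylock(l))
        ; /* keep retrying */
}

void model_spin_unlock(model_spinlock *l)
{
    assert(atomic_load(&l->locked)); /* releasing a free lock is a bug */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A semaphore differs exactly at the retry loop: instead of looping, the waiter would be put to sleep and woken when the lock is released.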
3. Spin lock
• Spin lock used as a mutex:
[Figure: timeline of Thread A and Thread B. Each thread runs Spin_lock(&mutex_lock), its critical section code, then Spin_unlock(&mutex_lock). The numbered steps are:]
  1. Thread A starts execution, calls Spin_lock(&mutex_lock) and enters its critical section.
  2. A timer interrupt preempts thread A. Kernel code: the thread's time slice has decreased to zero, so its context is saved and the processor is assigned to another thread.
  3. Thread B calls Spin_lock(&mutex_lock), fails to get the lock, and keeps looping, trying to get it.
  4. A timer interrupt preempts thread B. Kernel code: the thread's time slice has decreased to zero, so its context is saved and the processor is assigned to another thread.
  5. Thread A finishes its critical section and calls Spin_unlock(&mutex_lock).
4. What is context
• What does "context" mean?
  - A set of dedicated hardware resources that a program needs in order to execute successfully.
    • For example:
      - general-purpose registers for computation.
      - stack memory to support procedure calls.
  - From the kernel's point of view, however, the "dedicated context of a process" is simulated, because the real resources are limited.
    • The kernel slices time and saves and restores context in order to emulate a multi-processor environment.
    • Each program (process) behaves as if it had a dedicated context.
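Context saving and restoring can be sketched as copying one shared register file into and out of a per-thread save area. This is only an illustrative model: model_cpu, model_thread, model_context_switch and the register count of 16 are assumptions, not kernel data structures.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NUM_REGS 16   /* assumed register count, for illustration only */

/* Hypothetical model: the single physical register file of one processor. */
typedef struct {
    unsigned long regs[MODEL_NUM_REGS];
} model_cpu;

/* Hypothetical per-thread save area holding that thread's context. */
typedef struct {
    unsigned long saved[MODEL_NUM_REGS];
} model_thread;

/* Save the outgoing thread's registers, then load the incoming thread's. */
void model_context_switch(model_cpu *cpu, model_thread *from, model_thread *to)
{
    assert(from != to);
    memcpy(from->saved, cpu->regs, sizeof cpu->regs);
    memcpy(cpu->regs, to->saved, sizeof cpu->regs);
}
```

Because each thread sees its own register values again when it is switched back in, it can behave as if the processor were dedicated to it.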
5. What is context
• What are user context and interrupt context?
  - User context: provided by the kernel's context-switch facility, which is triggered by the timer interrupt. Its owner is called a user process, which runs either user-space code in user mode or kernel-space code in svc mode.
  - Interrupt context: the subset of registers that the interrupt handler saves and restores by itself.
    • In fact, part of the interrupt context (registers) belongs to the context (registers) of some user process.
[Figure: processor time axis showing Thread A, A's subroutine, Thread B, a PCI bus interrupt and two timer interrupts. The interrupt is serviced by Int_handler(), which does the following:]
Int_handler():
  Save every register that will be used later onto the stack.
  ...
  ...
  Restore the registers that were used, and jump to the return address (r14 register).
6. What is context
• Comparing an interrupt handler with a procedure call:
  - An interrupt handler runs like a procedure call.
  - The differences are:
    • int_handler does not receive any parameters and does not return any value.
    • The interrupted program is not even aware that int_handler executed.
[Figure: processor time axis showing Thread A, a subroutine, Thread B, a PCI bus interrupt and two timer interrupts, comparing the two call sequences below.]
Void Foo(void): user space
  Save every register that will be used later onto the stack.
  Read the parameters from the parameter registers.
  ...
  Put the return value in the parameter register.
  Restore the registers that were used, and jump to the return address (r14).
Int_handler(): kernel space
  Save every register that will be used later onto the stack.
  ...
  ...
  Restore the registers that were used, and jump to the return address (r14).
7. double-acquire deadlock(1/2)
• Spin_lock convention
  - Unlike spin lock implementations in some other operating systems, the Linux kernel's spin lock is not recursive.
  - A double-acquire deadlock example follows:
Thread A:
  Spin_lock(&mutex_lock);
  fooB();
  Spin_unlock(&mutex_lock);
Void fooB(void):
  Save every register that will be used later onto the stack.
  Read the parameters from the parameter registers.
  ...
  Spin_lock(&mutex_lock);
  ...
  Put the return value in the parameter register.
  Restore the registers that were used, and jump to the return address (r14).
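The double acquire can be reproduced safely in userspace by bounding the spin, so that the program reports the failure instead of hanging. This is a sketch under stated assumptions: demo_lock_bounded, demo_unlock, demo_foo_b and demo_thread_a are hypothetical names, and the bound of 1000 spins is arbitrary. A real spin lock has no bound and would loop forever at the marked line.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock model; NOT the kernel's spinlock_t. */
static atomic_bool demo_lock_held;

/* Spin at most max_spins times. Returns false where a real spin lock
 * would keep spinning forever. */
bool demo_lock_bounded(int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (!atomic_exchange(&demo_lock_held, true))
            return true;        /* lock was free, now ours */
    }
    return false;               /* still held: real code never gets past here */
}

void demo_unlock(void)
{
    assert(atomic_load(&demo_lock_held)); /* releasing a free lock is a bug */
    atomic_store(&demo_lock_held, false);
}

/* Stands in for fooB(): takes the lock without knowing the caller holds it. */
bool demo_foo_b(void)
{
    if (!demo_lock_bounded(1000))
        return false;           /* double acquire detected */
    demo_unlock();
    return true;
}

/* Stands in for thread A: calls fooB() inside its own critical section. */
bool demo_thread_a(void)
{
    bool inner_ok;

    if (!demo_lock_bounded(1000))
        return false;
    inner_ok = demo_foo_b();    /* same thread, same lock */
    demo_unlock();
    return inner_ok;
}
```

Calling demo_foo_b() on its own succeeds; calling it through demo_thread_a() fails, because the lock does not record who holds it and therefore cannot let the owner in a second time.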
8. double-acquire deadlock(2/2)
• Spin_lock synchronization between user context and interrupt context
  - Double-acquire deadlock example (2):
    Thread A:
      Spin_lock(&mutex_lock);
      (an interrupt happens just after thread A gets the spin lock)
      Spin_unlock(&mutex_lock);
    Sdio_int_handler():
      Save every register that will be used later onto the stack.
      ...
      Spin_lock(&mutex_lock);
      ...
      Restore the registers that were used, and jump to the return address (r14).
    Sdio_int_handler() busy-waits on mutex_lock. Thread A cannot run again to release the lock until the handler returns, so the wait never ends.
  - Example without a double-acquire deadlock:
    Thread A:
      Spin_lock(&mutex_lock);
      (a timer interrupt happens just after thread A gets the spin lock)
      Spin_unlock(&mutex_lock);
    Kernel code: the thread's time slice has decreased to zero, so its context is saved and the processor is assigned to another thread.
    Thread B's user code executes.
    (an SDIO interrupt happens just after thread A gets the spin lock)
    Sdio_int_handler():
      Save every register that will be used later onto the stack.
      ...
      Spin_lock(&mutex_lock);
      ...
      Restore the registers that were used, and jump to the return address (r14).
    Sdio_int_handler() and thread B busy-wait on mutex_lock.
9. Log Analysis(1)
• In our case, CheckCallbackTimeout() might interrupt WiMAXQueryImformation() while it runs in user context (the CM_Query thread).
Thread A:
  Spin_lock(&mutex_lock);
  (a timer interrupt happens just after thread A gets the spin lock)
  Spin_unlock(&mutex_lock);
Kernel code:
  ...
  If (timer has to be executed) {
      CheckCallbackTimeout();
  }
  ...
  ...
  Return;
CheckCallbackTimeout
{
    LDDB_spin_lock();
    ...
}
10. Log Analysis(2)
• The timer callback function is called from __irq_svc.
• __irq_svc is a subroutine that is called only by the IRQ handler, so the callback runs in interrupt context.
11. Conclusion - Immediate Solution
• Use spin_lock_irqsave() and spin_unlock_irqrestore().
  - They turn off local interrupts before acquiring the spin lock and restore the previous interrupt state after releasing it.
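Why masking interrupts removes the deadlock can be sketched with a single-CPU userspace model. Everything here is an assumption made for illustration: none of the sim_* names are kernel APIs, the "interrupt" is an ordinary function call, and the handler reports a held lock instead of spinning forever as real code would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model. The sim_* names are NOT kernel APIs; they
 * stand in for spin_lock(), spin_lock_irqsave(), spin_unlock_irqrestore(). */
static bool sim_lock_held;          /* the shared spin lock */
static bool sim_irq_enabled = true; /* local interrupt mask state */
static bool sim_irq_pending;        /* interrupt raised while masked */
static bool sim_deadlocked;         /* handler met a lock held by its victim */
int sim_handler_runs;               /* times the handler completed its work */

/* Interrupt handler: needs the same lock as the interrupted thread. */
static void sim_int_handler(void)
{
    if (sim_lock_held) {            /* real code would spin here forever */
        sim_deadlocked = true;
        return;
    }
    sim_lock_held = true;
    sim_handler_runs++;
    sim_lock_held = false;
}

/* A device raises an interrupt: it runs now if enabled, else stays pending. */
static void sim_raise_irq(void)
{
    if (sim_irq_enabled)
        sim_int_handler();
    else
        sim_irq_pending = true;
}

/* Model of spin_lock_irqsave(): remember the IRQ state, mask, take the lock. */
static bool sim_lock_irqsave(void)
{
    bool flags = sim_irq_enabled;

    sim_irq_enabled = false;
    assert(!sim_lock_held);
    sim_lock_held = true;
    return flags;
}

/* Model of spin_unlock_irqrestore(): drop the lock, restore the IRQ state,
 * then deliver any interrupt that arrived while masked. */
static void sim_unlock_irqrestore(bool flags)
{
    sim_lock_held = false;
    sim_irq_enabled = flags;
    if (sim_irq_enabled && sim_irq_pending) {
        sim_irq_pending = false;
        sim_int_handler();
    }
}

/* Plain lock: the interrupt lands inside the critical section.
 * Returns true if the handler would have deadlocked. */
bool sim_thread_plain_lock(void)
{
    sim_deadlocked = false;
    sim_lock_held = true;           /* interrupts stay enabled */
    sim_raise_irq();
    sim_lock_held = false;
    return sim_deadlocked;
}

/* Same scenario with interrupts masked for the critical section. */
bool sim_thread_irqsave(void)
{
    sim_deadlocked = false;
    bool flags = sim_lock_irqsave();
    sim_raise_irq();                /* masked: deferred until restore */
    sim_unlock_irqrestore(flags);
    return sim_deadlocked;
}
```

In the model the interrupt is not lost: it is delivered right after the lock is released, when the handler can take the lock without waiting.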
12. Conclusion - what action we have to take right now
• Before implementation, identify the contexts that use the same lock for synchronization.
  - Prevent the double-acquire deadlock scenario with the interrupt-disabling API when a lock is shared between interrupt context and user context.
  - Avoid using semaphores in interrupt context.
  - Leave interrupt context as soon as possible, and defer the work to another user context, such as a work queue.
• Turn on CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCK_ALLOC and CONFIG_DEBUG_SPINLOCK.
  - They help with debugging.