Cilk is a C-based runtime system for multi-threaded parallel programming.
Cilk guarantees efficient and predictable performance and provides lightweight fork and join.
1. Cilk: An Efficient Multithreaded Runtime System
Mohanadarshan - 148241N
Shareek Ahamed - 148201T
Authors: Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul,
Charles E. Leiserson, Keith H. Randall and Yuli Zhou
MIT Laboratory for Computer Science, Cambridge
2. Agenda
● What is Cilk ?
● Why Cilk ?
● Introduction
● Scheduling & Work Stealing
● How it Works ?
● Fibonacci Calculation
● Performance in Cilk Applications
● Current Usage
● Related Works
● Cilk Plus
● Conclusion
3. What is Cilk ?
● Cilk is a C-based runtime system for multi-threaded parallel programming.
● Cilk guarantees efficient and predictable performance
● Lightweight fork and join
○ Own scheduler (Work Stealing Scheduler)
● Provable bounds on performance and space
● World-class chess programs such as StarTech, *Socrates, and Cilkchess were
written in Cilk.
4. Why Cilk ?
Multithreading is needed to implement dynamic, asynchronous, concurrent programs.
● A multithreaded system provides the programmer with a means to create,
synchronize, and schedule threads.
● Cilk reduces the complexity of implementing multithreaded programs.
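● Programmers do not have to worry about this complexity; they only need to
identify the regions that can run in parallel.
● Cilk optimizes two measures of a computation:
➔ Total work: the time to execute all threads on one processor
➔ Critical path: the longest chain of dependent threads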
6. Introduction (contd..)
● Cilk program is a set of procedures
● A procedure is a sequence of threads
● Cilk threads are:
○ Represented by nodes in the DAG
○ Non-blocking: each thread runs to completion with no waiting or suspension,
as an atomic unit of execution
● Threads can spawn child threads
○ downward edges connect a parent to its children
7. Introduction (contd..)
● A child & parent can run concurrently.
○ Because threads are non-blocking, a child cannot return a value directly to its parent.
○ The parent spawns a successor that receives values from its children
● A thread & its successor are parts of the same Cilk procedure.
○ connected by horizontal arcs
● Values sent by the children must be received before the successor begins:
○ They constitute data dependencies.
○ Connected by curved arcs
8. How it Works ?
● spawn T (k, ?x)
- spawn a child thread
● spawn_next T(k, ?x)
- A successor thread is spawned the same way as a child, except the keyword spawn_next is used
● send_argument( k, value )
- sends value to the argument slot of a waiting closure specified by continuation k.
Figure: Parent, Child, and Successor threads connected by spawn, spawn_next, and send_argument edges.
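The three primitives can be pictured with a small data structure. The following is a minimal C sketch, not the Cilk runtime's actual layout: the names `Closure`, `Cont`, `closure_new`, and `send_argument_sketch` are hypothetical, and `MAX_ARGS` is an arbitrary limit. It models a waiting closure as a set of argument slots plus a join counter of missing arguments; sending an argument fills one slot and reports whether the closure has become ready to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARGS 4  /* arbitrary limit chosen for this sketch */

/* A closure: the thread to run, its argument slots, and a join counter
   holding the number of arguments that are still missing. */
typedef struct Closure {
    void (*thread)(struct Closure *self); /* code to run once ready */
    int  args[MAX_ARGS];                  /* argument slots */
    int  join_counter;                    /* number of missing arguments */
} Closure;

/* A continuation names one empty slot of one waiting closure. */
typedef struct {
    Closure *target;
    int      slot;
} Cont;

/* Create a closure (as spawn or spawn_next would) with `missing` empty slots. */
Closure closure_new(void (*thread)(Closure *), int missing)
{
    Closure c = { .thread = thread, .join_counter = missing };
    return c;
}

/* Fill the slot named by k; return true once no arguments are missing. */
bool send_argument_sketch(Cont k, int value)
{
    assert(k.target != NULL && k.slot >= 0 && k.slot < MAX_ARGS);
    assert(k.target->join_counter > 0);
    k.target->args[k.slot] = value;
    k.target->join_counter -= 1;
    return k.target->join_counter == 0;
}
```

In this picture, a successor spawned with two `?` arguments starts with a join counter of 2 and becomes ready only after both children have sent their values.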
9. Scheduling
Every processor has its own:
- Scheduler
- Ready queue
The scheduler is invoked when a thread ends:
- It schedules another local thread or steals one from another processor
10. Work Stealing
● Cilk uses a run-time scheduling technique called work stealing.
● Works well on dynamic, asynchronous, MIMD-style programs.
● Work-stealing:
○ a processor with no work selects a victim from which to get work.
○ it gets the shallowest thread in the victim’s spawn tree.
● In Cilk, thieves choose the victims randomly.
18. How it Works ? (Example: Fibonacci)
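thread fib ( cont int k, int n ) {
    if ( n < 2 )
        send_argument( k, n );           /* base case: send n to continuation k */
    else {
        cont int x, y;                   /* continuations for the two results */
        spawn_next sum ( k, ?x, ?y );    /* successor waits for x and y */
        spawn fib ( x, n - 1 );          /* children send their results to x and y */
        spawn fib ( y, n - 2 );
    }
}
thread sum ( cont int k, int x, int y ) {
    send_argument ( k, x + y );          /* pass the sum on to continuation k */
}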
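The fib program also gives a concrete feel for the two measures Cilk reasons about, total work and critical path. The sketch below is plain C rather than Cilk, and it assumes that every thread (each `fib` and each `sum`) costs one time unit, which is a simplification. `fib_work` and `fib_span` are hypothetical helper names.

```c
#include <stdint.h>

/* Number of threads executed for fib(n): one fib thread per call, plus one
   sum thread for every call with n >= 2. */
uint64_t fib_work(int n)
{
    if (n < 2) return 1;                           /* a single fib thread */
    return 2 + fib_work(n - 1) + fib_work(n - 2);  /* fib + sum + both children */
}

/* Longest chain of dependent threads: the fib thread, then the longer of the
   two child subcomputations, then the sum thread that waits for it. */
uint64_t fib_span(int n)
{
    if (n < 2) return 1;
    uint64_t a = fib_span(n - 1);
    uint64_t b = fib_span(n - 2);
    return 1 + (a > b ? a : b) + 1;
}
```

Under these assumptions `fib_work(10)` is 265 threads while `fib_span(10)` is only 19: the work grows exponentially in n and the critical path grows linearly, so the available parallelism rises quickly with n.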
20. Ready Queue
if ( ! readyDeque.isEmpty() )
    take the deepest thread from the local readyDeque
else
    steal the shallowest thread from the readyDeque
    of a randomly selected victim
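The rule above can be sketched in C as follows. This is an illustration under stated assumptions, not the Cilk scheduler itself: `ReadyPool`, `pool_post`, `pool_take_deepest`, `pool_steal_shallowest`, and `schedule_next` are hypothetical names, threads are reduced to integer ids, the pool sizes are arbitrary, and there is no locking, so it only models the policy. A thief that finds its victim empty returns -1 here instead of retrying.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_LEVEL     16  /* arbitrary depth limit for this sketch */
#define MAX_PER_LEVEL 16  /* arbitrary capacity per level */

/* Per-processor ready pool: ready thread ids grouped by spawn-tree level. */
typedef struct {
    int ids[MAX_LEVEL][MAX_PER_LEVEL];
    int count[MAX_LEVEL];
} ReadyPool;

/* Post a ready thread at the given level; false if it does not fit. */
bool pool_post(ReadyPool *p, int level, int id)
{
    if (level < 0 || level >= MAX_LEVEL || p->count[level] == MAX_PER_LEVEL)
        return false;
    p->ids[level][p->count[level]++] = id;
    return true;
}

/* Local work: take a thread from the deepest nonempty level, or -1. */
int pool_take_deepest(ReadyPool *p)
{
    for (int l = MAX_LEVEL - 1; l >= 0; l--)
        if (p->count[l] > 0) return p->ids[l][--p->count[l]];
    return -1;
}

/* Thief: take a thread from the victim's shallowest nonempty level, or -1. */
int pool_steal_shallowest(ReadyPool *victim)
{
    for (int l = 0; l < MAX_LEVEL; l++)
        if (victim->count[l] > 0) return victim->ids[l][--victim->count[l]];
    return -1;
}

/* One scheduling decision for processor `self` out of `nproc` processors. */
int schedule_next(ReadyPool pools[], int nproc, int self)
{
    int id = pool_take_deepest(&pools[self]);
    if (id >= 0 || nproc < 2) return id;
    int victim = rand() % (nproc - 1);  /* uniform over the other processors */
    if (victim >= self) victim++;       /* skip our own pool */
    return pool_steal_shallowest(&pools[victim]);
}
```

Taking the deepest thread locally keeps a processor working depth-first on its own subtree, while stealing the shallowest thread tends to hand the thief a larger piece of remaining work.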
21. Performance in Cilk Application
Experiments were run on a CM-5 supercomputer to document the efficiency of
the work-stealing scheduler.
Tested Applications
1. fib (fibonacci)
2. queens (placing N queens on an N x N chessboard)
3. pfold (protein-folding)
4. ray (ray-tracing algorithm for graphics rendering)
5. knary (runs an empty “for” loop at each node of a synthetic tree)
6. *Socrates (parallel chess program that uses the Jamboree search algorithm)
22. Performance in Cilk Application (contd..)
Tserial
⇒ Time taken to run the serial C program (compiled with gcc)
T1
⇒ Time taken to run the Cilk program on 1 processor
T∞
⇒ Critical path length, measured by timestamping each thread of the Cilk computation
TP
⇒ Time taken to run the Cilk program on P processors
Tserial / T1
⇒ Efficiency of the Cilk program
⇒ Efficiency is close to 1 for programs with moderately long threads, so the
Cilk overhead is small.
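The measured times are usually combined into a few ratios and bounds. The block below restates them in LaTeX. The efficiency ratio comes from the slide itself; the speedup, average parallelism, lower bounds, and the expected-time bound are the standard work and critical-path relations for Cilk's work-stealing scheduler (the last one holds for fully strict computations) and are added here only as context.

```latex
\begin{align*}
\text{efficiency} &= \frac{T_{\text{serial}}}{T_1}, &
\text{speedup} &= \frac{T_1}{T_P}, &
\text{average parallelism} &= \frac{T_1}{T_\infty}, \\
T_P &\ge \frac{T_1}{P}, &
T_P &\ge T_\infty, &
\mathrm{E}[T_P] &= O\!\left(\frac{T_1}{P} + T_\infty\right).
\end{align*}
```

When the average parallelism is much larger than P, the first term dominates and the speedup is close to linear.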
26. Related Works
EARTH (An Efficient Architecture for Running THreads)
EARTH supports an adaptive, event-driven multithreaded execution model
containing two thread levels:
● threaded procedures
● fibers
A threaded procedure is invoked asynchronously, forking a parallel thread of
execution.
A threaded procedure is statically divided into fibers: fine-grain threads
that communicate through dataflow-like synchronization operations.
27. EARTH vs. CILK
Figure: EARTH model and Cilk model diagrams.
Note: - EARTH has its origin in the static dataflow model
- In comparison, the features of the Cilk model are similar to those of the EARTH model
29. Cilk Plus
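#include <cilk/cilk.h>

/* Cilk Plus keywords: cilk_spawn and cilk_sync
   (the older Cilk-5 spelling was cilk, spawn, and sync). */
int fib (int n)
{
    if (n < 2) return n;
    else
    {
        int x, y;
        x = cilk_spawn fib (n-1);   /* may run in parallel with the caller */
        y = cilk_spawn fib (n-2);
        cilk_sync;                  /* wait for both children */
        return (x+y);
    }
}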
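- Easier to write than the continuation-passing Cilk shown earlier: no explicit continuations
- Less complex: children return their values directly and the sync waits for them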
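One way to see why this style is simpler: deleting the parallel keywords from the version above leaves an ordinary C function with the same result. A minimal sketch follows; `fib_serial` is a hypothetical name, and a program of this kind is also what a serial baseline such as Tserial would time.

```c
/* Plain C Fibonacci: the same recursion with no parallel keywords. */
long fib_serial(int n)
{
    if (n < 2) return n;
    long x = fib_serial(n - 1);  /* was a spawned child */
    long y = fib_serial(n - 2);  /* was a spawned child */
    return x + y;                /* no sync needed: both calls have returned */
}
```

The continuation-passing version has no such direct C counterpart, because its results flow through send_argument rather than through return values.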
30. Conclusion
● Pros
➔ Guaranteed bounds on run time and space usage
➔ Good performance
➔ Critical Path is short compared to total work
➔ Low Overhead
➔ Very Simple to Use
● Cons
➔ Only suitable for tree-like computations
➔ Explicit continuations are confusing to program with
➔ No shared memory