SlideShare uma empresa Scribd logo
1 de 59
Multithreaded Programming in Cilk L ECTURE  1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
Cilk A C language for programming dynamic multithreaded applications on shared-memory multiprocessors. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example applications:
Shared-Memory Multiprocessor In particular, over the next decade, chip multiprocessors (CMP’s) will be an increasingly important platform! P P P Network … Memory I/O $ $ $
Cilk Is Simple ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Minicourse Outline ,[object Object],[object Object],[object Object],[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Fibonacci Cilk is a  faithful   extension of C.  A Cilk program’s  serial elision  is always a legal implementation of Cilk semantics.  Cilk provides  no   new data types. int fib (int n) { if (n<2) return (n); else { int x,y; x = fib(n-1); y = fib(n-2); return (x+y); } } C elision cilk  int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync; return (x+y); } } Cilk code
Basic Cilk Keywords cilk  int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync; return (x+y); } } Identifies a function as a  Cilk procedure , capable of being spawned in parallel. The named  child  Cilk procedure can execute in parallel with the  parent  caller. Control cannot pass this point until all spawned children have returned.
Dynamic Multithreading cilk   int fib (int n) { if (n<2) return (n); else { int x,y; x =  spawn  fib(n-1); y =  spawn  fib(n-2); sync ; return (x+y); } } The  computation dag  unfolds dynamically. Example:   fib(4) “ Processor   oblivious” 4 3 2 2 1 1 1 0 0
Multithreaded Computation ,[object Object],[object Object],[object Object],spawn edge return edge continue edge initial thread final thread
Cactus Stack Cilk supports C’s rule for pointers:   A pointer to stack space can be passed from parent to child, but not from child to parent.  (Cilk also supports  malloc .) Cilk’s  cactus stack  supports several views in parallel. B A C E D A A B A C A C D A C E Views of stack C B A D E
LECTURE 1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Algorithmic Complexity Measures T P  =  execution time on  P  processors
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work T 1  =  span * * Also called  critical-path length  or  computational depth .
Algorithmic Complexity Measures T P  =  execution time on  P  processors T 1  =  work ,[object Object],[object Object],[object Object],* Also called  critical-path length  or  computational depth . T 1  =  span *
Speedup Definition:   T 1 /T P  =  speedup   on  P  processors. If  T 1 /T P =   ( P )  ·   P ,  we have  linear speedup ; =  P , we have  perfect linear speedup ; >  P , we have  superlinear speedup , which is not possible in our model, because of the lower bound  T P   ¸   T 1 / P .
Parallelism ,[object Object],[object Object],[object Object]
Example:  fib(4) Span:   T 1  = ? Work:   T 1  =  ? Assume for simplicity that each Cilk thread in  fib()  takes unit time to execute. Span:   T 1  = 8 3 4 5 6 1 2 7 8 Work:   T 1  = 17
Example:  fib(4) Parallelism:   T 1 / T 1  = 2.125 Span:   T 1  = ? Work:   T 1  =  ? Assume for simplicity that each Cilk thread in  fib()  takes unit time to execute. Span:   T 1  = 8 Work:   T 1  = 17 Using many more than  2  processors makes little sense.
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parallelizing Vector Addition void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } C
Parallelizing Vector Addition C C if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { void vadd (real *A, real *B, int n){ vadd (A, B, n/2); vadd (A+n/2, B+n/2, n-n/2); ,[object Object],[object Object],void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } } }
Parallelizing Vector Addition if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { C ,[object Object],[object Object],[object Object],void vadd (real *A, real *B, int n){ cil k spawn vadd (A, B, n/2; vadd (A+n/2, B+n/2, n-n/2; spawn Side benefit:   D&C is generally good for caches! sync ; C ilk void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } } }
Vector Addition cil k   void vadd (real *A, real *B, int n){ if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { spawn  vadd (A, B, n/2); spawn  vadd (A+n/2, B+n/2, n-n/2); sync ; } }
Vector Addition Analysis ,[object Object],[object Object],[object Object],[object Object], ( n /lg  n )  ( n )  (lg  n ) BASE
Another Parallelization C void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { vadd1(A+j, B+j, min(BASE, n-j)); } } Cilk cilk  void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } cilk  void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { spawn  vadd1(A+j, B+j, min(BASE, n-j)); } sync; }
Analysis ,[object Object], (1)  ( n ) … …  ( n ) BASE ,[object Object],[object Object],[object Object],PUNY!
Optimal Choice of BASE ,[object Object],[object Object], ( √ n  ) … ,[object Object], ( n ) BASE … ,[object Object], (BASE +  n /BASE) ,[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Scheduling ,[object Object],[object Object],[object Object],P P P Network … Memory I/O $ $ $
Greedy Scheduling ,[object Object],Definition:   A thread is  ready  if all its predecessors have  executed .
Greedy Scheduling ,[object Object],[object Object],[object Object],[object Object],Definition:   A thread is  ready  if all its predecessors have  executed . P  = 3
Greedy Scheduling ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Definition:   A thread is  ready  if all its predecessors have  executed . P  = 3
Greedy-Scheduling Theorem Theorem  [Graham ’68 & Brent ’75]. Any greedy scheduler achieves T P      T 1 / P   + T  . ,[object Object],[object Object],[object Object],P  = 3
Optimality of Greedy ,[object Object],Proof .  Let  T P *  be the execution time produced by the optimal scheduler.  Since  T P *  ¸  max{ T 1 / P ,  T 1 }  (lower bounds), we have T P ·   T 1 / P  +  T 1  ·  2 ¢ max{ T 1 / P ,  T 1 } ·  2 T P *  .  ■
Linear Speedup ,[object Object],Proof.   Since  P  ¿   T 1 / T 1  is equivalent to  T 1   ¿   T 1 / P , the Greedy Scheduling Theorem gives us   T P ·   T 1 / P  +  T 1 ¼  T 1 / P  . Thus, the speedup is  T 1 / T P   ¼   P .  ■ Definition.  The quantity  ( T 1 / T 1  )/ P  is called the  parallel slackness .
Cilk Performance ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cilk Chess Programs ,[object Object],[object Object],[object Object],[object Object]
 Socrates Normalized Speedup T P  =  T 1 / P  +  T  measured speedup 0.01 0.1 1 0.01 0.1 1 T P  =  T  T P  =  T 1 / P T 1 /T P T 1 /T  P T 1 /T 
Developing   Socrates  ,[object Object],[object Object],[object Object],[object Object]
 Socrates Speedup Paradox T P      T 1 / P  +  T  T 32 = 2048/32 + 1   = 65  seconds   = 40  seconds T  32 = 1024/32 + 8 Original program Proposed program T 32 = 65  seconds T  32 = 40  seconds T 1 = 2048  seconds T    = 1  second T  1 = 1024  seconds T     = 8  seconds T 512 = 2048/512 + 1   = 5  seconds T  512 = 1024/512 + 8   = 10  seconds
Lesson Work  and  span  can predict performance on large machines better than running times on small machines can.
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque   of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! Spawn!
Cilk’s Work-Stealing Scheduler Each processor maintains a   work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Cilk’s Work-Stealing Scheduler Each processor maintains a  work deque  of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! When a processor runs out of work, it  steals  a thread from the top of a  random  victim’s deque.
Performance of Work-Stealing Theorem : Cilk’s work-stealing scheduler achieves an expected running time of T P      T 1 / P   + O ( T 1 ) on  P  processors. Pseudoproof . A processor is either  working  or  stealing .  The total time all processors spend working is  T 1 .  Each steal has a  1/ P  chance of reducing the span by  1 .  Thus, the expected cost of all steals is  O ( PT 1 ) .  Since there are  P  processors, the expected time is  ( T 1  +  O ( PT 1 ))/ P  =   T 1 / P  +  O ( T 1 )  .  ■
Space Bounds Theorem.   Let  S 1  be the stack space required by a serial execution of a Cilk program.  Then, the space required by a  P -processor execution is at most  S P   ·   PS 1  . Proof  (by induction). The work-stealing algorithm maintains the  busy-leaves property :  every extant procedure frame with no extant descendents has a processor working on it.   ■   P  = 3 P P P S 1
Linguistic Implications ,[object Object],for (i=1; i<1000000000; i++) { spawn  foo(i); } sync; M ORAL Better to steal parents than children!
L ECTURE  1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Key Ideas ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Minicourse Outline ,[object Object],[object Object],[object Object],[object Object]

Mais conteúdo relacionado

Mais procurados

Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesQuoc-Sang Phan
 
14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic ProgrammingNeeldhara Misra
 
Lecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsLecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsVivek Bhargav
 
Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Dr. Pankaj Agarwal
 
Data Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesData Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesHaitham El-Ghareeb
 
02 order of growth
02 order of growth02 order of growth
02 order of growthHira Gul
 
Algorithm Analyzing
Algorithm AnalyzingAlgorithm Analyzing
Algorithm AnalyzingHaluan Irsad
 
Design and Analysis of algorithms
Design and Analysis of algorithmsDesign and Analysis of algorithms
Design and Analysis of algorithmsDr. Rupa Ch
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysissumitbardhan
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisAnindita Kundu
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to AlgorithmsVenkatesh Iyer
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysisDr. Rajdeep Chatterjee
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)swapnac12
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihmSajid Marwat
 
Data Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsData Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsAbdullah Al-hazmy
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Yusuke Izawa
 

Mais procurados (19)

Symbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo TheoriesSymbolic Execution as DPLL Modulo Theories
Symbolic Execution as DPLL Modulo Theories
 
14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming14 - 08 Feb - Dynamic Programming
14 - 08 Feb - Dynamic Programming
 
Lecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithmsLecture 5: Asymptotic analysis of algorithms
Lecture 5: Asymptotic analysis of algorithms
 
Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis Introduction to Algorithms Complexity Analysis
Introduction to Algorithms Complexity Analysis
 
Data Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study NotesData Structures - Lecture 8 - Study Notes
Data Structures - Lecture 8 - Study Notes
 
02 order of growth
02 order of growth02 order of growth
02 order of growth
 
Algorithm Analyzing
Algorithm AnalyzingAlgorithm Analyzing
Algorithm Analyzing
 
Design and Analysis of algorithms
Design and Analysis of algorithmsDesign and Analysis of algorithms
Design and Analysis of algorithms
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysis
 
Analysis of Algorithm
Analysis of AlgorithmAnalysis of Algorithm
Analysis of Algorithm
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysis
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to Algorithms
 
Data Structure: Algorithm and analysis
Data Structure: Algorithm and analysisData Structure: Algorithm and analysis
Data Structure: Algorithm and analysis
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihm
 
Big o
Big oBig o
Big o
 
Data Structures- Part2 analysis tools
Data Structures- Part2 analysis toolsData Structures- Part2 analysis tools
Data Structures- Part2 analysis tools
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
 
Algorithms Question bank
Algorithms Question bankAlgorithms Question bank
Algorithms Question bank
 

Semelhante a Lecture 1

Data Structure & Algorithms - Mathematical
Data Structure & Algorithms - MathematicalData Structure & Algorithms - Mathematical
Data Structure & Algorithms - Mathematicalbabuk110
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.pptebinazer1
 
008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdfomchoubey297
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptxKokilaK25
 
Unit i basic concepts of algorithms
Unit i basic concepts of algorithmsUnit i basic concepts of algorithms
Unit i basic concepts of algorithmssangeetha s
 
Towards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowTowards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowQuoc-Sang Phan
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005Jules Krdenas
 
Basic_analysis.ppt
Basic_analysis.pptBasic_analysis.ppt
Basic_analysis.pptSoumyaJ3
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxPJS KUMAR
 
Data structures notes for college students btech.pptx
Data structures notes for college students btech.pptxData structures notes for college students btech.pptx
Data structures notes for college students btech.pptxKarthikVijay59
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.Tariq Khan
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexityAbbas Ali
 
Introduction to design and analysis of algorithm
Introduction to design and analysis of algorithmIntroduction to design and analysis of algorithm
Introduction to design and analysis of algorithmDevaKumari Vijay
 
Time and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxTime and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxdudelover
 

Semelhante a Lecture 1 (20)

Data Structure & Algorithms - Mathematical
Data Structure & Algorithms - MathematicalData Structure & Algorithms - Mathematical
Data Structure & Algorithms - Mathematical
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
parallel
parallelparallel
parallel
 
008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf008. PROGRAM EFFICIENCY computer science.pdf
008. PROGRAM EFFICIENCY computer science.pdf
 
Matlab Nn Intro
Matlab Nn IntroMatlab Nn Intro
Matlab Nn Intro
 
01 - DAA - PPT.pptx
01 - DAA - PPT.pptx01 - DAA - PPT.pptx
01 - DAA - PPT.pptx
 
Cs2251 daa
Cs2251 daaCs2251 daa
Cs2251 daa
 
AA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptxAA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptx
 
Unit i basic concepts of algorithms
Unit i basic concepts of algorithmsUnit i basic concepts of algorithms
Unit i basic concepts of algorithms
 
Towards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information FlowTowards an SMT-based approach for Quantitative Information Flow
Towards an SMT-based approach for Quantitative Information Flow
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005
 
Basic_analysis.ppt
Basic_analysis.pptBasic_analysis.ppt
Basic_analysis.ppt
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptx
 
Data structures notes for college students btech.pptx
Data structures notes for college students btech.pptxData structures notes for college students btech.pptx
Data structures notes for college students btech.pptx
 
Algorithm And analysis Lecture 03& 04-time complexity.
 Algorithm And analysis Lecture 03& 04-time complexity. Algorithm And analysis Lecture 03& 04-time complexity.
Algorithm And analysis Lecture 03& 04-time complexity.
 
Lec03 04-time complexity
Lec03 04-time complexityLec03 04-time complexity
Lec03 04-time complexity
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
chapter1.ppt
chapter1.pptchapter1.ppt
chapter1.ppt
 
Introduction to design and analysis of algorithm
Introduction to design and analysis of algorithmIntroduction to design and analysis of algorithm
Introduction to design and analysis of algorithm
 
Time and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxTime and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptx
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Último (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Lecture 1

  • 1. Multithreaded Programming in Cilk L ECTURE 1 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
  • 2.
  • 3. Shared-Memory Multiprocessor In particular, over the next decade, chip multiprocessors (CMP’s) will be an increasingly important platform! P P P Network … Memory I/O $ $ $
  • 4.
  • 5.
  • 6.
  • 7. Fibonacci Cilk is a faithful extension of C. A Cilk program’s serial elision is always a legal implementation of Cilk semantics. Cilk provides no new data types. int fib (int n) { if (n<2) return (n); else { int x,y; x = fib(n-1); y = fib(n-2); return (x+y); } } C elision cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync; return (x+y); } } Cilk code
  • 8. Basic Cilk Keywords cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync; return (x+y); } } Identifies a function as a Cilk procedure , capable of being spawned in parallel. The named child Cilk procedure can execute in parallel with the parent caller. Control cannot pass this point until all spawned children have returned.
  • 9. Dynamic Multithreading cilk int fib (int n) { if (n<2) return (n); else { int x,y; x = spawn fib(n-1); y = spawn fib(n-2); sync ; return (x+y); } } The computation dag unfolds dynamically. Example: fib(4) “ Processor oblivious” 4 3 2 2 1 1 1 0 0
  • 10.
  • 11. Cactus Stack Cilk supports C’s rule for pointers: A pointer to stack space can be passed from parent to child, but not from child to parent. (Cilk also supports malloc .) Cilk’s cactus stack supports several views in parallel. B A C E D A A B A C A C D A C E Views of stack C B A D E
  • 12.
  • 13. Algorithmic Complexity Measures T P = execution time on P processors
  • 14. Algorithmic Complexity Measures T P = execution time on P processors T 1 = work
  • 15. Algorithmic Complexity Measures T P = execution time on P processors T 1 = work T 1 = span * * Also called critical-path length or computational depth .
  • 16.
  • 17. Speedup Definition: T 1 /T P = speedup on P processors. If T 1 /T P =  ( P ) · P , we have linear speedup ; = P , we have perfect linear speedup ; > P , we have superlinear speedup , which is not possible in our model, because of the lower bound T P ¸ T 1 / P .
  • 18.
  • 19. Example: fib(4) Span: T 1 = ? Work: T 1 = ? Assume for simplicity that each Cilk thread in fib() takes unit time to execute. Span: T 1 = 8 3 4 5 6 1 2 7 8 Work: T 1 = 17
  • 20. Example: fib(4) Parallelism: T 1 / T 1 = 2.125 Span: T 1 = ? Work: T 1 = ? Assume for simplicity that each Cilk thread in fib() takes unit time to execute. Span: T 1 = 8 Work: T 1 = 17 Using many more than 2 processors makes little sense.
  • 21.
  • 22. Parallelizing Vector Addition void vadd (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } C
  • 23.
  • 24.
  • 25. Vector Addition cil k void vadd (real *A, real *B, int n){ if (n<=BASE) { int i; for (i=0; i<n; i++) A[i]+=B[i]; } else { spawn vadd (A, B, n/2); spawn vadd (A+n/2, B+n/2, n-n/2); sync ; } }
  • 26.
  • 27. Another Parallelization C void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { vadd1(A+j, B+j, min(BASE, n-j)); } } Cilk cilk void vadd1 (real *A, real *B, int n){ int i; for (i=0; i<n; i++) A[i]+=B[i]; } cilk void vadd (real *A, real *B, int n){ int j; for (j=0; j<n; j+=BASE) { spawn vadd1(A+j, B+j, min(BASE, n-j)); } sync; }
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.  Socrates Normalized Speedup T P = T 1 / P + T  measured speedup 0.01 0.1 1 0.01 0.1 1 T P = T  T P = T 1 / P T 1 /T P T 1 /T  P T 1 /T 
  • 42.
  • 43.  Socrates Speedup Paradox T P  T 1 / P + T  T 32 = 2048/32 + 1 = 65 seconds = 40 seconds T  32 = 1024/32 + 8 Original program Proposed program T 32 = 65 seconds T  32 = 40 seconds T 1 = 2048 seconds T  = 1 second T  1 = 1024 seconds T   = 8 seconds T 512 = 2048/512 + 1 = 5 seconds T  512 = 1024/512 + 8 = 10 seconds
  • 44. Lesson Work and span can predict performance on large machines better than running times on small machines can.
  • 45.
  • 46. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn!
  • 47. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! Spawn!
  • 48. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
  • 49. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Return!
  • 50. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 51. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Steal! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 52. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 53. Cilk’s Work-Stealing Scheduler Each processor maintains a work deque of ready threads, and it manipulates the bottom of the deque like a stack. P P P P Spawn! When a processor runs out of work, it steals a thread from the top of a random victim’s deque.
  • 54. Performance of Work-Stealing Theorem : Cilk’s work-stealing scheduler achieves an expected running time of T P  T 1 / P + O ( T 1 ) on P processors. Pseudoproof . A processor is either working or stealing . The total time all processors spend working is T 1 . Each steal has a 1/ P chance of reducing the span by 1 . Thus, the expected cost of all steals is O ( PT 1 ) . Since there are P processors, the expected time is ( T 1 + O ( PT 1 ))/ P = T 1 / P + O ( T 1 ) . ■
  • 55. Space Bounds Theorem. Let S 1 be the stack space required by a serial execution of a Cilk program. Then, the space required by a P -processor execution is at most S P · PS 1 . Proof (by induction). The work-stealing algorithm maintains the busy-leaves property : every extant procedure frame with no extant descendents has a processor working on it. ■ P = 3 P P P S 1
  • 56.
  • 57.
  • 58.
  • 59.