Where Do We Need Derivatives? 
Numerical Methods: 
Solution of ODE, DAE, Optimization, Nonlinear equations. 
Sensitivity Analysis: 
How does a computer model react to perturbations in input parameters
or model constants?
Design Optimization: 
Choose parameters such that the model computes a "better" design.
Data Assimilation & Inverse Problems: 
Find values for model parameters such that the model reproduces
experimentally obtained results.
Derivatives play a central role because the Taylor series predicts
the effect of changes in input parameters, e.g.:
$$f(x + \Delta x) \approx f(x) + \frac{\partial f}{\partial x}\,\Delta x^{T} + O(\|\Delta x\|^{2})$$
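The first-order Taylor prediction can be checked numerically. The sketch below is only an illustration: the two-input model `f`, its hand-written gradient `grad_f`, and the helper names are hypothetical and not taken from the slides. It prints the gap between the true perturbed value and the linear prediction for shrinking perturbations; for a smooth model this gap should shrink roughly quadratically.

```python
import math

def f(x):
    """Hypothetical two-input model: f(x) = x0 * sin(x1)."""
    return x[0] * math.sin(x[1])

def grad_f(x):
    """Gradient of f, written by hand for this toy model."""
    return [math.sin(x[1]), x[0] * math.cos(x[1])]

def taylor_predict(x, dx):
    """First-order Taylor prediction f(x) + (df/dx) . dx."""
    return f(x) + sum(g * d for g, d in zip(grad_f(x), dx))

def taylor_error(x, dx):
    """Absolute gap between the true perturbed value and the prediction."""
    x_new = [xi + di for xi, di in zip(x, dx)]
    return abs(f(x_new) - taylor_predict(x, dx))

x0 = [2.0, 0.5]
for scale in (1e-1, 1e-2, 1e-3):
    dx = [scale, -scale]
    print(f"perturbation {scale:g}: prediction error {taylor_error(x0, dx):.3e}")
```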
Approaches to Computing Derivatives 
By Hand: 
Tedious and Error-Prone 
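Divided Differences:
Can't assess reliability. Difficult to assess numerical accuracy (e.g.,
truncation and cancellation error) and expensive when computing
derivatives w.r.t. many independent variables.
one-sided differences:
$$\left.\frac{\partial f(x)}{\partial x_i}\right|_{x=x_o} \approx \frac{f(x_o \pm h\,e_i) - f(x_o)}{\pm h}$$
central differences:
$$\left.\frac{\partial f(x)}{\partial x_i}\right|_{x=x_o} \approx \frac{f(x_o + h\,e_i) - f(x_o - h\,e_i)}{2h}$$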
Symbolic: 
Infeasible for large codes. Not directly applicable to larger programs 
with loops and branches. (e.g., Maple, Mathematica) 
Automatic Differentiation:
- Requires little human time
- Incurs no truncation error
- Attractive computational complexity
- Applicable to codes of arbitrary size
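To make the contrast between divided differences and automatic differentiation concrete, here is a minimal sketch of forward-mode AD using dual numbers, compared against one-sided and central differences. This is not ADIFOR (which transforms Fortran source); the `Dual` class, the model `f`, and all helper names are hypothetical. The AD derivative is obtained by applying the chain rule to each operation, so it has no step size to tune, while the difference quotients usually lose accuracy for large `h` (truncation) and for very small `h` (cancellation).

```python
import math

class Dual:
    """A value together with its derivative w.r.t. one chosen input."""
    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    """sin for floats and Duals (chain rule in the Dual case)."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
    return math.sin(x)

def f(x):
    """Hypothetical model code containing a loop."""
    y = x
    for _ in range(3):
        y = y * sin(y) + x
    return y

def one_sided(func, x, h):
    return (func(x + h) - func(x)) / h

def central(func, x, h):
    return (func(x + h) - func(x - h)) / (2.0 * h)

def ad_derivative(func, x):
    """Seed the input with derivative 1 and read the derivative of the output."""
    return func(Dual(x, 1.0)).dot

x0 = 1.3
reference = ad_derivative(f, x0)
for h in (1e-2, 1e-6, 1e-12):
    print(f"h={h:g}  one-sided diff from AD={abs(one_sided(f, x0, h) - reference):.2e}"
          f"  central diff from AD={abs(central(f, x0, h) - reference):.2e}")
```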
Hierarchical Structure of ADIFOR 
[Figure: hierarchy of program units: Program, Procedure, Loop Nest,
Loop Body, Basic Block, Statement, Expression. Annotations: "Lots of
Alternatives" and "ADIFOR Approach".]
The ADIFOR System
[Figure: block diagram with the labels Fortran Code; Analysis;
AD Intrinsics Template Expander; ADIFOR Preprocessor; Fortran
Derivative Code; Compile and Link; AD Intrinsics Library; User's
Derivative Driver; SparsLinC Library; Derivative Computing Code.]
Computational Differentiation
at Argonne National Laboratory
The Big Picture of AD Tools
[Figure: diagram built around the Chain Rule, with these labeled groups:
Numerical Methods: ODE's, DAE's; Optimization; Iterative Solvers.
New Languages: C, C++; Fortran (77, 90, M, HPF); MPI, PVM; Little Languages.
New Capabilities: Hessians; Non-smooth functions.
Associativity: Pseudo-Adjoints, Interface Contraction, Breaking Dependencies.]
A Modular Approach to Building AD Tools 
[Figure: processing pipeline with the stages Input Program; Parsing
and Canonicalization; Program Analysis; Annotated Intermediate
Representation; Differentiation Executive; Derivative Augmentation;
Unparsing; Parallel Output Program; Parallel Derivative Run-time
System.]
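The stages named on this slide (parsing, an intermediate representation, derivative augmentation, unparsing) can be illustrated on a very small scale. The sketch below is a toy under stated assumptions: it works on single Python expressions through the standard `ast` module rather than on Fortran programs, supports only `+`, `-`, and `*`, and the function names `d`, `const`, and `differentiate_source` are hypothetical. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast

def const(value):
    """Build a numeric literal node."""
    return ast.Constant(value=value, kind=None)

def d(node, wrt):
    """Return an AST for the derivative of expression `node` w.r.t. name `wrt`."""
    if isinstance(node, ast.Constant):
        return const(0.0)
    if isinstance(node, ast.Name):
        return const(1.0 if node.id == wrt else 0.0)
    if isinstance(node, ast.BinOp):
        left, right = node.left, node.right
        d_left, d_right = d(left, wrt), d(right, wrt)
        if isinstance(node.op, ast.Add):
            return ast.BinOp(d_left, ast.Add(), d_right)
        if isinstance(node.op, ast.Sub):
            return ast.BinOp(d_left, ast.Sub(), d_right)
        if isinstance(node.op, ast.Mult):
            # Product rule: (l * r)' = l' * r + l * r'
            return ast.BinOp(ast.BinOp(d_left, ast.Mult(), right),
                             ast.Add(),
                             ast.BinOp(left, ast.Mult(), d_right))
    raise NotImplementedError(ast.dump(node))

def differentiate_source(expr_src, wrt):
    tree = ast.parse(expr_src, mode="eval")            # parsing
    d_tree = ast.Expression(body=d(tree.body, wrt))    # derivative augmentation
    ast.fix_missing_locations(d_tree)
    return ast.unparse(d_tree)                         # unparsing

derivative_src = differentiate_source("x * x + 3 * x * y", "x")
print(derivative_src)
print(eval(derivative_src, {"x": 2.0, "y": 5.0}))
```

The generated expression is not simplified; a real tool would also analyze the program to avoid computing derivatives that are known to be zero.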
Time-Parallel Scheme for Derivative Computing 
(FORTRAN-M Implementation) 
Chain rule associativity breaks dependencies and generates new
task parallelism (in addition to the parallelism that already exists!).
[Figure: time steps H_t, H_t+1, ... map x to y, y to z, and z to w.
The local derivatives dH_t/dx, dH_t+1/dy, dH_t+2/dz are computed on
proc. 0, proc. 1, proc. 2 and combined into dw/dx.]
[Figure: process diagram with Serial top-level, Manager, Master
Wrapper, Matrix-matrix Multiplier, and Gradient Process 1 ... N,
connected by serial_to_manager, manager_to_parallel, parallel_to_MM,
and idle channels.]
[Figure: timeline of processes 0-8; activities Compute_Der,
Compute_Fun, Compute_Mat, Receive, Send; time axis ticks 7 to 94.]
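The role of associativity can be sketched as follows. Once the states of the time loop are known, each local Jacobian depends only on its own state, so the Jacobians can be computed independently, and their product can be regrouped freely. The code below is an illustration only: the time step `step`, its hand-written Jacobian, and the thread pool standing in for separate processes are all assumptions, not the FORTRAN-M implementation from the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def step(z):
    """Hypothetical time step H: maps a 2-vector state to the next state."""
    a, b = z
    return [a + 0.1 * b, b - 0.1 * a * a]

def step_jacobian(z):
    """Hand-written Jacobian dH/dz of `step` at state z."""
    a, _ = z
    return [[1.0, 0.1], [-0.2 * a, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trajectory(x, n_steps):
    """Serial function evaluation: the states at which Jacobians are needed."""
    states = [x]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

def serial_product(jacs):
    """dw/dx = J_n ... J_2 J_1, accumulated one step at a time."""
    return reduce(lambda acc, J: matmul(J, acc), jacs)

def tree_product(jacs):
    """Same product regrouped pairwise; products within a level are independent."""
    while len(jacs) > 1:
        paired = [matmul(jacs[i + 1], jacs[i]) for i in range(0, len(jacs) - 1, 2)]
        if len(jacs) % 2:
            paired.append(jacs[-1])
        jacs = paired
    return jacs[0]

states = trajectory([1.0, 0.5], 5)
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each local Jacobian needs only its own state, so these calls can overlap.
    jacs = list(pool.map(step_jacobian, states[:-1]))

print(serial_product(jacs))
print(tree_product(jacs))
```

Python threads are used here only to show the task structure; they are not expected to give a speedup for this tiny example.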
Time-Parallel Scheme for Derivative Computing 
(MPI Implementation) 
Chain rule associativity breaks dependencies and generates new
task parallelism (in addition to the parallelism that already exists!).
[Figure: time steps H_t, H_t+1, ... map x to y, y to z, and z to w.
The local derivatives dH_t/dx, dH_t+1/dy, dH_t+2/dz are computed on
proc. 0, proc. 1, proc. 2 and combined into dw/dx.]
[Figure: process diagram with Master Wrapper, Manager (option),
Matrix-matrix Multiplier, and Gradient Process 1 ... N, connected by
manager_to_parallel, parallel_to_MM, and idle channels.]
[Figure: timeline of processes 0-9; activities Compute_Der,
Compute_Fun, Compute_Mat, Receive, Send; time axis ticks 3.0 to 39.3.]
Parallel System Design with Task Manager 
The parallel-task manager process keeps track of which processes
are active, selects an inactive process, and sends an activation
message to that process. This accommodates a heterogeneous compute
environment, in which some processors may be slower than others.
[Figure (top): System Design without Task Manager. Timeline of
processes 0-4; activities Compute_Der, Compute_Fun, Compute_Mat,
Receive, Send; time axis ticks 4.9 to 63.1.]
[Figure (bottom): System Design with Task Manager. Timeline of
processes 0-5; same activities; time axis ticks 5.0 to 65.0.]
For parallel resource utilization, the parallel gradient
computations can be spawned either statically by a round-robin
scheme (top) or dynamically by introducing a task manager
(bottom).
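The difference between the two spawning strategies can be sketched with a small scheduling model. This is a simplified illustration, not the system measured on the slide: it ignores message costs and the manager's own overhead, and the function names, task sizes, and processor speeds are hypothetical. Round-robin assigns task i to process i mod P in advance; the task manager hands the next task to whichever process becomes idle first.

```python
import heapq

def round_robin_makespan(task_work, speeds):
    """Static assignment: task i goes to process i mod P."""
    finish = [0.0] * len(speeds)
    for i, work in enumerate(task_work):
        p = i % len(speeds)
        finish[p] += work / speeds[p]
    return max(finish)

def task_manager_makespan(task_work, speeds):
    """Dynamic assignment: the next task goes to the earliest idle process."""
    # Heap of (time at which the process becomes idle, process id).
    idle_at = [(0.0, p) for p in range(len(speeds))]
    heapq.heapify(idle_at)
    makespan = 0.0
    for work in task_work:
        t, p = heapq.heappop(idle_at)
        done = t + work / speeds[p]
        makespan = max(makespan, done)
        heapq.heappush(idle_at, (done, p))
    return makespan

tasks = [1.0] * 12            # twelve equal gradient tasks
speeds = [1.0, 1.0, 0.25]     # the third processor is four times slower

print("round-robin :", round_robin_makespan(tasks, speeds))
print("task manager:", task_manager_makespan(tasks, speeds))
```

In this model the static scheme waits for the slow processor to finish its full share, while the dynamic scheme gives it fewer tasks. With equally fast processors both strategies give the same makespan in this model.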
Parallel System Design with Task Manager 
The parallel-task manager process keeps track of which processes
are active, selects an inactive process, and sends an activation
message to that process. This accommodates a heterogeneous compute
environment, in which some processors may be slower than others.
[Figure (top): System Design without Task Manager. Timeline of
processes 0-4; activities Compute_Der, Compute_Fun, Compute_Mat,
Receive, Send; time axis ticks 4.2 to 54.0.]
[Figure (bottom): System Design with Task Manager. Timeline of
processes 0-5; same activities; time axis ticks 4.2 to 54.6.]
For parallel resource utilization, the parallel gradient
computations can be spawned either statically by a round-robin
scheme (top) or dynamically by introducing a task manager
(bottom).
Upshot: Parallel Performance Analysis 
[Figure: five timelines of processes 0-4, each showing the activities
Compute_Der, Compute_Fun, Compute_Mat, Receive, Send:
ADIFOR Dense (time axis ticks 64 to 828);
ADIFOR Color (time axis ticks 65 to 848);
ADIFOR Sparse (time axis ticks 76 to 989);
ADIFOR Mixed-1 (time axis ticks 76 to 982);
ADIFOR Mixed-2 (time axis ticks 94 to 1224).]
Speedup for ADIFOR Application: 
Shallow Water Equations model (SWE) 
The serial and parallel speedup for the Shallow Water Equations
model (SWE), which uses a time-dependent leapfrog scheme.
[Figure: speedup plot titled "Shallow Water Equations model (SWE)".
grid size = 21x21, n = 3*21*21 = 1323, p = 4, s = n + p = 1327;
machine: IBM SP, time-loop: 40.
y-axis: Speedup (0.00 to 160.00).
x-axis: ADIFOR, Serial, Parallel: 1, 2, 4, 8, 16, 32 (no. of derivative slaves).
Series: Dense, Color, Sparse, Mixed-1, Mixed-2.]
The serial speedup is obtained by exploiting the chain rule
and the sparsity patterns. Chain rule associativity breaks
dependencies and generates new task parallelism.
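One common way to exploit a known sparsity pattern is to compress the Jacobian by column coloring: columns that never share a nonzero row are combined into a single seed vector, so fewer derivative passes are needed. The sketch below illustrates the idea on a tridiagonal pattern. It is an assumption that this corresponds to the "Color" variant on the slide; the model, the hand-written Jacobian-vector product `jvp`, and the function names are hypothetical.

```python
def jvp(x, v):
    """Hand-written Jacobian-vector product for a hypothetical tridiagonal model
    f_i(x) = x_i**2 + x_{i-1} - 3*x_{i+1} (neighbours outside the grid dropped)."""
    n = len(x)
    out = []
    for i in range(n):
        s = 2.0 * x[i] * v[i]
        if i > 0:
            s += v[i - 1]
        if i + 1 < n:
            s -= 3.0 * v[i + 1]
        out.append(s)
    return out

def dense_jacobian(x):
    """One product per column: n Jacobian-vector products."""
    n = len(x)
    cols = [jvp(x, [1.0 if k == j else 0.0 for k in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def colored_jacobian(x, n_colors=3):
    """Columns j, j+3, j+6, ... never share a row in a tridiagonal Jacobian,
    so each color needs only one product: 3 products in total."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for c in range(n_colors):
        seed = [1.0 if k % n_colors == c else 0.0 for k in range(n)]
        compressed = jvp(x, seed)
        for i in range(n):
            # Row i has nonzeros only in columns i-1, i, i+1.
            for j in (i - 1, i, i + 1):
                if 0 <= j < n and j % n_colors == c:
                    J[i][j] = compressed[i]
    return J

x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(colored_jacobian(x) == dense_jacobian(x))
```

For this pattern the cost drops from n products to 3, independent of n.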
ADIFOR Application: 
Shallow Water Equations model (SWE) 
The Shallow Water Equations model (SWE) uses a time-dependent
leapfrog scheme.
We let $Z(t)$, $Z(t-1)$ denote the current and previous state of
the time-dependent system. The next state is obtained by
$$Z(t+1) = G(Z(t), Z(t-1), W, B(t+1), \mathrm{Obs}(t+1))$$
where $G$ is the time-stepping operator, $W$ are the time-independent
parameters, $B(t+1)$ are the next boundary conditions, and
$\mathrm{Obs}(t+1)$ are observations of the next state.
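How a sensitivity is carried through such a two-level time loop can be sketched on a scalar toy problem. The code below is an assumption-laden illustration, not the SWE model: it integrates dz/dt = -W z with a leapfrog step, omits the boundary and observation arguments of G, and starts the scheme with one forward Euler step. The derivative dZ/dW is propagated by applying the chain rule to each update statement.

```python
def leapfrog_with_sensitivity(z0, w, dt, n_steps):
    """Integrate dz/dt = -w*z with a leapfrog scheme and carry dz/dw along.
    Assumes n_steps >= 1."""
    # Start-up: one forward Euler step supplies the second state the scheme needs.
    z_prev, dz_prev = z0, 0.0
    z_curr = z0 - dt * w * z0
    dz_curr = -dt * z0
    for _ in range(n_steps - 1):
        # Z(t+1) = G(Z(t), Z(t-1), W) = Z(t-1) - 2*dt*W*Z(t)
        z_next = z_prev - 2.0 * dt * w * z_curr
        # Chain rule applied to the same statement.
        dz_next = dz_prev - 2.0 * dt * (z_curr + w * dz_curr)
        z_prev, dz_prev = z_curr, dz_curr
        z_curr, dz_curr = z_next, dz_next
    return z_curr, dz_curr

z, dz_dw = leapfrog_with_sensitivity(z0=1.0, w=0.7, dt=0.01, n_steps=40)
print("final state:", z)
print("sensitivity dZ/dW:", dz_dw)
```

The derivative statements mirror the original statements one for one, which is the pattern a source-transformation tool generates automatically.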
[Figure (left): plot titled "Shallow Water Equations model (SWE)";
grid axes 0 to 25; value axis -50 to 20.]
[Figure (right): plot titled "Shallow Water Equations model (SWE)
AD-Sensitivity"; grid axes 0 to 25; value axis -10 to 4, scaled by
10^6.]
4-D variational data assimilation with shallow water equations 
(SWE) when controlling both boundary and initial conditions 
(left) and its sensitivity to a uniform relative change in the 
observations and weights (right).
ADIFOR Application: MM5 PSU/NCAR 
Mesoscale Weather Model 
The Fifth-Generation Penn State/NCAR Mesoscale Weather
Model (MM5) is a regional forecasting model. See A Description
of the Fifth-Generation Penn State/NCAR Mesoscale Weather
Model (MM5), G. A. Grell, J. Dudhia, and D. R. Stauffer,
NCAR/TN-398+STR, 1994.
Water vapor mass fraction (left) and its sensitivity to a uniform
relative change in the surface pressure field (right).
MM5's Sensitivity to Initial Temperature 
Grid size: 63 × 63 × 23.
Median distance of grid points: 101 km. 
Radius of perturbation: 4.6 grid points. 
Sensitivity of Temperature in deg/deg at 
time t = 0h 30min (6th time step) on the 
519 mb sigma-level.

My Postdoctoral Research

ADIFOR Application: High-Speed Civil Transport
MARSEN: 3-D marching Euler code. Vamshi Mohan Korivi and
Art Taylor, Old Dominion University; Perry Newman, NASA Langley.
Aerodyn. Opt. Studies using a 3-D Supersonic Euler Code with
Efficient Calculation of Sensitivity Derivatives, V. M. Korivi,
P. Newman, A. Taylor, AIAA-94-4270-CP, 1994.