SlideShare uma empresa Scribd logo
1 de 40
Baixar para ler offline
Extreme Computing for Extreme Adaptive Optics:
the Key to Finding Life Outside our Solar System
H. Ltaief1, D. Sukkari1, O. Guyon2,3,4, and D. Keyes1
1Extreme Computing Research Center, KAUST, Saudi Arabia
3Steward Observatory, University of Arizona, Tucson, USA
2National Institutes of Natural Sciences, Tokyo, Japan
4National Astronomical Observatory of Japan, Subaru Telescope
HL, DS, OG, DK QDWH-Based Partial SVD 1 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 2 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 3 / 40
The Subaru Telescope
Represents a flagship telescope of the National Astronomical
Observatory of Japan
Carries a 8.2-meter (320in) diameter telescope
Contains a high-contrast imaging system for directly imaging
exoplanets
Operates perhaps the most advanced HPC facility in computational
astronomy
Lives at the Mauna Kea Observatory in Hawaii
HL, DS, OG, DK QDWH-Based Partial SVD 4 / 40
The Subaru Telescope
HL, DS, OG, DK QDWH-Based Partial SVD 5 / 40
(Perhaps) The Highest in Altitude GPU System Recorded
at 14,000 feet!
HL, DS, OG, DK QDWH-Based Partial SVD 6 / 40
The Atmosphere Turbulence and The Optical Aberration
HL, DS, OG, DK QDWH-Based Partial SVD 7 / 40
The Astronomical Challenge
Turbulence in the atmosphere limits the
performance of astronomical telescopes
Without active correction of such defects, images
would be blurred to approximately one arcsecond
angle
To recover the loss of angular resolution, adaptive
optics (AO) systems measure and correct
atmospheric turbulence
In the absence of optical aberrations, the telescope
should provide λ
D angular resolution (D:telescope
diameter, λ wavelength)
HL, DS, OG, DK QDWH-Based Partial SVD 8 / 40
Adaptive Optics 101
Wavefront sensor(s)
(WFS)
Measure details of
blurring from ’guide
star’ near the object
you want to observe
A real time controller
(RTC)
Processes the WFS
signals to compute
the control matrix
based on the pseudo
inverse
Light from both guide
star and astronomical
object is reflected
from deformable
mirror; distortions are
removed
https : //www.uniдe.ch/sciences/astro/index .php/download_f ile/view/34/168/
HL, DS, OG, DK QDWH-Based Partial SVD 9 / 40
How AO Works?
HL, DS, OG, DK QDWH-Based Partial SVD 10 / 40
And Here Comes the Linear Algebra...
Compute the pseudo inverse A+
AA+
A = A ,A ∈ Rm×n
(m ≥ n)
The numerical challenge of the pseudo inverse are twofold:
Numerical: dealing with rectangular matrix which may engender
numerical instabilities
A A = V ΛV
Computational: high algorithmic complexity, it should still be able to
keep up with the overall throughput of the AO framework
Using SVD: A = U ΣV then:
A+
= V Σ−1
U
Only most significant singular values with their associated singular
vectors are required (≈10%)
HL, DS, OG, DK QDWH-Based Partial SVD 11 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 12 / 40
SVD Algorithm - Standard Approach (LAPACK)
This forms the SGESDD (divide and conquer algorithm) for the computation
of SVD.
Bidiagonal Reduction (8/3n3) SGESDD (Σ)(8/3n3) SGESDD (UΣV )(22n3)
Level-2 BLAS (4/3n3)
50%flops 50%flops 6%flops
90%time 85%time 30%time
HL, DS, OG, DK QDWH-Based Partial SVD 13 / 40
Hardware Trends: Energy Matters!
2011 2018
DP FLOP 100 pJ 10 pJ
DP DRAM Read 4800 pJ 1920 pJ
Local interconnect 7500 pJ 2500 pJ
Cross system 9000 pJ 3500 pJ
John Shalf, LBNL
HL, DS, OG, DK QDWH-Based Partial SVD 14 / 40
The Big Picture (Similar w/ SVD)
Cray
LibSci17.11.1
HL, DS, OG, DK QDWH-Based Partial SVD 15 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 16 / 40
What is The Polar Decomposition?
The polar decomposition:
A = UpH , A ∈ Rm×n
(m ≥ n),
where Up is an orthogonal matrix and H =
√
A A is a symmetric positive
semidefinite matrix
The polar decomposition is a critical numerical algorithm for various
applications, including aerospace computations, chemistry, factor
analysis
HL, DS, OG, DK QDWH-Based Partial SVD 17 / 40
QDWH Polar Decomposition Algorithm
The QR-Dynamically Weighted Halley iterations:
X0 = A/α,
√
ckXk
I
=
Q1
Q2
R, Xk+1 =
bk
ck
Xk +
1
√
ck
ak −
bk
ck
Q1Q2
, k ≥ 0
The iterative procedure converges:
A = UpH,
where, UpUp = In, H is symmetric positive semidefinite
Backward stable algorithm for computing the polar decomposition
Based on conventional computational kernels, i.e., Cholesky/QR
factorizations (≤ 6 iterations for double precision) and GEMM
HL, DS, OG, DK QDWH-Based Partial SVD 18 / 40
Numerical Algorithm
Algorithm 1 Pseudo-Inverse using the QDWH-Based Partial SVD.
Compute the polar decomposition A = UpH using QDWH
Calculate [Q R] = QR(Up + Id)
Find the index ind = min(f ind(abs(diaд(R)) < threshold))
Extract ˜Q = Q(:,ind : end)
Reduce the original matrix problem ˜A = A × ˜Q
Compute the SVD of the reduced matrix problem ˜A = U Σ ˜VT
Compute the right singular vectors V = ˜QT × ˜V
Calculate the pseudo-inverse A+ = V Σ−1UT
HL, DS, OG, DK QDWH-Based Partial SVD 19 / 40
Algorithmic Complexity
Standard QDWH-based QDWH-based
SVD Full SVD Partial SVD
QDWH: (4+1/3)Nn3 x #itChol
Algorithmic 22Nn3 43Nn3 QR and GEMM: 4/3Nn3 + 2sNn2 + 2Nns2
complexity SVD: 22s3
Where, Nn is the matrix size, and s is the number of the selected singular values/vectors
(s << Nn)
HL, DS, OG, DK QDWH-Based Partial SVD 20 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 21 / 40
Environment Settings
Software:
GCC compilers
MAGMA v2.3 and CUDA v9.0 (including cuBLAS)
Single precision (SP) arithmetics is used
Ill-conditioned matrices generated using SLATMS MAGMA routine
Hardware:
The K80 GPU:
12GB of memory,
Two-socket 14-core system
Intel Broadwell system with 128GB of main memory
The P100 and V100 GPUs:
16GB of memory
Two-socket 16-core system
Intel Haswell systems with 128GB of main memory
HL, DS, OG, DK QDWH-Based Partial SVD 22 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 23 / 40
Synthetic Ill-Conditioned matrices, K80
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000011000120001300014000150001600017000180001900020000
AccuracySingularValues
Matrix size
SGESDD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(a) Singular Value Accuracy.
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000011000120001300014000150001600017000180001900020000
ResidualofSVD Matrix Size
QDWHpartial, Left, 13% SVD
QDWHpartial, Right, 13% SVD
QDWHpartial, Left, 13% SVD, QR+PO
QDWHpartial, Right, 13% SVD, QR+PO
QDWHpartial, Left, 10% SVD
QDWHpartial, Right, 10% SVD
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000011000120001300014000150001600017000180001900020000
ResidualofSVD Matrix Size
QDWHpartial, Left, 7% SVD
QDWHpartial, Right, 7% SVD
QDWHpartial, Left, 3% SVD
QDWHpartial, Right, 3% SVD
SGESDD, Left
SGESDD, Right
(b) Backward Error.
HL, DS, OG, DK QDWH-Based Partial SVD 24 / 40
Real Observational Datasets, K80
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1161 5805 11610 17415
AccuracySingularValues
Matrix size
SGESDD
QDWHpartial
(c) Singular Value Accuracy.
1e-12
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1161 5805 11610 17415
ResidualofSVD Matrix size
Right
Left
SGESDD, Left
SGESDD, Right
(d) Backward Error.
HL, DS, OG, DK QDWH-Based Partial SVD 25 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 26 / 40
Synthetic Ill-Conditioned matrices, K80
0.1
1
10
100
1000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Time(s)
Matrix size
SGESDD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 13% SVD
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(e) In Seconds.
0
500
1000
1500
2000
2500
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Gflop/s
Matrix size
QDWHpartial, 3% SVD
QDWHpartial, 7% SVD
QDWHpartial, 10% SVD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
SGESDD
(f) In Gflops/s.
Up to 3X speedup, 1.8Tflop/s, 45% of the theoretical peak performance
HL, DS, OG, DK QDWH-Based Partial SVD 27 / 40
Synthetic Ill-Conditioned matrices, P100
0.01
0.1
1
10
100
1000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Time(s)
Matrix size
SGESDD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 13% SVD
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(g) In Seconds.
0
1000
2000
3000
4000
5000
6000
7000
8000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Gflop/s
Matrix size
QDWHpartial, 3% SVD
QDWHpartial, 7% SVD
QDWHpartial, 10% SVD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
SGESDD
(h) In Gflops/s.
Up to 4X speedup, 7Tflop/s, 75% of the theoretical peak performance
HL, DS, OG, DK QDWH-Based Partial SVD 28 / 40
Synthetic Ill-Conditioned matrices, V100
0.01
0.1
1
10
100
1000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Time(s)
Matrix size
SGESDD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 13% SVD
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(i) In Seconds.
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Gflop/s
Matrix size
QDWHpartial, 3% SVD
QDWHpartial, 7% SVD
QDWHpartial, 10% SVD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
SGESDD
(j) In Gflops/s.
Up to 5X speedup, 9Tflop/s, 65% of the theoretical peak performance
HL, DS, OG, DK QDWH-Based Partial SVD 29 / 40
Real Observational Datasets, V100
0.1
1
10
100
1161
5805
11610
17415
Time(s)
Matrix size
SGESDD
QDWHpartial
(k) In Seconds.
0
1000
2000
3000
4000
5000
6000
7000
8000
1161
5805
11610
17415
Gflop/s
Matrix size
QDWHpartial
SGESDD
(l) In Gflops/s.
Up to 4X speedup
HL, DS, OG, DK QDWH-Based Partial SVD 30 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 31 / 40
Conclusion and Future Work
Comprehensive accuracy/performance analysis of a novel
QDWH-based partial SVD algorithm
Significant performance improvement of the QDWH-based partial
SVD: up to 5X and 4X against state-of-the-art implementations on
synthetic ill-conditioned matrices and real datasets across various
hardware technologies
The pseudo inverse simulation code has been deployed at the Subaru
telescope and operating since June 24th 2018!
Future work includes:
Asynchronous task-based QDWH-based partial SVD implementation
Multi-GPUs QDWH-based partial SVD implementation
HL, DS, OG, DK QDWH-Based Partial SVD 32 / 40
Acknowledgments
Yuji Nakatsukasa, National Institute of Informatics @ Tokyo, Japan
NVIDIA GPU Research Center
Cray Center of Excellence
Intel Parallel Computing Center
HL, DS, OG, DK QDWH-Based Partial SVD 33 / 40
The World’s Biggest Eye on The Sky
Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/)
HL, DS, OG, DK QDWH-Based Partial SVD 34 / 40
The World’s Biggest Eye on The Sky
Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/)
The largest optical/near-infrared telescope in the world.
It will weigh about 2700 tons with a main mirror diameter of 39m.
Location: Chile, South America.
H. Ltaief et al., Real-Time Massively Distributed Multi-Object Adaptive Optics
Simulations for the European Extremely Large Telescope, IEEE IPDPS 2018: designing
one of the most challenging instruments (MOSAIC)
HL, DS, OG, DK QDWH-Based Partial SVD 35 / 40
Exciting Time for Astronomy at KAUST/ECRC!
Supporting two major worldwide ground-based astronomy efforts
The E-ELT Telescope The Subaru Telescope
HL, DS, OG, DK QDWH-Based Partial SVD 36 / 40
Bringing Astronomy Back Home ;-)
Courtesy from CEMSE Communications, KAUST
HL, DS, OG, DK QDWH-Based Partial SVD 37 / 40
The Hourglass Revisited
@KAUST_ECRC
https://www.facebook.com/ecrckaust
HL, DS, OG, DK QDWH-Based Partial SVD 38 / 40
Questions?
HL, DS, OG, DK QDWH-Based Partial SVD 39 / 40
Moving Forward with Extreme AO
Last N WFS
measurements
sensor 1
N x n
Last N WFS
measurements
sensor K
N x n
MVM
Last N WFS
measurements
N x n
MVM
Last WFS
measurement
n
MVM
DM state
m
DM state
m
DM state
m
Control Matrix
m x n
Predictive Control Matrix
m x ( N x n )
Sensor Fusion and
Predictive control Matrix
m x ( K x N x n )
HL, DS, OG, DK QDWH-Based Partial SVD 40 / 40

Mais conteúdo relacionado

Semelhante a Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System

16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf
Nishant835443
 
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computationDSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
Deltares
 
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdfreservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
RTEFGDFGJU
 
4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx
grssieee
 
Jim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour LandscapesJim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour Landscapes
James T. Metz
 
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORTCAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
Massimo Garavaglia
 
Svd filtered temporal usage clustering
Svd filtered temporal usage clusteringSvd filtered temporal usage clustering
Svd filtered temporal usage clustering
Liang Xie, PhD
 

Semelhante a Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System (20)

QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
 
A next-generation ground array for the detection of ultrahigh-energy cosmic r...
A next-generation ground array for the detection of ultrahigh-energy cosmic r...A next-generation ground array for the detection of ultrahigh-energy cosmic r...
A next-generation ground array for the detection of ultrahigh-energy cosmic r...
 
Xray interferometry
Xray interferometryXray interferometry
Xray interferometry
 
Polynomial Matrix Decompositions
Polynomial Matrix DecompositionsPolynomial Matrix Decompositions
Polynomial Matrix Decompositions
 
16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf
 
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computationDSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
 
Hairong Qi V Swaminathan
Hairong Qi V SwaminathanHairong Qi V Swaminathan
Hairong Qi V Swaminathan
 
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdfreservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
 
Presentació renovables
Presentació renovablesPresentació renovables
Presentació renovables
 
Be lab manual csvtu
Be lab manual csvtuBe lab manual csvtu
Be lab manual csvtu
 
4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx
 
PCA and SVD in brief
PCA and SVD in briefPCA and SVD in brief
PCA and SVD in brief
 
Jim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour LandscapesJim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour Landscapes
 
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORTCAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
 
Svd filtered temporal usage clustering
Svd filtered temporal usage clusteringSvd filtered temporal usage clustering
Svd filtered temporal usage clustering
 
Differential Modulation and Non-Coherent Detection in Wireless Relay Networks
Differential Modulation and Non-Coherent Detection in Wireless Relay NetworksDifferential Modulation and Non-Coherent Detection in Wireless Relay Networks
Differential Modulation and Non-Coherent Detection in Wireless Relay Networks
 
Task - Surround - Ambient Lighting
 Task - Surround - Ambient Lighting Task - Surround - Ambient Lighting
Task - Surround - Ambient Lighting
 
2 simulation in aodv and dsr
2 simulation in aodv and dsr2 simulation in aodv and dsr
2 simulation in aodv and dsr
 
2 simulation in aodv and dsr
2 simulation in aodv and dsr2 simulation in aodv and dsr
2 simulation in aodv and dsr
 
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
Direct QR factorizations for tall-and-skinny matrices in MapReduce architectu...
 

Mais de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 

Mais de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System

  • 1. Extreme Computing for Extreme Adaptive Optics: the Key to Finding Life Outside our Solar System H. Ltaief1, D. Sukkari1, O. Guyon2,3,4, and D. Keyes1 1Extreme Computing Research Center, KAUST, Saudi Arabia 3Steward Observatory, University of Arizona, Tucson, USA 2National Institutes of Natural Sciences, Tokyo, Japan 4National Astronomical Observatory of Japan, Subaru Telescope HL, DS, OG, DK QDWH-Based Partial SVD 1 / 40
  • 2. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 2 / 40
  • 3. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 3 / 40
  • 4. The Subaru Telescope Represents a flagship telescope of the National Astronomical Observatory of Japan Carries a 8.2-meter (320in) diameter telescope Contains a high-contrast imaging system for directly imaging exoplanets Operates perhaps the most advanced HPC facility in computational astronomy Lives at the Mauna Kea Observatory in Hawaii HL, DS, OG, DK QDWH-Based Partial SVD 4 / 40
  • 5. The Subaru Telescope HL, DS, OG, DK QDWH-Based Partial SVD 5 / 40
  • 6. (Perhaps) The Highest in Altitude GPU System Recorded at 14,000 feet! HL, DS, OG, DK QDWH-Based Partial SVD 6 / 40
  • 7. The Atmosphere Turbulence and The Optical Aberration HL, DS, OG, DK QDWH-Based Partial SVD 7 / 40
  • 8. The Astronomical Challenge Turbulence in the atmosphere limits the performance of astronomical telescopes Without active correction of such defects, images would be blurred to approximately one arcsecond angle To recover the loss of angular resolution, adaptive optics (AO) systems measure and correct atmospheric turbulence In the absence of optical aberrations, the telescope should provide λ D angular resolution (D:telescope diameter, λ wavelength) HL, DS, OG, DK QDWH-Based Partial SVD 8 / 40
  • 9. Adaptive Optics 101 Wavefront sensor(s) (WFS) Measure details of blurring from ’guide star’ near the object you want to observe A real time controller (RTC) Processes the WFS signals to compute the control matrix based on the pseudo inverse Light from both guide star and astronomical object is reflected from deformable mirror; distortions are removed https : //www.uniдe.ch/sciences/astro/index .php/download_f ile/view/34/168/ HL, DS, OG, DK QDWH-Based Partial SVD 9 / 40
  • 10. How AO Works? HL, DS, OG, DK QDWH-Based Partial SVD 10 / 40
  • 11. And Here Comes the Linear Algebra... Compute the pseudo inverse A+ AA+ A = A ,A ∈ Rm×n (m ≥ n) The numerical challenge of the pseudo inverse are twofold: Numerical: dealing with rectangular matrix which may engender numerical instabilities A A = V ΛV Computational: high algorithmic complexity, it should still be able to keep up with the overall throughput of the AO framework Using SVD: A = U ΣV then: A+ = V Σ−1 U Only most significant singular values with their associated singular vectors are required (≈10%) HL, DS, OG, DK QDWH-Based Partial SVD 11 / 40
  • 12. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 12 / 40
  • 13. SVD Algorithm - Standard Approach (LAPACK) This forms the SGESDD (divide and conquer algorithm) for the computation of SVD. Bidiagonal Reduction (8/3n3) SGESDD (Σ)(8/3n3) SGESDD (UΣV )(22n3) Level-2 BLAS (4/3n3) 50%flops 50%flops 6%flops 90%time 85%time 30%time HL, DS, OG, DK QDWH-Based Partial SVD 13 / 40
  • 14. Hardware Trends: Energy Matters! 2011 2018 DP FLOP 100 pJ 10 pJ DP DRAM Read 4800 pJ 1920 pJ Local interconnect 7500 pJ 2500 pJ Cross system 9000 pJ 3500 pJ John Shalf, LBNL HL, DS, OG, DK QDWH-Based Partial SVD 14 / 40
  • 15. The Big Picture (Similar w/ SVD) Cray LibSci17.11.1 HL, DS, OG, DK QDWH-Based Partial SVD 15 / 40
  • 16. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 16 / 40
  • 17. What is The Polar Decomposition? The polar decomposition: A = UpH , A ∈ Rm×n (m ≥ n), where Up is an orthogonal matrix and H = √ A A is a symmetric positive semidefinite matrix The polar decomposition is a critical numerical algorithm for various applications, including aerospace computations, chemistry, factor analysis HL, DS, OG, DK QDWH-Based Partial SVD 17 / 40
  • 18. QDWH Polar Decomposition Algorithm The QR-Dynamically Weighted Halley iterations: X0 = A/α, √ ckXk I = Q1 Q2 R, Xk+1 = bk ck Xk + 1 √ ck ak − bk ck Q1Q2 , k ≥ 0 The iterative procedure converges: A = UpH, where, UpUp = In, H is symmetric positive semidefinite Backward stable algorithm for computing the polar decomposition Based on conventional computational kernels, i.e., Cholesky/QR factorizations (≤ 6 iterations for double precision) and GEMM HL, DS, OG, DK QDWH-Based Partial SVD 18 / 40
  • 19. Numerical Algorithm Algorithm 1 Pseudo-Inverse using the QDWH-Based Partial SVD. Compute the polar decomposition A = UpH using QDWH Calculate [Q R] = QR(Up + Id) Find the index ind = min(f ind(abs(diaд(R)) < threshold)) Extract ˜Q = Q(:,ind : end) Reduce the original matrix problem ˜A = A × ˜Q Compute the SVD of the reduced matrix problem ˜A = U Σ ˜VT Compute the right singular vectors V = ˜QT × ˜V Calculate the pseudo-inverse A+ = V Σ−1UT HL, DS, OG, DK QDWH-Based Partial SVD 19 / 40
  • 20. Algorithmic Complexity Standard QDWH-based QDWH-based SVD Full SVD Partial SVD QDWH: (4+1/3)Nn3 x #itChol Algorithmic 22Nn3 43Nn3 QR and GEMM: 4/3Nn3 + 2sNn2 + 2Nns2 complexity SVD: 22s3 Where, Nn is the matrix size, and s is the number of the selected singular values/vectors (s << Nn) HL, DS, OG, DK QDWH-Based Partial SVD 20 / 40
  • 21. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 21 / 40
  • 22. Environment Settings Software: GCC compilers MAGMA v2.3 and CUDA v9.0 (including cuBLAS) Single precision (SP) arithmetics is used Ill-conditioned matrices generated using SLATMS MAGMA routine Hardware: The K80 GPU: 12GB of memory, Two-socket 14-core system Intel Broadwell system with 128GB of main memory The P100 and V100 GPUs: 16GB of memory Two-socket 16-core system Intel Haswell systems with 128GB of main memory HL, DS, OG, DK QDWH-Based Partial SVD 22 / 40
  • 23. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 23 / 40
  • 24. Synthetic Ill-Conditioned matrices, K80 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000011000120001300014000150001600017000180001900020000 AccuracySingularValues Matrix size SGESDD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (a) Singular Value Accuracy. 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000011000120001300014000150001600017000180001900020000 ResidualofSVD Matrix Size QDWHpartial, Left, 13% SVD QDWHpartial, Right, 13% SVD QDWHpartial, Left, 13% SVD, QR+PO QDWHpartial, Right, 13% SVD, QR+PO QDWHpartial, Left, 10% SVD QDWHpartial, Right, 10% SVD 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000011000120001300014000150001600017000180001900020000 ResidualofSVD Matrix Size QDWHpartial, Left, 7% SVD QDWHpartial, Right, 7% SVD QDWHpartial, Left, 3% SVD QDWHpartial, Right, 3% SVD SGESDD, Left SGESDD, Right (b) Backward Error. HL, DS, OG, DK QDWH-Based Partial SVD 24 / 40
  • 25. Real Observational Datasets, K80 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1161 5805 11610 17415 AccuracySingularValues Matrix size SGESDD QDWHpartial (c) Singular Value Accuracy. 1e-12 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1161 5805 11610 17415 ResidualofSVD Matrix size Right Left SGESDD, Left SGESDD, Right (d) Backward Error. HL, DS, OG, DK QDWH-Based Partial SVD 25 / 40
  • 26. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 26 / 40
  • 27. Synthetic Ill-Conditioned matrices, K80 0.1 1 10 100 1000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Time(s) Matrix size SGESDD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 13% SVD QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (e) In Seconds. 0 500 1000 1500 2000 2500 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Gflop/s Matrix size QDWHpartial, 3% SVD QDWHpartial, 7% SVD QDWHpartial, 10% SVD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO SGESDD (f) In Gflops/s. Up to 3X speedup, 1.8Tflop/s, 45% of the theoretical peak performance HL, DS, OG, DK QDWH-Based Partial SVD 27 / 40
  • 28. Synthetic Ill-Conditioned matrices, P100 0.01 0.1 1 10 100 1000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Time(s) Matrix size SGESDD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 13% SVD QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (g) In Seconds. 0 1000 2000 3000 4000 5000 6000 7000 8000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Gflop/s Matrix size QDWHpartial, 3% SVD QDWHpartial, 7% SVD QDWHpartial, 10% SVD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO SGESDD (h) In Gflops/s. Up to 4X speedup, 7Tflop/s, 75% of the theoretical peak performance HL, DS, OG, DK QDWH-Based Partial SVD 28 / 40
  • 29. Synthetic Ill-Conditioned matrices, V100 0.01 0.1 1 10 100 1000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Time(s) Matrix size SGESDD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 13% SVD QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (i) In Seconds. 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Gflop/s Matrix size QDWHpartial, 3% SVD QDWHpartial, 7% SVD QDWHpartial, 10% SVD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO SGESDD (j) In Gflops/s. Up to 5X speedup, 9Tflop/s, 65% of the theoretical peak performance HL, DS, OG, DK QDWH-Based Partial SVD 29 / 40
  • 30. Real Observational Datasets, V100 0.1 1 10 100 1161 5805 11610 17415 Time(s) Matrix size SGESDD QDWHpartial (k) In Seconds. 0 1000 2000 3000 4000 5000 6000 7000 8000 1161 5805 11610 17415 Gflop/s Matrix size QDWHpartial SGESDD (l) In Gflops/s. Up to 4X speedup HL, DS, OG, DK QDWH-Based Partial SVD 30 / 40
  • 31. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 31 / 40
  • 32. Conclusion and Future Work Comprehensive accuracy/performance analysis of a novel QDWH-based partial SVD algorithm Significant performance improvement of the QDWH-based partial SVD: up to 5X and 4X against state-of-the-art implementations on synthetic ill-conditioned matrices and real datasets across various hardware technologies The pseudo inverse simulation code has been deployed at the Subaru telescope and operating since June 24th 2018! Future work includes: Asynchronous task-based QDWH-based partial SVD implementation Multi-GPUs QDWH-based partial SVD implementation HL, DS, OG, DK QDWH-Based Partial SVD 32 / 40
  • 33. Acknowledgments Yuji Nakatsukasa, National Institute of Informatics @ Tokyo, Japan NVIDIA GPU Research Center Cray Center of Excellence Intel Parallel Computing Center HL, DS, OG, DK QDWH-Based Partial SVD 33 / 40
  • 34. The World’s Biggest Eye on The Sky Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/) HL, DS, OG, DK QDWH-Based Partial SVD 34 / 40
  • 35. The World’s Biggest Eye on The Sky Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/) The largest optical/near-infrared telescope in the world. It will weigh about 2700 tons with a main mirror diameter of 39m. Location: Chile, South America. H. Ltaief et al., Real-Time Massively Distributed Multi-Object Adaptive Optics Simulations for the European Extremely Large Telescope, IEEE IPDPS 2018: designing one of the most challenging instruments (MOSAIC) HL, DS, OG, DK QDWH-Based Partial SVD 35 / 40
  • 36. Exciting Time for Astronomy at KAUST/ECRC! Supporting two major worldwide ground-based astronomy efforts The E-ELT Telescope The Subaru Telescope HL, DS, OG, DK QDWH-Based Partial SVD 36 / 40
  • 37. Bringing Astronomy Back Home ;-) Courtesy from CEMSE Communications, KAUST HL, DS, OG, DK QDWH-Based Partial SVD 37 / 40
  • 39. Questions? HL, DS, OG, DK QDWH-Based Partial SVD 39 / 40
  • 40. Moving Forward with Extreme AO Last N WFS measurements sensor 1 N x n Last N WFS measurements sensor K N x n MVM Last N WFS measurements N x n MVM Last WFS measurement n MVM DM state m DM state m DM state m Control Matrix m x n Predictive Control Matrix m x ( N x n ) Sensor Fusion and Predictive control Matrix m x ( K x N x n ) HL, DS, OG, DK QDWH-Based Partial SVD 40 / 40