SlideShare uma empresa Scribd logo
1 de 26
Baixar para ler offline
Augmented Arithmetic Operations
Proposed for IEEE 754-2018
E. Jason Riedy
Georgia Institute of Technology
25th IEEE Symposium on Computer Arithmetic, 26 June 2018
Outline
History and Definitions
Decisions
Testing
Summary
Augmented Arith Ops — ARITH 25, 26 June 2018 1/18
History and Definitions
Quick Definitions
twoSum Two summands x and y ⇒ sum s and error e
fastTwoSum Two summands x and y with |x| ≥ |y| ⇒ sum
s and error e
twoProd Two multiplicands x and y ⇒ product p and
error e
All can overflow. Only twoProd can underflow with
gradual underflow.
Otherwise, relationships are exact for finite x and y.
These are for binary floating point.
Augmented Arith Ops — ARITH 25, 26 June 2018 2/18
Partial History
• Møller (1965), Knuth (1965): twoSum for quasi double
precision
• Kahan (1965): fastTwoSum for compensated
summation
• Dekker (1971): fastTwoSum for expansions
• Shewchuk (1997): Geometric predicates
• Brigg (1998): quasi double precision implementation
as double-double
• Hida, Li, Bailey (2001): quad-double
• Ogita, Rump, Oishi (2005...): Error-free
transformations
• IEEE 754-2008: Punted to the future
• Ahrens, Nguyen, Demmel (2013...): ReproBLAS
Augmented Arith Ops — ARITH 25, 26 June 2018 3/18
“Typical” twoSum
void
twoSum (const double x, const double y,
double * head, double * tail)
// Knuth, second volume of TAoCP
{
const double s = x + y;
const double bb = s − x;
const double err = (x − (s − bb)) + (y − bb);
*head = s;
*tail = err;
}
Six instructions, long dependency chain.
Reducing to one or two instructions: More uses.
Augmented Arith Ops — ARITH 25, 26 June 2018 4/18
twoProduct
void
twoProduct (const double a, const double b,
double *head, double *tail)
{
double p = a * b;
double t = fma (a, b, −p);
*head = p;
*tail = t;
}
Already two instructions.
With the right instruction.
Augmented Arith Ops — ARITH 25, 26 June 2018 5/18
twoSum, twoProd Unusual Exceptional Cases
x y head tail signal
∞ + ∞ ⇒ ∞ NaN invalid
−∞ + −∞ ⇒ −∞ NaN invalid
x + y ⇒ ∞ NaN invalid, overflow, inexact
(x + y overflows)
−x + −y ⇒ −∞ NaN invalid, overflow, inexact
(−x − y overflows)
−0 + −0 ⇒ −0 +0 (none)
x × y ⇒ ∞ −∞ overflow, inexact
(x × y overflows)
−x × y ⇒ −∞ ∞ overflow, inexact
(−x × y overflows)
−0 × 0 ⇒ −0 0 (none)
Augmented Arith Ops — ARITH 25, 26 June 2018 6/18
Other Orders, Other Cases
• Algebraic re-orderings produce different cases
• Different signs
• Different NaN locations
• Similarly for twoProd and fastTwoSum.
• Hence, double-double behaves strangely...
• And reprodicibility becomes tricky.
Augmented Arith Ops — ARITH 25, 26 June 2018 7/18
Reproducible Linear Algebra
• Ahrens, Nguyen, Demmel: K-fold indexed sum
• Base of the ReproBLAS, associative
• Core operation: fastTwoSum!
• But rounding must not depend on the output value.
Details:
• Demmel and Nguyen, “Fast Reproducible
Floating-Point Summation,” ARITH21, 2013.
• Ahrens, Nguyen, and Demmel, “Effcient Reproducible
Floating Point Summation and BLAS,” UCB Tech
Report, 2016.
• bebop.cs.berkeley.edu/reproblas/
Note: There are other methods, e.g. ARM’s.
Augmented Arith Ops — ARITH 25, 26 June 2018 8/18
Performance?
Emulating augmentedAddition as two instructions
improves double-double:
Operation Skylake Haswell
Addition latency −55% −45%
throughput +36% +18%
Multiplication latency −3% 0%
throughput +11% +16%
DDGEMM MFLOP/s:
Operation Intel Skylake Intel Haswell
“Typical” implementation 1732 (≈ 1/37 DP) 1199 (≈ 1/45 DP)
Two-insn augmentedAddition 3344 (≈ 1/19 DP) 2283 (≈ 1/24 DP)
Dukhan, Riedy, Vuduc. “Wanted: Floating-point add round-off error instruction,” PMAA 2016.
ReproBLAS dot product: 33% rate improvement,
only 2× slower than non-reproducible.
Augmented Arith Ops — ARITH 25, 26 June 2018 9/18
Decisions
Decisions Made for Standardization
• Time to standardize?
• Was considered for 2008, but only use was extending
precision.
• Now more uses that can benefit.
• Names?
• twoSum, etc. already exist.
• So augmented arithmetic operations.
• augmentedAddition, augmentedSubtraction,
augmentedMultiplication
• Exceptional behavior?
• My first proposal was twoSum above.
• Generally not wise...
• Rounding...
Augmented Arith Ops — ARITH 25, 26 June 2018 10/18
Standard Behavior augmentedAddition (I)
x y head tail signal
NaN NaN NaN NaN invalid on sNaN
±∞ NaN NaN NaN invalid on sNaN
NaN ±∞ NaN NaN invalid on sNaN
∞ ∞ ∞ ∞ (none)
∞ −∞ NaN NaN invalid
−∞ ∞ NaN NaN invalid
−∞ −∞ −∞ −∞ (none)
x y ∞ ∞ overrflow, inexact
(x + y overflows)
−x −y −∞ −∞ overrflow, inexact
(−x − y overflows)
Augmented Arith Ops — ARITH 25, 26 June 2018 11/18
Standard Behavior augmentedAddition (II)
x y head tail signal
+0 +0 +0 +0 (none)
+0 −0 +0 +0 (none)
−0 +0 +0 +0 (none)
−0 −0 −0 −0 (none)
Double-double, etc. look more like IEEE 754.
Augmented Arith Ops — ARITH 25, 26 June 2018 12/18
Rounding
• Reproducibility cannot round ties to nearest even.
• Depends on the output value.
• Leaves rounding ties away...
• Already exists for binary.
• Surveyed users.
• Shewchuck: triangle could not use it.
• Rump, et al.: Proofs need reworked.
• So rounding ties toward zero.
• roundTiesToZero from decimal
• Only these operations.
• sigh.
Augmented Arith Ops — ARITH 25, 26 June 2018 13/18
Testing
Testing augmentedAddition
• Assume p bits of precision, and
augmentedAddition(x, y) ⇒ (h, t).
• Typical cases are fine, only need special cases to test
for roundTiesToZero.
• Generate x, y such that y immediately follows x and
causes a tie that rounds to a value easy to check.
• Params: s sign, T significand, E exponent
• x = (−1)1−s · T · 2E with T odd
• y = (−1)1−s · 2E−p−1
• Results:
• roundTiesToZero: (x, y)
• Away or to even: (x + 2y, −y)
Augmented Arith Ops — ARITH 25, 26 June 2018 14/18
Testing augmentedMultiplication: Rounding
• Similarly need special cases to test for
roundTiesToZero and underflow.
• Generate x, y such that xy requires p + 1 bits and has
11 as the final bits.
• xm × ym = (2p+1
+ 4M + 3) · 2E
with 0 ≤ M < 2p−1
and
emin ≤ E + p − 1 < emax
• Sample M
• Factor the significand 2p+1 + 4M + 3
• Sample random exponents Exm and Eym with
emin ≤ Exm + Eym < emax
• But also need underflow...
Augmented Arith Ops — ARITH 25, 26 June 2018 15/18
Testing augmentedMultiplication: Underflow
• Testing underflow: Product needs the tail chopped
off by underflow.
• Let 0 ≤ k < p − 1 be the number of additional bits
we want below the hard underflow threshold.
• Sample:
• 0 ≤ M < 2p−1 − 1
• 0 ≤ N < 2p−k−1
• 0 ≤ T < 2k
xm ×ym =
(
(2p−1
+ M) · 2p
head
+ N · 2k
tail
+ 2T + 1
below
underflow
)
·2emin−p−k
• This case must signal inexact & underflow.
Augmented Arith Ops — ARITH 25, 26 June 2018 16/18
Reference Code
Not done yet, but...
1. Handle special cases.
2. If no ties (far case), twoSum.
3. If ties (near case), split and take care.
• Tools from the next talk.
Should be slow but clear.
Anyone feel like adding this to the RISC-V Rocket core?
Augmented Arith Ops — ARITH 25, 26 June 2018 17/18
Summary
Summary
• Three “new” recommended ops in draft IEEE
754-2018
• augmentedAddition, augmentedSubtraction,
augmentedMultiplication
• Specified to support extending precision (double
double), reproducible linear algebra
• Two instruction implementation can provide good
performance benefits.
• Quite easily testible
• Future: Decimal? Hardware?
Augmented Arith Ops — ARITH 25, 26 June 2018 18/18
fastTwoSum
void
fastTwoSum (const double x, const double y,
double * head, double * tail)
/* Assumes that |x| ≤|y| */
{
const double s = x + y;
const double bb = s − x;
const double err = y − bb;
*head = s;
*tail = err;
}
Augmented Arith Ops — ARITH 25, 26 June 2018
Standard Behavior augmentedMultiplication (I)
x y head tail signal
NaN NaN NaN NaN invalid on sNaN
±∞ NaN NaN NaN invalid on sNaN
NaN ±∞ NaN NaN invalid on sNaN
x y ∞ ∞ overflow, inexact
(x × y overflows)
−x y −∞ −∞ overflow, inexact
(−x × y overflows)
x −y −∞ −∞ overflow, inexact
(x × −y overflows)
−x −y ∞ ∞ overflow, inexact
(−x × −y overflows)
Augmented Arith Ops — ARITH 25, 26 June 2018
Standard Behavior augmentedMultiplication (II)
x y head tail signal
∞ ∞ ∞ ∞ (none)
∞ −∞ −∞ −∞ (none)
−∞ ∞ −∞ −∞ (none)
−∞ −∞ ∞ ∞ (none)
+0 +0 +0 +0 (none)
+0 −0 −0 −0 (none)
−0 +0 −0 −0 (none)
−0 −0 +0 +0 (none)
Double-double, etc. look more like IEEE 754.
Augmented Arith Ops — ARITH 25, 26 June 2018

Mais conteúdo relacionado

Semelhante a Augmented Arithmetic Operations Proposed for IEEE-754 2018

Practical Two-level Homomorphic Encryption in Prime-order Bilinear Groups
Practical Two-level Homomorphic Encryption in Prime-order Bilinear GroupsPractical Two-level Homomorphic Encryption in Prime-order Bilinear Groups
Practical Two-level Homomorphic Encryption in Prime-order Bilinear GroupsMITSUNARI Shigeo
 
Relationship between some machine learning concepts
Relationship between some machine learning conceptsRelationship between some machine learning concepts
Relationship between some machine learning conceptsZoya Bylinskii
 
Complexity Analysis
Complexity Analysis Complexity Analysis
Complexity Analysis Shaista Qadir
 
Heteroskedasticity
HeteroskedasticityHeteroskedasticity
Heteroskedasticityhalimuth
 
technical-mathematics-integration-17-feb_2018.pptx
technical-mathematics-integration-17-feb_2018.pptxtechnical-mathematics-integration-17-feb_2018.pptx
technical-mathematics-integration-17-feb_2018.pptxGolaotsemangLebese2
 
CS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdfCS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdfOpenWorld6
 
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Chiheb Ben Hammouda
 
Hierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimationHierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimationAlexander Litvinenko
 
Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1aravindangc
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine LearningFabian Pedregosa
 
SAT based planning for multiagent systems
SAT based planning for multiagent systemsSAT based planning for multiagent systems
SAT based planning for multiagent systemsRavi Kuril
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldKai-Wen Zhao
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsAlexander Litvinenko
 
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...MITSUNARI Shigeo
 

Semelhante a Augmented Arithmetic Operations Proposed for IEEE-754 2018 (20)

QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
QMC: Operator Splitting Workshop, Projective Splitting with Forward Steps and...
 
AINL 2016: Strijov
AINL 2016: StrijovAINL 2016: Strijov
AINL 2016: Strijov
 
Practical Two-level Homomorphic Encryption in Prime-order Bilinear Groups
Practical Two-level Homomorphic Encryption in Prime-order Bilinear GroupsPractical Two-level Homomorphic Encryption in Prime-order Bilinear Groups
Practical Two-level Homomorphic Encryption in Prime-order Bilinear Groups
 
Relationship between some machine learning concepts
Relationship between some machine learning conceptsRelationship between some machine learning concepts
Relationship between some machine learning concepts
 
MATLABgraphPlotting.pptx
MATLABgraphPlotting.pptxMATLABgraphPlotting.pptx
MATLABgraphPlotting.pptx
 
Complexity Analysis
Complexity Analysis Complexity Analysis
Complexity Analysis
 
Heteroskedasticity
HeteroskedasticityHeteroskedasticity
Heteroskedasticity
 
technical-mathematics-integration-17-feb_2018.pptx
technical-mathematics-integration-17-feb_2018.pptxtechnical-mathematics-integration-17-feb_2018.pptx
technical-mathematics-integration-17-feb_2018.pptx
 
CS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdfCS345-Algorithms-II-Lecture-1-CS345-2016.pdf
CS345-Algorithms-II-Lecture-1-CS345-2016.pdf
 
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
Hierarchical Deterministic Quadrature Methods for Option Pricing under the Ro...
 
Hierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimationHierarchical matrix techniques for maximum likelihood covariance estimation
Hierarchical matrix techniques for maximum likelihood covariance estimation
 
Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine Learning
 
SAT based planning for multiagent systems
SAT based planning for multiagent systemsSAT based planning for multiagent systems
SAT based planning for multiagent systems
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifold
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problems
 
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
 
2018 MUMS Fall Course - Gaussian Processes and Statistic Emulators (EDITED) -...
2018 MUMS Fall Course - Gaussian Processes and Statistic Emulators (EDITED) -...2018 MUMS Fall Course - Gaussian Processes and Statistic Emulators (EDITED) -...
2018 MUMS Fall Course - Gaussian Processes and Statistic Emulators (EDITED) -...
 

Mais de Jason Riedy

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and EmusJason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondJason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksJason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateJason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsJason Riedy
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsJason Riedy
 

Mais de Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
 

Último

Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf203318pmpc
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfRagavanV2
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...soginsider
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 

Último (20)

Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 

Augmented Arithmetic Operations Proposed for IEEE-754 2018

  • 1. Augmented Arithmetic Operations Proposed for IEEE 754-2018 E. Jason Riedy Georgia Institute of Technology 25th IEEE Symposium on Computer Arithmetic, 26 June 2018
  • 4. Quick Definitions twoSum Two summands x and y ⇒ sum s and error e fastTwoSum Two summands x and y with |x| ≥ |y| ⇒ sum s and error e twoProd Two multiplicands x and y ⇒ product p and error e All can overflow. Only twoProd can underflow with gradual underflow. Otherwise, relationships are exact for finite x and y. These are for binary floating point. Augmented Arith Ops — ARITH 25, 26 June 2018 2/18
  • 5. Partial History • Møller (1965), Knuth (1965): twoSum for quasi double precision • Kahan (1965): fastTwoSum for compensated summation • Dekker (1971): fastTwoSum for expansions • Shewchuk (1997): Geometric predicates • Brigg (1998): quasi double precision implementation as double-double • Hida, Li, Bailey (2001): quad-double • Ogita, Rump, Oishi (2005...): Error-free transformations • IEEE 754-2008: Punted to the future • Ahrens, Nguyen, Demmel (2013...): ReproBLAS Augmented Arith Ops — ARITH 25, 26 June 2018 3/18
  • 6. “Typical” twoSum void twoSum (const double x, const double y, double * head, double * tail) // Knuth, second volume of TAoCP { const double s = x + y; const double bb = s − x; const double err = (x − (s − bb)) + (y − bb); *head = s; *tail = err; } Six instructions, long dependency chain. Reducing to one or two instructions: More uses. Augmented Arith Ops — ARITH 25, 26 June 2018 4/18
  • 7. twoProduct void twoProduct (const double a, const double b, double *head, double *tail) { double p = a * b; double t = fma (a, b, −p); *head = p; *tail = t; } Already two instructions. With the right instruction. Augmented Arith Ops — ARITH 25, 26 June 2018 5/18
  • 8. twoSum, twoProd Unusual Exceptional Cases x y head tail signal ∞ + ∞ ⇒ ∞ NaN invalid −∞ + −∞ ⇒ −∞ NaN invalid x + y ⇒ ∞ NaN invalid, overflow, inexact (x + y overflows) −x + −y ⇒ −∞ NaN invalid, overflow, inexact (−x − y overflows) −0 + −0 ⇒ −0 +0 (none) x × y ⇒ ∞ −∞ overflow, inexact (x × y overflows) −x × y ⇒ −∞ ∞ overflow, inexact (−x × y overflows) −0 × 0 ⇒ −0 0 (none) Augmented Arith Ops — ARITH 25, 26 June 2018 6/18
  • 9. Other Orders, Other Cases • Algebraic re-orderings produce different cases • Different signs • Different NaN locations • Similarly for twoProd and fastTwoSum. • Hence, double-double behaves strangely... • And reprodicibility becomes tricky. Augmented Arith Ops — ARITH 25, 26 June 2018 7/18
  • 10. Reproducible Linear Algebra • Ahrens, Nguyen, Demmel: K-fold indexed sum • Base of the ReproBLAS, associative • Core operation: fastTwoSum! • But rounding must not depend on the output value. Details: • Demmel and Nguyen, “Fast Reproducible Floating-Point Summation,” ARITH21, 2013. • Ahrens, Nguyen, and Demmel, “Effcient Reproducible Floating Point Summation and BLAS,” UCB Tech Report, 2016. • bebop.cs.berkeley.edu/reproblas/ Note: There are other methods, e.g. ARM’s. Augmented Arith Ops — ARITH 25, 26 June 2018 8/18
  • 11. Performance? Emulating augmentedAddition as two instructions improves double-double: Operation Skylake Haswell Addition latency −55% −45% throughput +36% +18% Multiplication latency −3% 0% throughput +11% +16% DDGEMM MFLOP/s: Operation Intel Skylake Intel Haswell “Typical” implementation 1732 (≈ 1/37 DP) 1199 (≈ 1/45 DP) Two-insn augmentedAddition 3344 (≈ 1/19 DP) 2283 (≈ 1/24 DP) Dukhan, Riedy, Vuduc. “Wanted: Floating-point add round-off error instruction,” PMAA 2016. ReproBLAS dot product: 33% rate improvement, only 2× slower than non-reproducible. Augmented Arith Ops — ARITH 25, 26 June 2018 9/18
  • 13. Decisions Made for Standardization • Time to standardize? • Was considered for 2008, but only use was extending precision. • Now more uses that can benefit. • Names? • twoSum, etc. already exist. • So augmented arithmetic operations. • augmentedAddition, augmentedSubtraction, augmentedMultiplication • Exceptional behavior? • My first proposal was twoSum above. • Generally not wise... • Rounding... Augmented Arith Ops — ARITH 25, 26 June 2018 10/18
  • 14. Standard Behavior augmentedAddition (I) x y head tail signal NaN NaN NaN NaN invalid on sNaN ±∞ NaN NaN NaN invalid on sNaN NaN ±∞ NaN NaN invalid on sNaN ∞ ∞ ∞ ∞ (none) ∞ −∞ NaN NaN invalid −∞ ∞ NaN NaN invalid −∞ −∞ −∞ −∞ (none) x y ∞ ∞ overrflow, inexact (x + y overflows) −x −y −∞ −∞ overrflow, inexact (−x − y overflows) Augmented Arith Ops — ARITH 25, 26 June 2018 11/18
  • 15. Standard Behavior augmentedAddition (II) x y head tail signal +0 +0 +0 +0 (none) +0 −0 +0 +0 (none) −0 +0 +0 +0 (none) −0 −0 −0 −0 (none) Double-double, etc. look more like IEEE 754. Augmented Arith Ops — ARITH 25, 26 June 2018 12/18
  • 16. Rounding • Reproducibility cannot round ties to nearest even. • Depends on the output value. • Leaves rounding ties away... • Already exists for binary. • Surveyed users. • Shewchuck: triangle could not use it. • Rump, et al.: Proofs need reworked. • So rounding ties toward zero. • roundTiesToZero from decimal • Only these operations. • sigh. Augmented Arith Ops — ARITH 25, 26 June 2018 13/18
  • 18. Testing augmentedAddition • Assume p bits of precision, and augmentedAddition(x, y) ⇒ (h, t). • Typical cases are fine, only need special cases to test for roundTiesToZero. • Generate x, y such that y immediately follows x and causes a tie that rounds to a value easy to check. • Params: s sign, T significand, E exponent • x = (−1)1−s · T · 2E with T odd • y = (−1)1−s · 2E−p−1 • Results: • roundTiesToZero: (x, y) • Away or to even: (x + 2y, −y) Augmented Arith Ops — ARITH 25, 26 June 2018 14/18
  • 19. Testing augmentedMultiplication: Rounding • Similarly need special cases to test for roundTiesToZero and underflow. • Generate x, y such that xy requires p + 1 bits and has 11 as the final bits. • xm × ym = (2p+1 + 4M + 3) · 2E with 0 ≤ M < 2p−1 and emin ≤ E + p − 1 < emax • Sample M • Factor the significand 2p+1 + 4M + 3 • Sample random exponents Exm and Eym with emin ≤ Exm + Eym < emax • But also need underflow... Augmented Arith Ops — ARITH 25, 26 June 2018 15/18
  • 20. Testing augmentedMultiplication: Underflow • Testing underflow: Product needs the tail chopped off by underflow. • Let 0 ≤ k < p − 1 be the number of additional bits we want below the hard underflow threshold. • Sample: • 0 ≤ M < 2p−1 − 1 • 0 ≤ N < 2p−k−1 • 0 ≤ T < 2k xm ×ym = ( (2p−1 + M) · 2p head + N · 2k tail + 2T + 1 below underflow ) ·2emin−p−k • This case must signal inexact & underflow. Augmented Arith Ops — ARITH 25, 26 June 2018 16/18
  • 21. Reference Code Not done yet, but... 1. Handle special cases. 2. If no ties (far case), twoSum. 3. If ties (near case), split and take care. • Tools from the next talk. Should be slow but clear. Anyone feel like adding this to the RISC-V Rocket core? Augmented Arith Ops — ARITH 25, 26 June 2018 17/18
  • 23. Summary • Three “new” recommended ops in draft IEEE 754-2018 • augmentedAddition, augmentedSubtraction, augmentedMultiplication • Specified to support extending precision (double double), reproducible linear algebra • Two instruction implementation can provide good performance benefits. • Quite easily testible • Future: Decimal? Hardware? Augmented Arith Ops — ARITH 25, 26 June 2018 18/18
  • 24. fastTwoSum void fastTwoSum (const double x, const double y, double * head, double * tail) /* Assumes that |x| ≤|y| */ { const double s = x + y; const double bb = s − x; const double err = y − bb; *head = s; *tail = err; } Augmented Arith Ops — ARITH 25, 26 June 2018
  • 25. Standard Behavior augmentedMultiplication (I) x y head tail signal NaN NaN NaN NaN invalid on sNaN ±∞ NaN NaN NaN invalid on sNaN NaN ±∞ NaN NaN invalid on sNaN x y ∞ ∞ overflow, inexact (x × y overflows) −x y −∞ −∞ overflow, inexact (−x × y overflows) x −y −∞ −∞ overflow, inexact (x × −y overflows) −x −y ∞ ∞ overflow, inexact (−x × −y overflows) Augmented Arith Ops — ARITH 25, 26 June 2018
  • 26. Standard Behavior augmentedMultiplication (II) x y head tail signal ∞ ∞ ∞ ∞ (none) ∞ −∞ −∞ −∞ (none) −∞ ∞ −∞ −∞ (none) −∞ −∞ ∞ ∞ (none) +0 +0 +0 +0 (none) +0 −0 −0 −0 (none) −0 +0 −0 −0 (none) −0 −0 +0 +0 (none) Double-double, etc. look more like IEEE 754. Augmented Arith Ops — ARITH 25, 26 June 2018