SlideShare uma empresa Scribd logo
1 de 26
Abstracting Vector Architectures in Library
Generators: Case Study Convolution Filters
Alen Stojanov
ETH Zurich
Georg Ofenbeck
ETH Zurich
Tiark Rompf
EPFL, Oracle Labs
Markus Püschel
ETH Zurich
Department of Computer Science,
ETH Zurich, Switzerland
Generators for High Performance Libraries
3984 C functions
1M lines of code
Computer generated
TRANSFORMS
DFT (fwd+inv), RDFT (fwd+inv)
DCT2, DCT3, DCT4, DHT, WHT
SPIRAL: Code Generation for DSP Transforms, M. Püschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic,
Y. Voronenko, K. Chen, R. Johnson, N. Rizzolo, Proceedings of the IEEE special issue on "Program Generation, Optimization, and Adaptation,"
Vol. 93, No. 2, 2005, pp. 232-275
1
DATA TYPE
Real / Complex Numbers
Split / Interleaved Arrays
Single / Double Precision
ISA
Scalar, SSE,
LRB, AVX
CODE STYLE
Looped / Unrolled
123
1.23
•  Math knowledge: DFT, RDFT,
DHT, WHT, etc.
•  Algorithms represented in
high level array-style DSLs
•  Low Level Optimizations
TRANSFORMS
DOMAIN
NEW DOMAIN
DATA TYPE
Real / Complex Numbers
Split / Interleaved Arrays
Single / Double Precision
ISA
Scalar, SSE,
LRB, AVX
CODE STYLE
Looped / Unrolled
123
1.23
NEW DOMAIN
REDESIGN ?
Staging
ABSTRACTION
•  lower cost of domain
implementation
•  reuse at many levels
ISA
Code
Style
Data
Type
GOAL Lightweight
Modular
Staging1
1Lightweight modular staging: a pragmatic approach to runtime code
generation and compiled DSLs. T. Rompf, M. Odersky, GPCE 2010
what is staging ?
Staged
def add (!
a: Float, !
b: Float!
): Float = {!
a + b!
}!
def add (!
a: Rep[Float], !
b: Rep[Float]!
): Rep[Float] = { !
a + b!
}!
Non-Staged
execution delayed
AST created
add (2,3)	
  
add (2,3)	
  
normal CPU
execution	
  
Plus(2, 3)	
  
5	
  
2 3
Staged
def add (!
a: Float, !
b: Float!
): Float = {!
a + b!
}!
def add (!
a: Rep[Float], !
b: Rep[Float]!
): Rep[Float] = { !
a + b!
}!
Non-Staged
Float vs. Rep[Float]
LMS uses Scala Types
to control staging
T vs. Rep[T] - LMS
overloads operations on a
standard type [T] with a
staged version type Rep[T]
Staged and non-staged
code can be abstracted
using type parameters
def add [T] (!
a: T, b: T!
): T = { !
a + b!
}!
add [Rep[Float]]!
type NoRep[T] = T!
add[NoRep[Float]]!
Building generators with staging
def add (!
a: Rep[Float], !
b: Rep[Float]!
): Rep[Float] = { !
a + b!
}!
float add (!
float a, !
float b!
) = { !
return a - b;!
}!
Staging	
  
AST Rewrites:
•  Domain specific
optimizations
•  Low level
optimizations	
  
Unparsing
to C code	
  
def add (!
a: Rep[Float], !
b: Rep[Float]!
): Rep[Float] = { !
a + b * (-1)!
}!
a
b-1
a b
Abstracting Code Styles with Staging1
for (i <- 0 to n: Rep[Int]) { f(i) }!
for (i <- 0 to n: NoRep[Int]) { f(i) }!
n f(i)
for	
  
ArrayApply(a, 1)!
ArrayApply(a, 3)!
val (x1, x2) = (a(1), a(3))!
val x3 = x1 + x2!
val a: Array[Rep[T]]!val a: Rep[Array[T]]!
a	
  
x1	
  
1	
  
x2	
  
3	
   a	
   x1	
   x2	
  
a(0)	
   a(1)	
   a(2)	
   a(3)	
   a(4)	
  
Looped
Code
Unrolled
Code
Arrays of Staged Elements
(i.e. array scalarization)Staged Arrays
1Spiral in Scala: towards the systematic construction of generators for performance libraries.
G. Ofenbeck, T. Rompf, A. Stojanov M. Odersky, M. Püschel. GPCE 2013
f(0)
f(1)
f(2) f(n)
…f(3)
Scala Type Classes
used to encapsulate
implementation
abstract class NumericOps[T] {!
def plus (x: T, y: T) : T!
def minus (x: T, y: T) : T!
def times (x: T, y: T) : T!
}!
class SISDOps[T] extends NumericOps[Rep[SISD[[T]]] !
{!
type E = Rep[SISD[[T]]!
def plus (x: E, y: E) = Plus [E](x, y)!
def minus(x: E, y: E) = Minus[E](x, y)!
def times(x: E, y: E) = Times[E](x, y)!
}!
SISD (staged)
class SIMDOps[T] extends NumericOps[Rep[SIMD[[T]]]!
{ !
type E = Rep[SIMD[[T]]!
def plus (x: E, y: E) = VPlus [E](x, y)!
def minus(x: E, y: E) = VMinus[E](x, y)!
def times(x: E, y: E) = VTimes[E](x, y)!
}!
SIMD (staged, ISA independent)
abstract class ArrayOps[…] {!
def alloc (…): …!
def apply (…): …!
def update (…): …!
}!
abstract class NumericOps[T] {!
def plus (x: T, y: T) : T!
def minus (x: T, y: T) : T!
def times (x: T, y: T) : T!
}!
Scala Type Classes
used to encapsulate
implementation
abstract class ArrayOps[…] {!
def alloc (…): …!
def apply (…): …!
def update (…): …!
}!
SISD (apply, compute, store)
1
3
2
4
1
3
2
4
apply	
  
1
3
2
4
1
3
2
4
compute	
   update	
  
1
3
2
4
1
2
3
4
1
3
2
4
SIMD (apply, compute, store - ISA dependent)
1
2
3
4
1
3
2
4
1
3
2
4
apply	
   compute	
   update	
  
ready to take this to the
next level ?
abstract class CVector[V[_], E[_], R[_], P[_], T](...) {!
type Element = E[R[P[T]]]!
val nops: NumericOps[Element]!
val aops: ArrayOps[V[Element]]!
def size (): V[Int]!
def apply (i: V[Int]) : Element!
def update (i: V[Int], v: Element)!
}!
Staged /Non Staged Array
Type of Element:
Real Numbers
Complex Numbers
Staged / Non Staged
Element Primitives
SIMD / SISD
Array
Underlying primitive:
Float / Double / Int
class Interleaved_Scalar_Complex_SISD_Double_Vector extends!
CVector[NoRep, Complex, Rep, SISD, Double]!
Complete Abstraction
single
codebase
def add[V[_], E[_], R[_], P[_], T] (!
(x, y, z): Tuple3[CVector[V,E,R,P,T]]!
) = for ( i <- 0 until x.size() ) { !
z(i) = x(i) + y(i) !
}!
#define T float!
#define N 4!
void add(T* x, T* y, T* z) {!
int i = 0;!
for(; i < N; ++i) {!
T x1 = x[i];!
T y1 = y[i];!
T z1 = x1 + y1;!
z[i] = z1;!
}!
}!
#define T int!
void add(T* x, T* y, T* z) {!
T x1 = x[0]; T x2 = x[1];!
T x3 = x[2]; T x4 = x[3];!
T y1 = y[0]; T y2 = y[1];!
T y3 = y[2]; T y4 = y[3];!
T z1 = x1 + y1;!
T z2 = x2 + y2;!
T z3 = x3 + y3;!
T z4 = x4 + y4;!
z[0] = z1; z[1] = z2;!
z[2] = z3; z[3] = z4;!
}!
#define T double!
#define N 1!
void add(T* x, T* y, T* z) {!
int i = 0;!
for(; i < N; ++i) {!
__m256d x1, y1, z1;!
x1 = _mm256_loadu_pd(x + i);!
y1 = _mm256_loadu_pd(y + i);!
z1 = _mm256_add_pd(x1, y1);!
_mm256_storeu_pd(z + i, y1);!
}!
}!
#define T double!
void add(T* x, T* y, T* z) {!
__m256d x1, y1, z1;!
x1 = _mm256_loadu_pd(x + 0);!
y1 = _mm256_loadu_pd(y + 0);!
z1 = _mm256_add_pd(x1, y1);!
_mm256_storeu_pd(z + 0, y1);!
}!
Staged Loop
SISD, Floats
Unrolled Loop
SISD, Integers
Staged Loop
SIMD, Doubles
Unrolled Loop
SIMD, Doubles
type T = Float
add[Rep,Real,NoRep,SISD,T]
type T = Int
add[NoRep,Real,Rep,SISD,T]
type T = Double
add[Rep,Real,NoRep,SIMD,T]
type T = Double
add[NoRep,Real,Rep,SIMD,T]
Real Numbers,
Split Complex,
Interleaved
Complex
Looped,
Unrolled
SISD,
SIMD
Double,
Float,
Int
Abstraction Support
123
1.23
applying the
abstraction
Multi-level Tiling Σ-LL
Instruction Set Architecture
Loop Optimizations Σ-LL
Memory Layout Specialisation
Data Type Instantiation
ISA Specialization
Data Type & Memory Layout
Code Level Optimization I-IR
Optimised C code
Convolution Expression
PerformanceFeedbackLoop
FGen
Program generator for high
performance convolution
operations, based on DSLs
↓
omitted
omitted
FGen
↓
class SigmaLL2IIRTranslator[E[_], R[_], P[_], T] {!
type Element = E[R[P[T]]]!
def sum (in: List[Element]) =!
if (in.length == 1) in(0) else {!
val (m, e) = (in.length / 2, in.length)!
sum(in.slice(0, m)) + sum(in.slice(m, e))!
}!
def translate(stm: Stm) = stm match {!
case TP(y, PFIR(x, h)) =>!
val xV = List.tabulate(k)(i => x.apply(i))!
val hV = List.tabulate(k)!
(i => h.vset1(h.apply(k-i-1)))!
val tV = (xV, hV).zipped map (_*_)!
y.update(y(0), sum(tV))!
}!
}!
I-IR conversion
& SIMDification
1
2	
  
3	
  
4	
  
1
2	
  
3	
  
5	
  
6	
  
1
2	
  
3	
  
4	
  
1
1
1
1
2	
  
2	
  
2	
  
2	
  
3	
  
3	
  
3	
  
3	
  
2
3	
  
4	
  
5	
  
3
4	
  
5	
  
6	
  
↓
class SigmaLL2IIRTranslator[E[_], R[_], P[_], T] {!
type Element = E[R[P[T]]]!
def sum (in: List[Element]) =!
if (in.length == 1) in(0) else {!
val (m, e) = (in.length / 2, in.length)!
sum(in.slice(0, m)) + sum(in.slice(m, e))!
}!
def translate(stm: Stm) = stm match {!
case TP(y, PFIR(x, h)) =>!
val xV = List.tabulate(k)(i => x.apply(i))!
val hV = List.tabulate(k)!
(i => h.vset1(h.apply(k-i-1)))!
val tV = (xV, hV).zipped map (_*_)!
y.update(y(0), sum(tV))!
}!
}!
I-IR conversion
& SIMDification
123
1.23
Scalar, SSE, AVX
results
Intel(R) Xeon(R)
E5-2643 3.3 GHz
AVX, Ubuntu 13.10
Intel C++ Composer 2013.SP1.1.106
flags: -std=c99 -O3 –xHost
kernel v3.11.0-12-generic
Intel MKL v11.1.1
Intel IPP v8.0.1
Intel’s Hyper-Threading: Off
Intel Turbo Boost: Off
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
2
4
6
8
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
FGen
8 Taps, Complex Numbers
IPP
MKL
Base
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
2
4
6
8
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
FGen
20 Taps, Complex Numbers
IPP
MKL
Base
● ● ● ● ●
● ● ● ● ●
● ● ● ● ● ●
2
4
6
8
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
FGen
8 Taps, Real Numbers
IPP
MKL
Base
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
2
4
6
8
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
FGen
20 Taps, Real Numbers
IPP
MKL
Base
●Base IPP v8.0.1 MKL v11.1.1 FGen
Intel(R) Core(TM) 2 Duo
CPU L7500 1.6 GHz,
SSSE3, Debian 7
Intel C++ Composer 2013.SP1.1.106
flags: -std=c99 -O3 –xHost
kernel v3.2.0-4-686-pae
Intel MKL v11.1.1
Intel IPP v8.0.1
Intel Dynamic Acceleration: Off
●Base IPP v8.0.1 MKL v11.1.1 FGen
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0
1
2
3
4
5
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
8 Taps, Complex Numbers
IPP
FGenBase
MKL
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0
1
2
3
4
5
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
20 Taps, Complex Numbers
IPP
FGen
MKL
Base
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0
1
2
3
4
5
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
8 Taps, Real Numbers
FGen
MKL
IPPBase
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0
1
2
3
4
5
2^10 2^13 2^16 2^19 2^22 2^25
Input size
Performance [f/c]
20 Taps, Real Numbers
IPP
Base
FGen
MKL
Summary
•  Generic programming coupled
with staging used to build
program generators
•  Single codebase for different code
styles, ISA and data abstraction
•  Applicable to multiple domains
•  Comparable results to commercial
libraries

Mais conteúdo relacionado

Mais procurados

Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...
Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...
Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...Hsien-Hsin Sean Lee, Ph.D.
 
Computer graphics lab manual
Computer graphics lab manualComputer graphics lab manual
Computer graphics lab manualUma mohan
 
Digital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentationDigital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentationfoyez ahammad
 
Mba admission in india
Mba admission in indiaMba admission in india
Mba admission in indiaEdhole.com
 
Computer Graphics Lab File C Programs
Computer Graphics Lab File C ProgramsComputer Graphics Lab File C Programs
Computer Graphics Lab File C ProgramsKandarp Tiwari
 
Computer graphics lab report with code in cpp
Computer graphics lab report with code in cppComputer graphics lab report with code in cpp
Computer graphics lab report with code in cppAlamgir Hossain
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...Positive Hack Days
 
Bitwise Operations in Programming
Bitwise Operations in ProgrammingBitwise Operations in Programming
Bitwise Operations in ProgrammingSvetlin Nakov
 
10CSL67 CG LAB PROGRAM 9
10CSL67 CG LAB PROGRAM 910CSL67 CG LAB PROGRAM 9
10CSL67 CG LAB PROGRAM 9Vanishree Arun
 
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...Hsien-Hsin Sean Lee, Ph.D.
 
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...Hsien-Hsin Sean Lee, Ph.D.
 
PDT DC015 Chapter 2 Computer System 2017/2018 (f)
PDT DC015 Chapter 2 Computer System 2017/2018 (f)PDT DC015 Chapter 2 Computer System 2017/2018 (f)
PDT DC015 Chapter 2 Computer System 2017/2018 (f)Fizaril Amzari Omar
 
PST SC015 Chapter 2 Computer System (III) 2017/2018
PST SC015 Chapter 2 Computer System (III) 2017/2018PST SC015 Chapter 2 Computer System (III) 2017/2018
PST SC015 Chapter 2 Computer System (III) 2017/2018Fizaril Amzari Omar
 
Name dld preparation
Name dld preparationName dld preparation
Name dld preparationPadam Rai
 
Decoder for digital electronics
Decoder for digital electronicsDecoder for digital electronics
Decoder for digital electronicsIIT, KANPUR INDIA
 
Cg my own programs
Cg my own programsCg my own programs
Cg my own programsAmit Kapoor
 

Mais procurados (20)

Lec 2 digital basics
Lec 2 digital basicsLec 2 digital basics
Lec 2 digital basics
 
Dpsd lecture-notes
Dpsd lecture-notesDpsd lecture-notes
Dpsd lecture-notes
 
Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...
Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...
Lec16 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Fi...
 
Computer graphics lab manual
Computer graphics lab manualComputer graphics lab manual
Computer graphics lab manual
 
Unit 4 dica
Unit 4 dicaUnit 4 dica
Unit 4 dica
 
Digital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentationDigital Logic & Design (DLD) presentation
Digital Logic & Design (DLD) presentation
 
Mba admission in india
Mba admission in indiaMba admission in india
Mba admission in india
 
Computer Graphics Lab File C Programs
Computer Graphics Lab File C ProgramsComputer Graphics Lab File C Programs
Computer Graphics Lab File C Programs
 
Decoder
DecoderDecoder
Decoder
 
Computer graphics lab report with code in cpp
Computer graphics lab report with code in cppComputer graphics lab report with code in cpp
Computer graphics lab report with code in cpp
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
 
Bitwise Operations in Programming
Bitwise Operations in ProgrammingBitwise Operations in Programming
Bitwise Operations in Programming
 
10CSL67 CG LAB PROGRAM 9
10CSL67 CG LAB PROGRAM 910CSL67 CG LAB PROGRAM 9
10CSL67 CG LAB PROGRAM 9
 
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...
Lec11 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- De...
 
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...
Lec13 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Sh...
 
PDT DC015 Chapter 2 Computer System 2017/2018 (f)
PDT DC015 Chapter 2 Computer System 2017/2018 (f)PDT DC015 Chapter 2 Computer System 2017/2018 (f)
PDT DC015 Chapter 2 Computer System 2017/2018 (f)
 
PST SC015 Chapter 2 Computer System (III) 2017/2018
PST SC015 Chapter 2 Computer System (III) 2017/2018PST SC015 Chapter 2 Computer System (III) 2017/2018
PST SC015 Chapter 2 Computer System (III) 2017/2018
 
Name dld preparation
Name dld preparationName dld preparation
Name dld preparation
 
Decoder for digital electronics
Decoder for digital electronicsDecoder for digital electronics
Decoder for digital electronics
 
Cg my own programs
Cg my own programsCg my own programs
Cg my own programs
 

Semelhante a Abstracting Vector Architectures in Library Generators: Case Study Convolution Filters

Javascript engine performance
Javascript engine performanceJavascript engine performance
Javascript engine performanceDuoyi Wu
 
Getting started cpp full
Getting started cpp   fullGetting started cpp   full
Getting started cpp fullVõ Hòa
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)bolovv
 
C Code and the Art of Obfuscation
C Code and the Art of ObfuscationC Code and the Art of Obfuscation
C Code and the Art of Obfuscationguest9006ab
 
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate CompilersFunctional Thursday
 
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsAsuka Nakajima
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.pptssuserd64918
 
Writing DSL with Applicative Functors
Writing DSL with Applicative FunctorsWriting DSL with Applicative Functors
Writing DSL with Applicative FunctorsDavid Galichet
 
A Few of My Favorite (Python) Things
A Few of My Favorite (Python) ThingsA Few of My Favorite (Python) Things
A Few of My Favorite (Python) ThingsMichael Pirnat
 
Virtual machine and javascript engine
Virtual machine and javascript engineVirtual machine and javascript engine
Virtual machine and javascript engineDuoyi Wu
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow규영 허
 
Ur Domain Haz Monoids DDDx NYC 2014
Ur Domain Haz Monoids DDDx NYC 2014Ur Domain Haz Monoids DDDx NYC 2014
Ur Domain Haz Monoids DDDx NYC 2014Cyrille Martraire
 
Fourier project presentation
Fourier project  presentationFourier project  presentation
Fourier project presentation志璿 楊
 
How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016
How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016
How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016Codemotion
 
The TclQuadcode Compiler
The TclQuadcode CompilerThe TclQuadcode Compiler
The TclQuadcode CompilerDonal Fellows
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Romain Francois
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...Data Con LA
 
仕事で使うF#
仕事で使うF#仕事で使うF#
仕事で使うF#bleis tift
 
Five Languages in a Moment
Five Languages in a MomentFive Languages in a Moment
Five Languages in a MomentSergio Gil
 

Semelhante a Abstracting Vector Architectures in Library Generators: Case Study Convolution Filters (20)

Javascript engine performance
Javascript engine performanceJavascript engine performance
Javascript engine performance
 
Getting started cpp full
Getting started cpp   fullGetting started cpp   full
Getting started cpp full
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)
 
C Code and the Art of Obfuscation
C Code and the Art of ObfuscationC Code and the Art of Obfuscation
C Code and the Art of Obfuscation
 
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
 
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
 
Writing DSL with Applicative Functors
Writing DSL with Applicative FunctorsWriting DSL with Applicative Functors
Writing DSL with Applicative Functors
 
A Few of My Favorite (Python) Things
A Few of My Favorite (Python) ThingsA Few of My Favorite (Python) Things
A Few of My Favorite (Python) Things
 
Virtual machine and javascript engine
Virtual machine and javascript engineVirtual machine and javascript engine
Virtual machine and javascript engine
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
 
Ur Domain Haz Monoids DDDx NYC 2014
Ur Domain Haz Monoids DDDx NYC 2014Ur Domain Haz Monoids DDDx NYC 2014
Ur Domain Haz Monoids DDDx NYC 2014
 
Fourier project presentation
Fourier project  presentationFourier project  presentation
Fourier project presentation
 
How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016
How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016
How to avoid Go gotchas - Ivan Daniluk - Codemotion Milan 2016
 
The TclQuadcode Compiler
The TclQuadcode CompilerThe TclQuadcode Compiler
The TclQuadcode Compiler
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Data Provenance Support in...
 
仕事で使うF#
仕事で使うF#仕事で使うF#
仕事で使うF#
 
Five Languages in a Moment
Five Languages in a MomentFive Languages in a Moment
Five Languages in a Moment
 

Último

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 

Último (20)

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Abstracting Vector Architectures in Library Generators: Case Study Convolution Filters

  • 1. Abstracting Vector Architectures in Library Generators: Case Study Convolution Filters Alen Stojanov ETH Zurich Georg Ofenbeck ETH Zurich Tiark Rompf EPFL, Oracle Labs Markus Püschel ETH Zurich Department of Computer Science, ETH Zurich, Switzerland
  • 2. Generators for High Performance Libraries 3984 C functions 1M lines of code Computer generated TRANSFORMS DFT (fwd+inv), RDFT (fwd+inv) DCT2, DCT3, DCT4, DHT, WHT SPIRAL: Code Generation for DSP Transforms, M. Püschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. Johnson, N. Rizzolo, Proceedings of the IEEE special issue on "Program Generation, Optimization, and Adaptation," Vol. 93, No. 2, 2005, pp. 232-275 1
  • 3. DATA TYPE Real / Complex Numbers Split / Interleaved Arrays Single / Double Precision ISA Scalar, SSE, LRB, AVX CODE STYLE Looped / Unrolled 123 1.23 •  Math knowledge: DFT, RDFT, DHT, WHT, etc. •  Algorithms represented in high level array-style DSLs •  Low Level Optimizations TRANSFORMS DOMAIN
  • 4. NEW DOMAIN DATA TYPE Real / Complex Numbers Split / Interleaved Arrays Single / Double Precision ISA Scalar, SSE, LRB, AVX CODE STYLE Looped / Unrolled 123 1.23
  • 6. Staging ABSTRACTION •  lower cost of domain implementation •  reuse at many levels ISA Code Style Data Type GOAL Lightweight Modular Staging1 1Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. T. Rompf, M. Odersky, GPCE 2010
  • 8. Staged def add (! a: Float, ! b: Float! ): Float = {! a + b! }! def add (! a: Rep[Float], ! b: Rep[Float]! ): Rep[Float] = { ! a + b! }! Non-Staged execution delayed AST created add (2,3)   add (2,3)   normal CPU execution   Plus(2, 3)   5   2 3
  • 9. Staged def add (! a: Float, ! b: Float! ): Float = {! a + b! }! def add (! a: Rep[Float], ! b: Rep[Float]! ): Rep[Float] = { ! a + b! }! Non-Staged Float vs. Rep[Float] LMS uses Scala Types to control staging T vs. Rep[T] - LMS overloads operations on a standard type [T] with a staged version type Rep[T] Staged and non-staged code can be abstracted using type parameters def add [T] (! a: T, b: T! ): T = { ! a + b! }! add [Rep[Float]]! type NoRep[T] = T! add[NoRep[Float]]!
  • 10. Building generators with staging def add (! a: Rep[Float], ! b: Rep[Float]! ): Rep[Float] = { ! a + b! }! float add (! float a, ! float b! ) = { ! return a - b;! }! Staging   AST Rewrites: •  Domain specific optimizations •  Low level optimizations   Unparsing to C code   def add (! a: Rep[Float], ! b: Rep[Float]! ): Rep[Float] = { ! a + b * (-1)! }! a b-1 a b
  • 11. Abstracting Code Styles with Staging1 for (i <- 0 to n: Rep[Int]) { f(i) }! for (i <- 0 to n: NoRep[Int]) { f(i) }! n f(i) for   ArrayApply(a, 1)! ArrayApply(a, 3)! val (x1, x2) = (a(1), a(3))! val x3 = x1 + x2! val a: Array[Rep[T]]!val a: Rep[Array[T]]! a   x1   1   x2   3   a   x1   x2   a(0)   a(1)   a(2)   a(3)   a(4)   Looped Code Unrolled Code Arrays of Staged Elements (i.e. array scalarization)Staged Arrays 1Spiral in Scala: towards the systematic construction of generators for performance libraries. G. Ofenbeck, T. Rompf, A. Stojanov M. Odersky, M. Püschel. GPCE 2013 f(0) f(1) f(2) f(n) …f(3)
  • 12. Scala Type Classes used to encapsulate implementation abstract class NumericOps[T] {! def plus (x: T, y: T) : T! def minus (x: T, y: T) : T! def times (x: T, y: T) : T! }! class SISDOps[T] extends NumericOps[Rep[SISD[[T]]] ! {! type E = Rep[SISD[[T]]! def plus (x: E, y: E) = Plus [E](x, y)! def minus(x: E, y: E) = Minus[E](x, y)! def times(x: E, y: E) = Times[E](x, y)! }! SISD (staged) class SIMDOps[T] extends NumericOps[Rep[SIMD[[T]]]! { ! type E = Rep[SIMD[[T]]! def plus (x: E, y: E) = VPlus [E](x, y)! def minus(x: E, y: E) = VMinus[E](x, y)! def times(x: E, y: E) = VTimes[E](x, y)! }! SIMD (staged, ISA independent) abstract class ArrayOps[…] {! def alloc (…): …! def apply (…): …! def update (…): …! }!
  • 13. abstract class NumericOps[T] {! def plus (x: T, y: T) : T! def minus (x: T, y: T) : T! def times (x: T, y: T) : T! }! Scala Type Classes used to encapsulate implementation abstract class ArrayOps[…] {! def alloc (…): …! def apply (…): …! def update (…): …! }! SISD (apply, compute, store) 1 3 2 4 1 3 2 4 apply   1 3 2 4 1 3 2 4 compute   update   1 3 2 4 1 2 3 4 1 3 2 4 SIMD (apply, compute, store - ISA dependent) 1 2 3 4 1 3 2 4 1 3 2 4 apply   compute   update  
  • 14. ready to take this to the next level ?
  • 15. abstract class CVector[V[_], E[_], R[_], P[_], T](...) {! type Element = E[R[P[T]]]! val nops: NumericOps[Element]! val aops: ArrayOps[V[Element]]! def size (): V[Int]! def apply (i: V[Int]) : Element! def update (i: V[Int], v: Element)! }! Staged /Non Staged Array Type of Element: Real Numbers Complex Numbers Staged / Non Staged Element Primitives SIMD / SISD Array Underlying primitive: Float / Double / Int class Interleaved_Scalar_Complex_SISD_Double_Vector extends! CVector[NoRep, Complex, Rep, SISD, Double]! Complete Abstraction
  • 16. single codebase def add[V[_], E[_], R[_], P[_], T] (! (x, y, z): Tuple3[CVector[V,E,R,P,T]]! ) = for ( i <- 0 until x.size() ) { ! z(i) = x(i) + y(i) ! }! #define T float! #define N 4! void add(T* x, T* y, T* z) {! int i = 0;! for(; i < N; ++i) {! T x1 = x[i];! T y1 = y[i];! T z1 = x1 + y1;! z[i] = z1;! }! }! #define T int! void add(T* x, T* y, T* z) {! T x1 = x[0]; T x2 = x[1];! T x3 = x[2]; T x4 = x[3];! T y1 = y[0]; T y2 = y[1];! T y3 = y[2]; T y4 = y[3];! T z1 = x1 + y1;! T z2 = x2 + y2;! T z3 = x3 + y3;! T z4 = x4 + y4;! z[0] = z1; z[1] = z2;! z[2] = z3; z[3] = z4;! }! #define T double! #define N 1! void add(T* x, T* y, T* z) {! int i = 0;! for(; i < N; ++i) {! __m256d x1, y1, z1;! x1 = _mm256_loadu_pd(x + i);! y1 = _mm256_loadu_pd(y + i);! z1 = _mm256_add_pd(x1, y1);! _mm256_storeu_pd(z + i, y1);! }! }! #define T double! void add(T* x, T* y, T* z) {! __m256d x1, y1, z1;! x1 = _mm256_loadu_pd(x + 0);! y1 = _mm256_loadu_pd(y + 0);! z1 = _mm256_add_pd(x1, y1);! _mm256_storeu_pd(z + 0, y1);! }! Staged Loop SISD, Floats Unrolled Loop SISD, Integers Staged Loop SIMD, Doubles Unrolled Loop SIMD, Doubles type T = Float add[Rep,Real,NoRep,SISD,T] type T = Int add[NoRep,Real,Rep,SISD,T] type T = Double add[Rep,Real,NoRep,SIMD,T] type T = Double add[NoRep,Real,Rep,SIMD,T]
  • 19. Multi-level Tiling Σ-LL Instruction Set Architecture Loop Optimizations Σ-LL Memory Layout Specialisation Data Type Instantiation ISA Specialization Data Type & Memory Layout Code Level Optimization I-IR Optimised C code Convolution Expression PerformanceFeedbackLoop FGen Program generator for high performance convolution operations, based on DSLs ↓ omitted omitted
  • 21. class SigmaLL2IIRTranslator[E[_], R[_], P[_], T] {! type Element = E[R[P[T]]]! def sum (in: List[Element]) =! if (in.length == 1) in(0) else {! val (m, e) = (in.length / 2, in.length)! sum(in.slice(0, m)) + sum(in.slice(m, e))! }! def translate(stm: Stm) = stm match {! case TP(y, PFIR(x, h)) =>! val xV = List.tabulate(k)(i => x.apply(i))! val hV = List.tabulate(k)! (i => h.vset1(h.apply(k-i-1)))! val tV = (xV, hV).zipped map (_*_)! y.update(y(0), sum(tV))! }! }! I-IR conversion & SIMDification 1 2   3   4   1 2   3   5   6   1 2   3   4   1 1 1 1 2   2   2   2   3   3   3   3   2 3   4   5   3 4   5   6   ↓
  • 22. class SigmaLL2IIRTranslator[E[_], R[_], P[_], T] {! type Element = E[R[P[T]]]! def sum (in: List[Element]) =! if (in.length == 1) in(0) else {! val (m, e) = (in.length / 2, in.length)! sum(in.slice(0, m)) + sum(in.slice(m, e))! }! def translate(stm: Stm) = stm match {! case TP(y, PFIR(x, h)) =>! val xV = List.tabulate(k)(i => x.apply(i))! val hV = List.tabulate(k)! (i => h.vset1(h.apply(k-i-1)))! val tV = (xV, hV).zipped map (_*_)! y.update(y(0), sum(tV))! }! }! I-IR conversion & SIMDification 123 1.23 Scalar, SSE, AVX
  • 24. Intel(R) Xeon(R) E5-2643 3.3 GHz AVX, Ubuntu 13.10 Intel C++ Composer 2013.SP1.1.106 flags: -std=c99 -O3 –xHost kernel v3.11.0-12-generic Intel MKL v11.1.1 Intel IPP v8.0.1 Intel’s Hyper-Threading: Off Intel Turbo Boost: Off ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 4 6 8 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] FGen 8 Taps, Complex Numbers IPP MKL Base ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 4 6 8 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] FGen 20 Taps, Complex Numbers IPP MKL Base ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 4 6 8 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] FGen 8 Taps, Real Numbers IPP MKL Base ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 4 6 8 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] FGen 20 Taps, Real Numbers IPP MKL Base ●Base IPP v8.0.1 MKL v11.1.1 FGen
  • 25. Intel(R) Core(TM) 2 Duo CPU L7500 1.6 GHz, SSSE3, Debian 7 Intel C++ Composer 2013.SP1.1.106 flags: -std=c99 -O3 –xHost kernel v3.2.0-4-686-pae Intel MKL v11.1.1 Intel IPP v8.0.1 Intel Dynamic Acceleration: Off ●Base IPP v8.0.1 MKL v11.1.1 FGen ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 1 2 3 4 5 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] 8 Taps, Complex Numbers IPP FGenBase MKL ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 1 2 3 4 5 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] 20 Taps, Complex Numbers IPP FGen MKL Base ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 1 2 3 4 5 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] 8 Taps, Real Numbers FGen MKL IPPBase ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 1 2 3 4 5 2^10 2^13 2^16 2^19 2^22 2^25 Input size Performance [f/c] 20 Taps, Real Numbers IPP Base FGen MKL
  • 26. Summary •  Generic programming coupled with staging used to build program generators •  Single codebase for different code styles, ISA and data abstraction •  Applicable to multiple domains •  Comparable results to commercial libraries