SlideShare uma empresa Scribd logo
1 de 15
Baixar para ler offline
1
Using GCC
Auto-Vectorizer
Ira Rosen <ira.rosen@linaro.org>
Michael Hope <michael.hope@linaro.org>
r1
bzr branch lp:~michaelh1/+junk/using-the-vectorizer
2
Using GCC Vectorizer
● Vectorization is enabled by the flag -ftree-vectorize and by
default at -O3:
● gcc –O2 –ftree-vectorize myloop.c
● or gcc –O3 myloop.c
● To enable NEON:
-mfpu=neon -mfloat-abi=softfp or -mfloat-abi=hard
● Information on which loops got vectorized, and which
didn’t and why:
● -fdump-tree-vect(-details)
– dumps information into myloop.c.##t.vect
● -ftree-vectorizer-verbose=[X]
– dumps to stderr
● More information:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
3
Other useful flags
● -ffast-math - if operating on floats in a reduction
computation (to allow the vectorizer to change the
order of the computation)
● -funsafe-loop-optimizations - if using "unsigned int"
loop counters (can be assumed not to overflow)
● -ftree-loop-if-convert-stores - more aggressive
if-conversion
● --param min-vect-loop-bound=[X] - if have loops with a
short trip-count
● -fno-vect-loop-version- if worried about code size
4
What's vectorizable
● Innermost loops
● countable
● no control flow
● independent data accesses
● continuous data accesses
for (k = 0; k < m; k ++)
for (j = 0; j < m; j ++)
for (i = 0; i < n; i ++)
a[k][j][i] = b[k][j][i] * c[k][j][i];
Example of not vectorizable loop:
while (a[i] != 8)
{
if (a[i] != 0)
a[i] = a[i-1];
b[i+stride] = 0;
}
uncountable
control flow
loop carried dependence
access with unknown stride
5
Special features
● vectorization of outer loops
● vectorization of straight-line code
● if-conversion
● multiple data-types and type conversions
● recognition of special idioms (e.g. dot-product,
widening operations)
● strided memory accesses
● cost model
● runtime aliasing and alignment tests
● auto-detection of vector size
Examples:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
6
GCC Versions
● Current Linaro GCC is based on FSF GCC 4.6
● Once FSF GCC 4.7 is released (in about six
months) Linaro GCC will switch to GCC 4.7
● Some of GCC 4.7 vectorizer related features:
● __builtin_assume_aligned – alignment hints
● vectorization of conditions with mixed types
● vectorization of bool
7
Special features
● vectorization of outer loops
● vectorization of straight-line code
● if-conversion
● multiple data-types and type conversions
● recognition of special idioms (e.g. dot-product,
widening operations)
● strided memory accesses
● cost model
● runtime aliasing and alignment tests
● auto-detection of vector size
Examples:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
8
Vectorizing for NEON
libswscale/rgb2rgb_template.c:
static inline void rgb24tobgr16_c(
const uint8_t *src, uint8_t *dst,
int src_size) {
const uint8_t *s = src;
const uint8_t *end;
uint16_t *d = (uint16_t *)dst;
end = s + src_size;
while (s < end) {
const int b = *s++;
const int g = *s++;
const int r = *s++;
*d++ = (b>>3) | ((g&0xFC)<<3)
| ((r&0xF8)<<8);
} }
.L12:
mov r1, r4
add r0, r0, #1
vld3.8 {d16, d18, d20}, [r1]!
cmp r0, r6
mov r2, ip
add r4, r4, #48
add ip, ip, #32
vld3.8 {d17, d19, d21}, [r1]
vand q12, q9, q14
vshr.u8 q11, q8, #3
vand q8, q10, q13
vshll.u8 q9, d25, #3
vmovl.u8 q10, d22
vshll.u8 q15, d24, #3
vmovl.u8 q11, d23
vorr q12, q15, q10
vorr q11, q9, q11
vshll.u8 q10, d16, #8
vshll.u8 q9, d17, #8
vorr q8, q12, q10
vorr q11, q11, q9
vst1.16 {q8}, [r2]!
vst1.16 {q11}, [r2]
bcc .L12
scalar: 75000 runs take 216.583ms
vector: 75000 runs take 48.8586ms
speedup: 4.433x
strided access
& and shift right
performed on u8
widening shift
no over-promotion to s32
9
Writing vectorizer-friendly code
● Avoid aliasing problems
– Use __restrict__ qualified pointers
void foo (int *__restrict__ pInput, int *__restrict__ pOutput)
● Don’t unroll loops
– Loop vectorization is more powerful than SLP
for (i=0; i<n; i+=4) {
sum += a[0]; for (i=0; i<n; i++)
sum += a[1]; sum += a[i];
sum += a[2];
sum += a[3];
a += 4;}
10
Writing vectorizer-friendly code (cont.)
● Use countable loops, with no side-effects
– No function-calls in the loop (distribute into a separate
loop)
for (i=0; i<n; i++)
for (i=0; i<n; i++) { if (a[i] == 0) foo();
if (a[i] == 0) foo(); for (i=0; i<n; i++)
b[i] = c[i]; } b[i] = c[i];
– No ‘break’/’continue’
for (i=0; i<n; i++)
for (i=0; i<n; i++) { if (a[i] == 8) {m = i; break;}
if (a[i] == 8) break; for (i=0; i<m; i++)
b[i] = c[i]; } b[i] = c[i];
11
Writing vectorizer-friendly code (cont.)
● Keep the memory access-pattern simple
– Don't use indirect accesses, e.g.:
for (i=0; i<n; i++)
a[b[i]] = x;
– Don't use unknown stride, e.g.:
for (i=0; i<n; i++)
a[i+stride] = x;
● Use "int" iterators rather than "unsigned int" iterators
– The C standard says that the former cannot overflow,
which helps the compiler to determine the trip count.
12
Some of our recent contributions
● Support of vldN/vstN
● NEON specific patterns: e.g. widening shift
● SLP (straight-line code vectorization)
improvements
● RTL improvements:
● reducing the number of moves and amount of
spilling (both for auto- and hand-vectorised code)
● improving modulo scheduling of NEON code
13
People
● Linaro Toolchain WG
● Ira Rosen (IRC: irar)
ira.rosen@linaro.org
– auto-vectorizer
● Richard Sandiford (IRC: rsandifo)
richard.sandiford@linaro.org
– NEON back-end/RTL optimizations
14
Helping us
Send us examples of code that are important to
you to vectorize.
15
Output Example
ex.c:
1 #define N 128
2 int a[N], b[N];
3 void foo (void)
4 {
5 int i;
6
7 for (i = 0; i < N; i++)
8 a[i] = i;
9
10 for (i = 0; i < N; i+=5)
11 b[i] = i;
12 }
● What's got vectorized:
gcc -c -O3 -ftree-vectorizer-verbose=1 ex.c
ex.c:7: note: LOOP VECTORIZED.
ex.c:3: note: vectorized 1 loops in function.
● What's got vectorized and what didn't:
gcc -c -O3 -ftree-vectorizer-verbose=2 ex.c
ex.c:10: note: not vectorized: complicated access
pattern.
ex.c:10: note: not vectorized: complicated access
pattern.
ex.c:7: note: LOOP VECTORIZED.
ex.c:3: note: vectorized 1 loops in function.
...
● All the details:
gcc -c -O3 -ftree-vectorizer-verbose=9 ex.c
or
gcc -c -O3 -fdump-tree-vect-details ex.c

Mais conteúdo relacionado

Mais procurados

C++による数値解析の並列化手法
C++による数値解析の並列化手法C++による数値解析の並列化手法
C++による数値解析の並列化手法dc1394
 
2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層
2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層
2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層智啓 出川
 
FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料一路 川染
 
Xbyakの紹介とその周辺
Xbyakの紹介とその周辺Xbyakの紹介とその周辺
Xbyakの紹介とその周辺MITSUNARI Shigeo
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracingViller Hsiao
 
2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ
2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ
2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ智啓 出川
 
1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門NVIDIA Japan
 
2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)
2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)
2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)智啓 出川
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例National Cheng Kung University
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたMITSUNARI Shigeo
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計鍾誠 陳鍾誠
 
C/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールC/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールMITSUNARI Shigeo
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)Kentaro Ebisawa
 
メタプログラミングって何だろう
メタプログラミングって何だろうメタプログラミングって何だろう
メタプログラミングって何だろうKota Mizushima
 
C#でOpenCL with OpenTK + Cloo
C#でOpenCL with OpenTK + ClooC#でOpenCL with OpenTK + Cloo
C#でOpenCL with OpenTK + Clooaokomoriuta
 
GPUが100倍速いという神話をぶち殺せたらいいな ver.2013
GPUが100倍速いという神話をぶち殺せたらいいな ver.2013GPUが100倍速いという神話をぶち殺せたらいいな ver.2013
GPUが100倍速いという神話をぶち殺せたらいいな ver.2013Ryo Sakamoto
 

Mais procurados (20)

C++による数値解析の並列化手法
C++による数値解析の並列化手法C++による数値解析の並列化手法
C++による数値解析の並列化手法
 
How A Compiler Works: GNU Toolchain
How A Compiler Works: GNU ToolchainHow A Compiler Works: GNU Toolchain
How A Compiler Works: GNU Toolchain
 
2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層
2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層
2015年度GPGPU実践プログラミング 第5回 GPUのメモリ階層
 
FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料FPGA+SoC+Linux実践勉強会資料
FPGA+SoC+Linux実践勉強会資料
 
Xbyakの紹介とその周辺
Xbyakの紹介とその周辺Xbyakの紹介とその周辺
Xbyakの紹介とその周辺
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracing
 
GCC LTO
GCC LTOGCC LTO
GCC LTO
 
2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ
2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ
2015年度GPGPU実践プログラミング 第15回 GPU最適化ライブラリ
 
1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門1076: CUDAデバッグ・プロファイリング入門
1076: CUDAデバッグ・プロファイリング入門
 
GDB Rocks!
GDB Rocks!GDB Rocks!
GDB Rocks!
 
2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)
2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)
2015年度GPGPU実践プログラミング 第4回 GPUでの並列プログラミング(ベクトル和,移動平均,差分法)
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
 
Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみた
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計
 
C/C++プログラマのための開発ツール
C/C++プログラマのための開発ツールC/C++プログラマのための開発ツール
C/C++プログラマのための開発ツール
 
20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)20111015 勉強会 (PCIe / SR-IOV)
20111015 勉強会 (PCIe / SR-IOV)
 
メタプログラミングって何だろう
メタプログラミングって何だろうメタプログラミングって何だろう
メタプログラミングって何だろう
 
C#でOpenCL with OpenTK + Cloo
C#でOpenCL with OpenTK + ClooC#でOpenCL with OpenTK + Cloo
C#でOpenCL with OpenTK + Cloo
 
GPUが100倍速いという神話をぶち殺せたらいいな ver.2013
GPUが100倍速いという神話をぶち殺せたらいいな ver.2013GPUが100倍速いという神話をぶち殺せたらいいな ver.2013
GPUが100倍速いという神話をぶち殺せたらいいな ver.2013
 

Destaque

Q4.11: NEON Intrinsics
Q4.11: NEON IntrinsicsQ4.11: NEON Intrinsics
Q4.11: NEON IntrinsicsLinaro
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bitsChiou-Nan Chen
 
GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64Yi-Hsiu Hsu
 
COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1NOWAY
 
LAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEELAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEELinaro
 
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)Leon Anavi
 
LAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELinaro
 
SFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEESFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEELinaro
 
Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)Yannick Gicquel
 
Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64Yi-Hsiu Hsu
 
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEEBKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEELinaro
 
LCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEELCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEELinaro
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewLinaro
 
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3Linaro
 
Arm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingArm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingMerck Hung
 
BUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 KernelBUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 KernelLinaro
 
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARMXPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARMThe Linux Foundation
 

Destaque (18)

Q4.11: NEON Intrinsics
Q4.11: NEON IntrinsicsQ4.11: NEON Intrinsics
Q4.11: NEON Intrinsics
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
 
GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64GCC for ARMv8 Aarch64
GCC for ARMv8 Aarch64
 
COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1COMPLETE DETAIL ABOUT ARM PART1
COMPLETE DETAIL ABOUT ARM PART1
 
64-bit Android
64-bit Android64-bit Android
64-bit Android
 
LAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEELAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEE
 
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
Software, Over the Air (SOTA) for Automotive Grade Linux (AGL)
 
LAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEE
 
SFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEESFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEE
 
Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)Introduction to Optee (26 may 2016)
Introduction to Optee (26 may 2016)
 
Introduction to armv8 aarch64
Introduction to armv8 aarch64Introduction to armv8 aarch64
Introduction to armv8 aarch64
 
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEEBKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
BKK16-110 A Gentle Introduction to Trusted Execution and OP-TEE
 
LCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEELCU14-103: How to create and run Trusted Applications on OP-TEE
LCU14-103: How to create and run Trusted Applications on OP-TEE
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting Review
 
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
LAS16-111: Easing Access to ARM TrustZone – OP-TEE and Raspberry Pi 3
 
Arm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingArm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefing
 
BUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 KernelBUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
BUD17-DF15 - Optimized Android N MR1 + 4.9 Kernel
 
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARMXPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
 

Semelhante a Q4.11: Using GCC Auto-Vectorizer

深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法MITSUNARI Shigeo
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)RCCSRENKEI
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfYodalee
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Intel® Software
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerPlatonov Sergey
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone? DVClub
 
Cryptography and secure systems
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systemsVsevolod Stakhov
 
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...corehard_by
 
Lab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsLab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsRup Chowdhury
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I💻 Anton Gerdelan
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudAndrea Righi
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source codePVS-Studio
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source codeAndrey Karpov
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Windows Developer
 
리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3Sangho Park
 

Semelhante a Q4.11: Using GCC Auto-Vectorizer (20)

深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
 
Boosting Developer Productivity with Clang
Boosting Developer Productivity with ClangBoosting Developer Productivity with Clang
Boosting Developer Productivity with Clang
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdf
 
Fedor Polyakov - Optimizing computer vision problems on mobile platforms
Fedor Polyakov - Optimizing computer vision problems on mobile platforms Fedor Polyakov - Optimizing computer vision problems on mobile platforms
Fedor Polyakov - Optimizing computer vision problems on mobile platforms
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory Sanitizer
 
ESL Anyone?
ESL Anyone? ESL Anyone?
ESL Anyone?
 
Cryptography and secure systems
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systems
 
Meltdown & spectre
Meltdown & spectreMeltdown & spectre
Meltdown & spectre
 
Meltdown & Spectre
Meltdown & Spectre Meltdown & Spectre
Meltdown & Spectre
 
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
C++ CoreHard Autumn 2018. Concurrency and Parallelism in C++17 and C++20/23 -...
 
Lab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsLab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer Graphics
 
Computer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming IComputer Graphics - Lecture 01 - 3D Programming I
Computer Graphics - Lecture 01 - 3D Programming I
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
 
리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3리눅스 드라이버 실습 #3
리눅스 드라이버 실습 #3
 
Lecture 04
Lecture 04Lecture 04
Lecture 04
 

Mais de Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 

Mais de Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Último

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Q4.11: Using GCC Auto-Vectorizer

  • 1. 1 Using GCC Auto-Vectorizer Ira Rosen <ira.rosen@linaro.org> Michael Hope <michael.hope@linaro.org> r1 bzr branch lp:~michaelh1/+junk/using-the-vectorizer
  • 2. 2 Using GCC Vectorizer ● Vectorization is enabled by the flag -ftree-vectorize and by default at -O3: ● gcc –O2 –ftree-vectorize myloop.c ● or gcc –O3 myloop.c ● To enable NEON: -mfpu=neon -mfloat-abi=softfp or -mfloat-abi=hard ● Information on which loops got vectorized, and which didn’t and why: ● -fdump-tree-vect(-details) – dumps information into myloop.c.##t.vect ● -ftree-vectorizer-verbose=[X] – dumps to stderr ● More information: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
  • 3. 3 Other useful flags ● -ffast-math - if operating on floats in a reduction computation (to allow the vectorizer to change the order of the computation) ● -funsafe-loop-optimizations - if using "unsigned int" loop counters (can be assumed not to overflow) ● -ftree-loop-if-convert-stores - more aggressive if-conversion ● --param min-vect-loop-bound=[X] - if have loops with a short trip-count ● -fno-vect-loop-version- if worried about code size
  • 4. 4 What's vectorizable ● Innermost loops ● countable ● no control flow ● independent data accesses ● continuous data accesses for (k = 0; k < m; k ++) for (j = 0; j < m; j ++) for (i = 0; i < n; i ++) a[k][j][i] = b[k][j][i] * c[k][j][i]; Example of not vectorizable loop: while (a[i] != 8) { if (a[i] != 0) a[i] = a[i-1]; b[i+stride] = 0; } uncountable control flow loop carried dependence access with unknown stride
  • 5. 5 Special features ● vectorization of outer loops ● vectorization of straight-line code ● if-conversion ● multiple data-types and type conversions ● recognition of special idioms (e.g. dot-product, widening operations) ● strided memory accesses ● cost model ● runtime aliasing and alignment tests ● auto-detection of vector size Examples: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
  • 6. 6 GCC Versions ● Current Linaro GCC is based on FSF GCC 4.6 ● Once FSF GCC 4.7 is released (in about six months) Linaro GCC will switch to GCC 4.7 ● Some of GCC 4.7 vectorizer related features: ● __builtin_assume_aligned – alignment hints ● vectorization of conditions with mixed types ● vectorization of bool
  • 7. 7 Special features ● vectorization of outer loops ● vectorization of straight-line code ● if-conversion ● multiple data-types and type conversions ● recognition of special idioms (e.g. dot-product, widening operations) ● strided memory accesses ● cost model ● runtime aliasing and alignment tests ● auto-detection of vector size Examples: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
  • 8. 8 Vectorizing for NEON libswscale/rgb2rgb_template.c: static inline void rgb24tobgr16_c( const uint8_t *src, uint8_t *dst, int src_size) { const uint8_t *s = src; const uint8_t *end; uint16_t *d = (uint16_t *)dst; end = s + src_size; while (s < end) { const int b = *s++; const int g = *s++; const int r = *s++; *d++ = (b>>3) | ((g&0xFC)<<3) | ((r&0xF8)<<8); } } .L12: mov r1, r4 add r0, r0, #1 vld3.8 {d16, d18, d20}, [r1]! cmp r0, r6 mov r2, ip add r4, r4, #48 add ip, ip, #32 vld3.8 {d17, d19, d21}, [r1] vand q12, q9, q14 vshr.u8 q11, q8, #3 vand q8, q10, q13 vshll.u8 q9, d25, #3 vmovl.u8 q10, d22 vshll.u8 q15, d24, #3 vmovl.u8 q11, d23 vorr q12, q15, q10 vorr q11, q9, q11 vshll.u8 q10, d16, #8 vshll.u8 q9, d17, #8 vorr q8, q12, q10 vorr q11, q11, q9 vst1.16 {q8}, [r2]! vst1.16 {q11}, [r2] bcc .L12 scalar: 75000 runs take 216.583ms vector: 75000 runs take 48.8586ms speedup: 4.433x strided access & and shift right performed on u8 widening shift no over-promotion to s32
  • 9. 9 Writing vectorizer-friendly code ● Avoid aliasing problems – Use __restrict__ qualified pointers void foo (int *__restrict__ pInput, int *__restrict__ pOutput) ● Don’t unroll loops – Loop vectorization is more powerful than SLP for (i=0; i<n; i+=4) { sum += a[0]; for (i=0; i<n; i++) sum += a[1]; sum += a[i]; sum += a[2]; sum += a[3]; a += 4;}
  • 10. 10 Writing vectorizer-friendly code (cont.) ● Use countable loops, with no side-effects – No function-calls in the loop (distribute into a separate loop) for (i=0; i<n; i++) for (i=0; i<n; i++) { if (a[i] == 0) foo(); if (a[i] == 0) foo(); for (i=0; i<n; i++) b[i] = c[i]; } b[i] = c[i]; – No ‘break’/’continue’ for (i=0; i<n; i++) for (i=0; i<n; i++) { if (a[i] == 8) {m = i; break;} if (a[i] == 8) break; for (i=0; i<m; i++) b[i] = c[i]; } b[i] = c[i];
  • 11. 11 Writing vectorizer-friendly code (cont.) ● Keep the memory access-pattern simple – Don't use indirect accesses, e.g.: for (i=0; i<n; i++) a[b[i]] = x; – Don't use unknown stride, e.g.: for (i=0; i<n; i++) a[i+stride] = x; ● Use "int" iterators rather than "unsigned int" iterators – The C standard says that the former cannot overflow, which helps the compiler to determine the trip count.
  • 12. 12 Some of our recent contributions ● Support of vldN/vstN ● NEON specific patterns: e.g. widening shift ● SLP (straight-line code vectorization) improvements ● RTL improvements: ● reducing the number of moves and amount of spilling (both for auto- and hand-vectorised code) ● improving modulo scheduling of NEON code
  • 13. 13 People ● Linaro Toolchain WG ● Ira Rosen (IRC: irar) ira.rosen@linaro.org – auto-vectorizer ● Richard Sandiford (IRC: rsandifo) richard.sandiford@linaro.org – NEON back-end/RTL optimizations
  • 14. 14 Helping us Send us examples of code that are important to you to vectorize.
  • 15. 15 Output Example ex.c: 1 #define N 128 2 int a[N], b[N]; 3 void foo (void) 4 { 5 int i; 6 7 for (i = 0; i < N; i++) 8 a[i] = i; 9 10 for (i = 0; i < N; i+=5) 11 b[i] = i; 12 } ● What's got vectorized: gcc -c -O3 -ftree-vectorizer-verbose=1 ex.c ex.c:7: note: LOOP VECTORIZED. ex.c:3: note: vectorized 1 loops in function. ● What's got vectorized and what didn't: gcc -c -O3 -ftree-vectorizer-verbose=2 ex.c ex.c:10: note: not vectorized: complicated access pattern. ex.c:10: note: not vectorized: complicated access pattern. ex.c:7: note: LOOP VECTORIZED. ex.c:3: note: vectorized 1 loops in function. ... ● All the details: gcc -c -O3 -ftree-vectorizer-verbose=9 ex.c or gcc -c -O3 -fdump-tree-vect-details ex.c