Session ID: SFO17-216
Session Name: Automatic Vectorization in ART (Android RunTime) - SFO17-216
Speakers: Aart Bik, Artem Serov
Track: LMG
★ Session Summary ★
Because all modern general-purpose CPUs support small-scale SIMD
instructions (typically between 64-bit and 512-bit), modern compilers
are becoming progressively better at taking advantage of SIMD
instructions automatically, a translation often referred to as
vectorization or SIMDization. Since the Android O release, the
optimizing compiler of ART has joined the family of vectorizing
compilers with the ability to translate bytecode into native SIMD code
for the target Android device. This talk will discuss the general
organization of the retargetable part of the vectorizer, which is
capable of automatically finding and exploiting vector instructions in
bytecode without committing to one of the target SIMD architectures
yet (currently ARM NEON (advanced SIMD), x86 SSE, and MIPS SIMD
Architecture). Furthermore the talk will present particular details of
deploying the vectorizing compiler on ARM platforms - its overall
impact on performance, some ARM specific considerations and
optimizations - and also will give an update on Linaro ART team's
SIMD-related activities.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-216/
Presentation:
Video: https://www.youtube.com/watch?v=KOD5D_DjzaI
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
---------------------------------------------------
Keyword:
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
ART compiler timeline:
  K (Android 4.4): Dalvik + JIT compiler
  L (Android 5.0): ART + AOT compiler
  M (Android 6.0): ART + AOT compiler
  N (Android 7.0): ART + JIT/AOT compiler
  O (Android 8.0): ART + JIT/AOT compiler + vectorization
A SIMD instruction performs a single operation on multiple operands in parallel (e.g. 4x32-bit operations at once). All modern general-purpose CPUs support small-scale SIMD instructions (typically between 64-bit and 512-bit):
  ARM: NEON Technology (128-bit)
  Intel: SSE* (128-bit), AVX* (256-bit, 512-bit)
  MIPS: MSA (128-bit)
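The "single operation, multiple operands" idea can be illustrated in plain Java (the class and method names below are mine; this merely simulates one 4x32-bit SIMD add in software):

```java
public class SimdLanes {
    // Simulates a single 4-lane SIMD add: one conceptual "instruction"
    // that adds four pairs of 32-bit operands in parallel.
    static int[] vadd4(int[] x, int[] y) {
        int[] r = new int[4];
        for (int lane = 0; lane < 4; lane++) {
            r[lane] = x[lane] + y[lane];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] r = vadd4(new int[]{1, 2, 3, 4}, new int[]{10, 20, 30, 40});
        System.out.println(java.util.Arrays.toString(r)); // [11, 22, 33, 44]
    }
}
```

On real hardware the four lane additions happen in one instruction (e.g. a 128-bit NEON register holding four 32-bit values), rather than in a loop.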
● Many vectorizing compilers were developed by supercomputer vendors
● Intel introduced the first vectorizing compiler for SSE in 1999
● Since the Android O release, the optimizing compiler of ART has joined the family of vectorizing compilers
Scalar loop:
  for (int i = 0; i < 256; i++) {
    a[i] = b[i] + 1;
  }
Vectorized loop (4 lanes):
  for (int i = 0; i < 256; i += 4) {
    a[i:i+3] = b[i:i+3] + [1,1,1,1];
  }
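Java has no array-slice syntax, so the transformation can only be simulated in source form (class and method names are mine); the point is that both loops compute the same result while the vectorized one handles four elements per step:

```java
public class VectorizeDemo {
    // Scalar original: a[i] = b[i] + 1
    static void scalar(int[] a, int[] b) {
        for (int i = 0; i < 256; i++) {
            a[i] = b[i] + 1;
        }
    }

    // Vectorized form: each iteration covers the slice a[i:i+3],
    // i.e. four elements per conceptual SIMD operation.
    static void vectorized(int[] a, int[] b) {
        for (int i = 0; i < 256; i += 4) {
            for (int lane = 0; lane < 4; lane++) { // one 4x32-bit add
                a[i + lane] = b[i + lane] + 1;
            }
        }
    }

    public static void main(String[] args) {
        int[] b = new int[256];
        for (int i = 0; i < 256; i++) b[i] = i;
        int[] a1 = new int[256], a2 = new int[256];
        scalar(a1, b);
        vectorized(a2, b);
        System.out.println(java.util.Arrays.equals(a1, a2)); // true
    }
}
```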
A class hierarchy of general vector operations that is sufficiently powerful to represent SIMD operations common to all architectures:
  VectorOperation (has vector length and packed data type)
    VectorBinOp: VectorAdd, VectorSub, ...
    VectorMemOp (has alignment): VectorLoad, VectorStore, ...
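The hierarchy above could be sketched as follows. This is an illustrative Java sketch only; ART's real implementation lives in the C++ optimizing compiler and uses its own class names:

```java
// Sketch of the slide's hierarchy of target-independent vector operations.
abstract class VectorOperation {
    final int vectorLength;     // number of lanes, e.g. 4
    final Class<?> packedType;  // packed data type, e.g. int.class
    VectorOperation(int vectorLength, Class<?> packedType) {
        this.vectorLength = vectorLength;
        this.packedType = packedType;
    }
}

abstract class VectorBinOp extends VectorOperation {
    VectorBinOp(int len, Class<?> type) { super(len, type); }
}

abstract class VectorMemOp extends VectorOperation {
    final int alignment;        // memory alignment in bytes
    VectorMemOp(int len, Class<?> type, int alignment) {
        super(len, type);
        this.alignment = alignment;
    }
}

class VectorAdd extends VectorBinOp {
    VectorAdd(int len, Class<?> type) { super(len, type); }
}

class VectorLoad extends VectorMemOp {
    VectorLoad(int len, Class<?> type, int align) { super(len, type, align); }
}

public class VectorIrSketch {
    public static void main(String[] args) {
        VectorAdd add = new VectorAdd(4, int.class);
        VectorLoad load = new VectorLoad(4, int.class, 16);
        System.out.println(add.vectorLength + " " + load.alignment); // 4 16
    }
}
```

Keeping the operations target-independent is what lets the same vectorizer front end serve NEON, SSE, and MSA back ends.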
Unrolling the vector loop by two, with the constant vector hoisted into t:
  t = [1,1,1,1];
  for (int i = 0; i < 256; i += 8) {
    a[i  :i+3] = b[i  :i+3] + t;
    a[i+4:i+7] = b[i+4:i+7] + t;
  }
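Simulated in plain Java (again with invented names), unrolling by two keeps the result identical while halving the loop-control overhead:

```java
public class UnrollDemo {
    // 4-lane vector loop, unrolled by two: 8 elements per iteration.
    static void unrolled(int[] a, int[] b) {
        int[] t = {1, 1, 1, 1};                 // hoisted constant vector
        for (int i = 0; i < 256; i += 8) {
            // a[i:i+3] = b[i:i+3] + t
            for (int l = 0; l < 4; l++) a[i + l] = b[i + l] + t[l];
            // a[i+4:i+7] = b[i+4:i+7] + t
            for (int l = 0; l < 4; l++) a[i + 4 + l] = b[i + 4 + l] + t[l];
        }
    }

    public static void main(String[] args) {
        int[] a = new int[256], b = new int[256];
        for (int i = 0; i < 256; i++) b[i] = 7 * i;
        unrolled(a, b);
        System.out.println(a[0] + " " + a[255]); // 1 1786
    }
}
```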
ENGINEERS AND DEVICES WORKING TOGETHER
Java code:
  void mul_add(int[] a, int[] b) {
    for (int i = 0; i < 512; i++) {
      a[i] += a[i] * b[i];
    }
  }
Autovectorization result (AArch64):
L:
  cmp  w0, #0x200
  b.hs Exit
  add  w16, w1, #0xc
  add  x16, x16, x0, lsl #2
  ld1  {v0.2s}, [x16]
  add  w16, w2, #0xc
  add  x16, x16, x0, lsl #2
  ld1  {v1.2s}, [x16]
  mul  v1.2s, v0.2s, v1.2s
  add  v0.2s, v0.2s, v1.2s
  add  w16, w1, #0xc
  add  x16, x16, x0, lsl #2
  st1  {v0.2s}, [x16]
  add  w0, w0, #0x2
  ldrh w16, [tr]
  cbz  w16, L
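The mul_add kernel from the slide can be exercised with a small harness to confirm its semantics (the harness class and test values are mine; a[i] becomes a[i] * (1 + b[i])):

```java
public class MulAddDemo {
    // Kernel from the slide; ART's optimizing compiler vectorizes
    // this loop into the NEON sequence shown above.
    static void mul_add(int[] a, int[] b) {
        for (int i = 0; i < 512; i++) {
            a[i] += a[i] * b[i];
        }
    }

    public static void main(String[] args) {
        int[] a = new int[512], b = new int[512];
        for (int i = 0; i < 512; i++) { a[i] = 2; b[i] = i; }
        mul_add(a, b);
        System.out.println(a[0] + " " + a[3]); // 2 8
    }
}
```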