ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
1. ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
Ryousei Takano
National Institute of Advanced Industrial Science and Technology (AIST), Japan
PRAGMA Workshop 35, Penang, October 4th
2. Outline
• Open Innovation Platform for advancing AI R&D
• ABCI: AI Bridging Cloud Infrastructure
– Hardware Architecture
– Software Stack
– Services
– Data Center Facilities
• International Collaboration
3. Deep Learning
• “Deep learning is scalable”
• “Performance just gets better if you feed in more data”
Engine
Fuel
Large neural networks
Labeled data
(x,y pairs)
Why is Deep Learning taking off?
Rocket engines: Deep Learning driven by scale
1 million
connections
(2007)
CPU
10 million
connections
(2008)
GPU
1 billion
connections
(2011)
Cloud
(many CPUs)
100 billion
connections
(2015)
HPC
(many GPUs)
Andrew Ng (Baidu) “What Data Scientists Should Know about Deep Learning”
4. Open Innovation Platform for Advancing AI R&D
Investment for AI R&D
AI Cloud design,
construction, and operation
AI Service Developers / Providers
ML / DL Framework Developers
Big Data Holders
Collaborative R&D
IDC Vendors
Industry and
Academia
Launching business
Technology transfer
Startups
Various
AI companies
Computation Big Data
Algorithm Theory
• Common AI Platform
• Common Software Modules
• Common Data/Models
Common Public Platform
• For AI apps, services, and infrastructure designs
• Aiming at fast tech transfer through industry and academia collaboration
Open Hardware and Software Architecture
• w/ AI acceleration support based on commodity devices
Multi-PB-class Sharable Big Data Storage
• For AI R&D collaboration
5. ABCI: The World's First Large-Scale Open AI Infrastructure
• World top-level compute and data process capability
• Open, Public, and Dedicated infrastructure for AI & Big Data Algorithms, Software, and Applications
• Open Innovation Platform to accelerate joint academic-industry R&D for AI
AI Bridging Cloud Infrastructure
Peak Performance:
550 PFlops (FP16)
37.2 PFlops (FP64)
Effective Performance:
19.88 PFlops (#5 in TOP500)
12.054 GFlops/W (#8 in GREEN500)
Power Usage: < 2.3 MW
Average PUE: < 1.1 (Estimated)
Univ. Tokyo / Kashiwa II Campus
6. TOP500 List (June 2018)
# | System | Architecture | Country | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW)
1 | Summit, ORNL | IBM, POWER9 + GPU(GV100) | USA | 122,300.0 | 187,659.3 | 8,806
2 | TaihuLight, NSCW | Sunway, SW26010 | China | 93,014.6 | 125,435.9 | 15,371
3 | Sierra, LLNL | IBM, POWER9 + GPU(GV100) | USA | 71,610.0 | 119,193.6 | -
4 | Tianhe-2, NSCG | NUDT, CPU + Matrix-2000 | China | 61,444.5 | 100,678.7 | 18,482
5 | ABCI, AIST | Fujitsu, CPU + GPU(V100) | Japan | 19,880.0 | 32,576.6 | 1,649
6 | Piz Daint, CSCS | Cray, CPU + GPU(P100) | Switzerland | 19,590.0 | 25,326.3 | 2,272
7 | Titan, ORNL | Cray, CPU + GPU(K20x) | USA | 17,590.0 | 27,112.5 | 8,209
8 | Sequoia, LLNL | IBM, BlueGene/Q | USA | 17,173.2 | 20,132.7 | 7,890
9 | Trinity, LANL/SNL | Cray, MIC(KNL) | USA | 14,137.3 | 43,902.6 | 3,844
10 | Cori, NERSC-LBNL | Cray, MIC(KNL) | USA | 14,014.7 | 27,880.7 | 3,939
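The Rmax, Rpeak, and Power columns allow two derived figures to be checked by hand: HPL efficiency (Rmax / Rpeak) and energy efficiency (Rmax per watt). The following is a minimal Python sketch using only the ABCI, Piz Daint, and Titan rows from the list; the dictionary and function names are illustrative and not part of any TOP500 tooling.

```python
# Rows copied from the TOP500 (June 2018) list:
# system -> (Rmax in TFlop/s, Rpeak in TFlop/s, power in kW)
SYSTEMS = {
    "ABCI": (19880.0, 32576.6, 1649.0),
    "Piz Daint": (19590.0, 25326.3, 2272.0),
    "Titan": (17590.0, 27112.5, 8209.0),
}


def hpl_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of the theoretical peak achieved on the HPL benchmark."""
    return rmax_tflops / rpeak_tflops


def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Energy efficiency. TFlop/s per kW equals GFlop/s per W,
    because both units scale by the same factor of 1000."""
    return rmax_tflops / power_kw


if __name__ == "__main__":
    for name, (rmax, rpeak, power) in SYSTEMS.items():
        print(f"{name}: {hpl_efficiency(rmax, rpeak):.1%} of peak, "
              f"{gflops_per_watt(rmax, power):.2f} GFlops/W")
```

For the ABCI row this gives about 61% of peak and roughly 12.06 GFlops/W.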
7. AI Infrastructure for Everyone (Democratization AI)
Promoting the use by over 100 institutions and over 1000 researchers/engineers
Building an ecosystem of AI R&D, from beginners to state-of-the-art researchers
Expert
• Up to 512-node computing resource is available for everyone
• Software, datasets, and pre-trained models are ready for use
• High computing capability accelerates AI R&D and promotes social implementation
Advanced & Intermediate
Beginner
• User friendly WebUI based IDE for supporting
beginners of deep learning
• PoC platform of B2B2C
• ABCI Grand Challenge: Demonstration of highly challenging academic
and/or industrial themes using the whole ABCI resources for 24 hours
• ABCI Data Challenge: Competition of accelerating open science using
open data
A reference design
Technology transfer
IDC
8. AI Infrastructure by co-Design
High Density Datacenter Design
• Cost effective lightweight “warehouse” building and cooling pod
• x20 thermal density of standard DC
Ultra Green
• Free cooling and high efficiency power supplies, etc.
• Commoditizing supercomputer cooling technologies to Cloud (70kW/rack)
De facto and Commodity Architecture
• Wide ranging Big Data and HPC standard software stacks
• AI accelerator support based on commodity software
Container-optimized Software Ecosystem
• Advanced cloud-based operation
• Convergence of HPC and AI/Big Data software stack
Secure Data Utilization
• Promoting industrial R&D using private data sets
9. ABCI Hardware Overview
High-Performance Computing System
550 PFlops (FP16), 37.2 PFlops (FP64)
476 TiB Memory, 1.74 PB NVMe SSD
Computing Nodes (w/ GPU) x 1088
• GPU: NVIDIA Tesla V100 SXM2 x 4
• CPU: Intel Xeon Gold 6148 (2.4GHz/20cores) x 2
• Memory: 384GiB
• Local Storage: Intel SSD DC P4600 (NVMe) 1.6TB x 1
• Interconnect: InfiniBand EDR x 2
Multi-platform Nodes (w/o GPU) x 10
• Intel Xeon Gold 6132 (2.6GHz/14cores) x 2
• 768GiB Memory, 3.8TB NVMe SSD
Interactive Nodes x 4
Management and Gateway Nodes x 15
Large-scale Storage System
22 PB GPFS
DDN SFA14K (w/ SS8462 Enclosure x 10) x 3
• 12TB 7.2Krpm NL-SAS HDD x 2400
• 3.84TB SAS SSD x 216
• NSD Server x 12
Protocol Nodes, etc. x 8
Interconnect (InfiniBand EDR)
• Mellanox CS7500 x 2
• Mellanox SB7890 x 229
Service Network (10GbE)
• Nexus 3232C x 2
Gateway and Firewall
• FortiGate 1500D x 2
• FortiAnalyzer 400E x 1
100Gbps SINET5
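The system-wide memory and NVMe totals can be reproduced from the per-node specification. The sketch below is a back-of-the-envelope check in Python, not an official sizing tool. The per-GPU memory of 16 GiB is an assumption (it does not appear on the slide); with it, host plus GPU memory adds up to the quoted 476 TiB.

```python
NODES = 1088                  # Computing Nodes (w/ GPU)
GPUS_PER_NODE = 4             # NVIDIA Tesla V100 SXM2 per node
HOST_MEM_GIB_PER_NODE = 384   # host memory per node
NVME_TB_PER_NODE = 1.6        # local NVMe SSD per node
GPU_MEM_GIB = 16              # ASSUMPTION: 16 GiB per V100, not stated on the slide


def aggregate_capacity() -> dict:
    """Sum per-node resources over all GPU computing nodes."""
    total_gpus = NODES * GPUS_PER_NODE
    host_mem_tib = NODES * HOST_MEM_GIB_PER_NODE / 1024
    gpu_mem_tib = total_gpus * GPU_MEM_GIB / 1024
    return {
        "gpus": total_gpus,
        "memory_tib": host_mem_tib + gpu_mem_tib,
        "nvme_pb": NODES * NVME_TB_PER_NODE / 1000,
    }


if __name__ == "__main__":
    print(aggregate_capacity())
```

Under these assumptions the result is 4352 GPUs, 476 TiB of combined memory, and about 1.74 PB of NVMe SSD.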
10. ABCI Computing Rack / Interconnect
[Figure: rack interconnect diagram]
Rack #1, Rack #2 (identical layout):
• CX400 #1 to #17, each holding two CX2570 nodes (CX2570 #1 to #34)
• LEAF#1 to LEAF#4 (SB7890)
• FBB#1 to FBB#3 (SB7890)
• Full bisection BW: IB-EDR x 72
• Link labels: InfiniBand EDR x1, x4, x6
SPINE#1, SPINE#2 (CS7500)
• 1/3 Oversubscription BW: IB-EDR x 24
• Dense-packaged rack: 34 nodes, 136 Tesla V100
– Theoretical peak performance per rack: 1.16 PFlops (FP64), 17 PFlops (FP16)
c.f. Google TPU Pod > 100 PFlops
– Power consumption per rack: 67 kW
• Interconnect
– Fat-tree topology
– Intra-rack: full bisection BW
– Inter-rack: 1/3 oversubscription (24/68)
– Large-scale storage system: full bisection BW
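The per-rack figures and the inter-rack oversubscription ratio follow from numbers shown elsewhere in the deck: 1088 computing nodes, 34 nodes per rack, two InfiniBand EDR links per node, and 24 EDR links from each rack to the spine. A minimal Python sketch of that arithmetic, assuming all computing nodes sit in identically populated racks (the function name is illustrative):

```python
TOTAL_NODES = 1088          # Computing Nodes (w/ GPU)
NODES_PER_RACK = 34         # CX2570 #1 to #34 in each rack
EDR_LINKS_PER_NODE = 2      # InfiniBand EDR x 2 per node
INTER_RACK_EDR_LINKS = 24   # IB-EDR x 24 between a rack and the spine
PEAK_FP64_PFLOPS = 37.2     # system-wide peak
PEAK_FP16_PFLOPS = 550.0    # system-wide peak


def rack_summary() -> dict:
    """Derive rack count, per-rack peak, and uplink/downlink ratio."""
    racks = TOTAL_NODES // NODES_PER_RACK
    node_links = NODES_PER_RACK * EDR_LINKS_PER_NODE
    return {
        "racks": racks,
        "fp64_pflops_per_rack": PEAK_FP64_PFLOPS / racks,
        "fp16_pflops_per_rack": PEAK_FP16_PFLOPS / racks,
        "node_links_per_rack": node_links,
        "inter_rack_ratio": INTER_RACK_EDR_LINKS / node_links,
    }


if __name__ == "__main__":
    for key, value in rack_summary().items():
        print(f"{key}: {value}")
```

The ratio 24/68 is about 0.35, which is where the "1/3 oversubscription" label comes from.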
11. Container is critical for AI R&D
• lang2program, referred to in ACL 2017
– https://github.com/kelvinguu/lang2program
– Provided as a Dockerfile
– Bunch of software needed to be run
• TensorFlow, PostgreSQL, Python Pip Packages, etc.
• It is not possible to run it on traditional large-scale HPC systems
– Arbitrary/voluntary installation of software => chaos
– Docker => security reason
• We're developing an easy-to-manage and flexible-to-use platform for deploying AI apps as “modules”
12. AI Framework Modules
• Host OS only provides the minimum set of software including:
– Base drivers, libraries
• “Base” modules provide customized OS images including:
– CUDA, cuDNN, NCCL, MPI, etc.
• “DL” modules provide deep learning frameworks and apps which extend “Base” images
• We mainly employ Singularity as the basis of our AI platform
[Figure: module layers]
Base: Base OS Images, incl. CUDA, CuDNN, NCCL, MPI, Infiniband
DL: Images for Deep Learning Frameworks, based on HPCBase
CentOS, Ubuntu, Debian, SUSE, Fedora
Caffe, Chainer, TensorFlow, MXNet, Torch
Host OS: incl. Base Drivers, Libraries
GPU, InfiniBand, GPFS, MPI
13. ABCI Software Stack
Software
Operating System: CentOS, RHEL
Job Scheduler: Univa Grid Engine
Container Engine: Docker, Singularity
MPI: OpenMPI, MVAPICH2, Intel MPI
Development tools: Intel Parallel Studio XE Cluster Edition, PGI Professional Edition, NVIDIA CUDA SDK, GCC, Python, Ruby, R, Java, Scala, Perl
Deep Learning: Caffe, Caffe2, TensorFlow, Theano, Torch, PyTorch, CNTK, MXNet, Chainer, Keras, etc.
Big Data Processing: Hadoop, Spark
Container
• Containers enable users to instantly try the state-of-the-art software developed in the AI community
• ABCI supports two container technologies
– Docker, which has a large user community
– Singularity, recently accepted by the HPC community
• ABCI provides various single-node/distributed deep learning framework container images optimized to achieve high performance on ABCI
14. ABCI Cloud Services
Service Types
Service Type | Description | #Nodes (Min./Max.)
Spot | Batch job service | 1 / 512
On-demand | Interactive job service | 1 / 32
Reserved | Advanced reservation service | 1 / 32
Group Storage | Shared storage service | N/A
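The node limits per service type are easy to encode as a lookup, for example to reject an out-of-range request before it is submitted. This is a hypothetical helper written for illustration in Python; it is not an ABCI API, and the names `NODE_LIMITS` and `check_request` are assumptions.

```python
# (min_nodes, max_nodes) per compute service type, taken from the
# Service Types table. Group Storage has no node count and is omitted.
NODE_LIMITS = {
    "Spot": (1, 512),
    "On-demand": (1, 32),
    "Reserved": (1, 32),
}


def check_request(service: str, nodes: int) -> bool:
    """Return True if the node count is within the limits of the service."""
    if service not in NODE_LIMITS:
        raise ValueError(f"unknown or non-compute service type: {service}")
    lowest, highest = NODE_LIMITS[service]
    return lowest <= nodes <= highest


if __name__ == "__main__":
    print(check_request("Spot", 512))       # within the Spot limit
    print(check_request("On-demand", 64))   # exceeds the On-demand limit
```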
• ABCI provides batch and interactive type job execution services for maximizing throughput, an advanced reservation service for dedicated use of nodes, and IDE and storage services
[Figure: service stack by layer]
Service Type: Spot / On-demand, Reserved, Dedicated
Compute Nodes: Shared, Dedicated
Resource Alloc.: Univa Resource Allocation, ex-Univa allocation
Job Bootstrap: Univa, Univa Docker, Docker
Applications: Natively installed software, Singularity, System-provided / User-defined containers, User-defined containers
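15. ABCI Cloud Services
Resource Types
Type | CPU Cores (Assign / Total) | GPU (Assign / Total) | Memory (GB) (Assign / Total) | Storage (TB) (Assign / Total)
F (Full Node) | 40 / 40 | 4 / 4 | 360 / 384 | 1.4 / 1.6
G.large | 20 / 40 | 4 / 4 | 240 / 384 | 0.7 / 1.6
G.small | 5 / 40 | 1 / 4 | 60 / 384 | 0.175 / 1.6
C.large | 20 / 40 | 0 / 4 | 120 / 384 | 0.7 / 1.6
C.small | 5 / 40 | 0 / 4 | 30 / 384 | 0.175 / 1.6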
• Users can choose the most suitable computing instance from five resource types
[Figure: placement of C.small, G.small, G.large, and C.large instances on the CPUs and GPUs of a node]
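Choosing "the most suitable" instance can be read as picking the smallest resource type that still covers a job's needs. The Python sketch below illustrates that idea only. The assigned core, GPU, and memory values are as read from the resource type table on this slide and should be checked against current ABCI documentation; the ordering rule (fewest GPUs first, then fewest cores) and all names are assumptions.

```python
from typing import NamedTuple, Optional


class ResourceType(NamedTuple):
    name: str
    cpu_cores: int   # assigned CPU cores
    gpus: int        # assigned GPUs
    memory_gb: int   # assigned memory in GB


# Values as read from the resource type table (treat as approximate).
RESOURCE_TYPES = [
    ResourceType("F", 40, 4, 360),
    ResourceType("G.large", 20, 4, 240),
    ResourceType("G.small", 5, 1, 60),
    ResourceType("C.large", 20, 0, 120),
    ResourceType("C.small", 5, 0, 30),
]


def smallest_fit(cpu_cores: int, gpus: int, memory_gb: int) -> Optional[str]:
    """Return the name of the smallest type that satisfies the request,
    or None if even a full node is not enough."""
    # ASSUMPTION: "smallest" means fewest GPUs, then fewest cores, then least memory.
    ordered = sorted(RESOURCE_TYPES, key=lambda r: (r.gpus, r.cpu_cores, r.memory_gb))
    for r in ordered:
        if r.cpu_cores >= cpu_cores and r.gpus >= gpus and r.memory_gb >= memory_gb:
            return r.name
    return None


if __name__ == "__main__":
    print(smallest_fit(cpu_cores=4, gpus=1, memory_gb=32))
    print(smallest_fit(cpu_cores=10, gpus=0, memory_gb=100))
```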
16. ABCI Use Case: Distributed Deep Learning
• 64 Nodes (256 GPUs) in 1 hour
Performance Reference for Distributed Deep Learning
• Environments
– ABCI 64 nodes (256 GPUs)
– Framework: ChainerMN v1.3.0
• Chainer 4.2.0, Cupy 4.2.3, mpi4py 3.0.0, Python 3.6.5
– Baremetal
• CentOS 7.4, gcc-4.8.5,
CUDA 9.2, CuDNN 7.1.4, NCCL 2.2, OpenMPI 2.1.3
• Settings
– Dataset: Imagenet-1K
– Model: ResNet-50
– Training:
• Batch size: 32 per GPU, 32 x 256 in total
• Learning Rate: Starting 0.1 and x0.1 at 30, 60, 80 epoch
w/ warm up scheduling
• Optimization: Momentum SGD (momentum=0.9)
• Weight Decay: 0.0001
• Training Epoch: 100
Training is finished
within only one hour!
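The training settings listed for this run describe a step learning-rate schedule: start at 0.1 and multiply by 0.1 at epochs 30, 60, and 80, with warm-up. A minimal, framework-independent Python sketch of such a schedule follows. The slide does not say how the warm-up is shaped, so the linear ramp over the first 5 epochs is an assumption, as are the function and parameter names; this is not the ChainerMN implementation.

```python
PER_GPU_BATCH = 32
NUM_GPUS = 256
GLOBAL_BATCH = PER_GPU_BATCH * NUM_GPUS  # 32 x 256 in total


def learning_rate(epoch: int,
                  base_lr: float = 0.1,
                  milestones: tuple = (30, 60, 80),
                  gamma: float = 0.1,
                  warmup_epochs: int = 5) -> float:
    """Step schedule with warm-up, evaluated at the start of `epoch` (0-based)."""
    if epoch < warmup_epochs:
        # ASSUMPTION: linear ramp up to base_lr; the slide only says "warm up".
        return base_lr * (epoch + 1) / warmup_epochs
    # Count how many milestones have been reached and decay once per milestone.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    print("global batch size:", GLOBAL_BATCH)
    for e in (0, 4, 10, 30, 60, 80, 99):
        print(e, learning_rate(e))
```

With these defaults the rate is 0.1 from epoch 4 to 29, then roughly 0.01, 0.001, and 0.0001 after each milestone, over the 100 training epochs.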
17. AI Datacenter
“Commoditizing supercomputer cooling technologies to Cloud (70kW/rack)”
High Voltage Transformers (3.25MW)
Extension Space
UPS
72 Racks
18 Racks (w/ UPS)
Extension Space
Active Chillers
Cooling capacity: 200kW
Passive Cooling Tower
Cooling capacity: 3MW
Cooling Pod
Server Room: 19m x 24m x 6m
Outdoor Facilities
• Single floor, cost effective building
• Hard concrete floor, 2 tonnes/m2 weight tolerance for racks and cooling pods
• Number of Racks
– Initial: 90 (ABCI uses 41 racks)
– Max: 144
• Power capacity: 3.25 MW
– ABCI uses 2.3 MW max
• Cooling capacity: 3.2 MW
– 70 kW/rack: 60 kW water + 10 kW air
18. 100% Free Cooling over the Entire Year
• Free cooling using a passive cooling tower
– ⭕ Low OPEX: no active chiller (mechanical refrigeration)
– ❌ Generated water temperature depends on the external weather (temp. and humidity)
• Hybrid water/air cooling with high temperature cooling water (32℃)
https://www.sinko.co.jp/product/technical-column/09/
Warm and Wet
Air Out
Cold and Dry
Air In
Cold Water
Out
Hot Water
In
19. Hybrid Water/Air Cooling System
Fan Coil Unit (FCU)
Cooling Tower
CDU
Water Block
Air 40℃
Air 35℃
Air Cooling
Cooling Pod / Rack
Computing Node
Water 40℃
Water 32℃
Water Cooling
1st closed
circuit
2nd closed
circuit
Outdoor Facilities
20. Survived the extreme heat this summer
[Figure: water temperature [℃]]
Cooling Water (From ABCI): <40℃
Cooling Water (To ABCI): <32℃
Free cooling is possible during the first Grand Challenge on 22-26 July 2018
Max./min. temperature in Kashiwa in July 2018
21. Cooling Pod
Capping Wall
Capping Wall
Fan Coil Unit
Server Rack
Water Circuit
Fan Coil Unit
Server Rack
48U
Hot Aisle
Cold Aisle 35℃
Hot Aisle 40℃
Front view Side view
• Modularization of DC facilities: racks, space, water/air cooling, and power equipment
• No raised floor and skeleton frame built on concrete slab
• Effective air cooling by hot aisle containment
• Ease of maintenance
22. Inside Cooling Pod
Catwalk for maintenance
Hot aisle
Capping Wall
Capping Wall
Fan Coil Unit
Server Rack
Water Circuit
Fan Coil Unit
Server Rack
48U
Hot Aisle
5.4m
ts
6.0m
23. Construction of Cooling Pod
February 6, 2018 March 21, 2018
25. ABCI-Pacific Research Platform (PRP)
• Objective: Establish a close working relationship with respect to the research, education, and application of scientific knowledge in AI and, more broadly, in data intensive science and robotics
• New 5 year MoU completed between AIST and UCSD
• Activities include:
– Organization of workshops in the fields of mutual interest, both in the U.S. and Japan
– Exchange of UCSD’s faculty and post-doctorates and AIST’s researchers
– UCSD students participate in research projects at AIST in Japan
– Collaborative infrastructure projects between UC’s PRP and ABCI
– UCSD and AIST researchers use ABCI for collaborative projects
26. ABCI-PRP: Grand Challenge Project
• Gerald Pao (Salk Institute), George Sugihara (SIO)
• Creation of neuromorphic deep learning architectures by large scale dynamic modeling of transparent fish brains on ABCI
ABCI is ideally suited to perform the convergent cross
mapping (CCM) to interrogate the relationships
between the ~120,000 neurons of the larval zebrafish
brain during exposure to various stimuli
From Pao, 2018. Green are neurons, red are active
neurons responding to hypoxic environment.
Check Wassapon’s Poster (PP25)!
27. ABCI Grand Challenge
• “ABCI Grand Challenge” program encourages researchers and students to leverage ABCI’s enormous compute capability (1,088 nodes, 4,352 GPUs) for up to 24 hours to support highly challenging themes in the field of AI
• Important Dates
Application Deadline | Notification | Grand Challenge Week
April 2018 | May 2018 | Last week in Jul 2018
August 2018 | September 2018 | Last week in Oct 2018
November 2018 | December 2018 | Last week in Jan 2019
28. Thank you for your attention
More information is available:
https://abci.ai/