(Slides 1-3: title and introductory material; the original Japanese slide text could not be recovered from the extraction.)
AI Bridging Cloud Infrastructure
as World’s First Large-scale Open AI Infrastructure
• Open, Public, and Dedicated infrastructure for AI/Big Data
• Platform to accelerate joint academic-industry R&D for AI in Japan
• Top-level compute capability w/ 0.550EFlops(AI), 37.2 PFlops(DP)
4
Univ. Tokyo Kashiwa II Campus
Operation Scheduled in 2018
• 1088x compute nodes w/ 4352x NVIDIA Tesla V100 GPUs, 43520 CPU Cores,
476TiB of Memory, 1.6PB of NVMe SSDs, 22PB of HDD-based Storage and
Infiniband EDR network
• Ultra-dense IDC design from the ground-up w/ 20x thermal density of standard IDC
• Extreme Green w/ ambient warm liquid cooling, high-efficiency power supplies, etc.,
commoditizing supercomputer cooling technologies to clouds ( 2.3MW, 70kW/rack)
5
Gateway and
Firewall
Computing Nodes: 0.550 EFlops(AI), 37.2 PFlops(DP)
476 TiB Mem, 1.6 PB NVMe SSD
Storage: 22 PB GPFS
High Performance Computing Nodes (w/GPU) x1088
• Intel Xeon Gold6148 (2.4GHz/20cores) x2
• NVIDIA Tesla V100 (SXM2) x 4
• 384GiB Memory, 1.6TB NVMe SSD
Multi-platform Nodes (w/o GPU) x10
• Intel Xeon Gold6132 (2.6GHz/14cores) x2
• 768GiB Memory, 3.8TB NVMe SSD
Interactive Nodes
DDN SFA14K
(w/ SS8462 Enclosure x 10) x 3
• 12TB 7.2Krpm NL-SAS HDD x 2400
• 3.84TB SAS SSD x 216
• NSD Servers x 12
Object Storage via Protocol Nodes
100GbE
Service Network (10GbE)
External
Networks
SINET5
Interconnect (Infiniband EDR)
ABCI: AI Bridging Cloud Infrastructure
6
0.550 EFlops(AI), 37.2 PFlops(DP); 19.88 PFlops (Linpack), Ranked #5 Top500 June 2018
System hierarchy (reconstructed from the slide figure):
• Chips (GPU, CPU)
– GPU: NVIDIA Tesla V100 (16GB SXM2): 7.8 TFlops(DP), 125 TFlops (AI)
– CPU: Intel Xeon Gold 6148 (27.5M Cache, 2.40 GHz, 20 Core): 1.53 TFlops(DP), 3.07 TFlops (AI)
• Compute Node (4 GPUs, 2 CPUs): 34.2 TFlops(DP), 506 TFlops (AI); 3.72 TB/s MEM BW, 384 GiB MEM, 200 Gbps NW BW, 1.6TB NVMe SSD
• Node Chassis (2 Compute Nodes): 68.5 TFlops(DP), 1.01 PFlops (AI)
• Rack (17 Chassis): 1.16 PFlops(DP), 17.2 PFlops (AI); 131 TB/s MEM BW, Full Bisection BW within Rack, 70 kW Max
• System (32 Racks): 1088 Compute Nodes, 4352 GPUs; 37.2 PFlops(DP), 0.550 EFlops (AI); 4.19 PB/s MEM BW, 1/3 of Oversubscription BW, 2.3 MW
GPU Compute Nodes
• NVIDIA TESLA V100
(16GB, SXM2) x 4
• Intel Xeon Gold 6148
x 2 Sockets
– 20 cores per Socket
• 384GiB of DDR4 Memory
• 1.6TB NVMe SSD x 1
– Intel DC P4600 u.2
• EDR Infiniband HCA x 2
– Connected to other Compute Nodes
and Filesystems
7
Node block diagram (reconstructed from the slide figure):
• 2x Xeon Gold 6148, linked by UPI x3 (10.4 GT/s)
• DDR4-2666 32GB x 6 per socket (128 GB/s memory BW per socket)
• Each socket drives a PCIe gen3 x16 link to a PCIe switch (one x48-lane, one x64-lane)
• Each PCIe switch fans out to 2x Tesla V100 SXM2 (PCIe gen3 x16 each) and 1x IB HCA (100 Gbps)
• The four V100s are interconnected with NVLink2 x2
• NVMe SSD
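Not from the slides, but the resulting intra-node topology (NVLink pairs, PCIe switch sharing, HCA affinity) can be verified on any such node with a standard NVIDIA tool:

# Print the GPU/HCA connectivity matrix; NV2 marks a 2-lane NVLink pair,
# PIX marks devices that share a single PCIe switch.
nvidia-smi topo -m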
Rack as Dense-packaged “Pod”
(Pod packaging details from the slide header could not be recovered from the extraction.)
8
Pod #1 (one rack; the same layout repeats x 32 pods):
• CX400 chassis #1-#17, each holding two CX2570 compute nodes (CX2570#1-#34)
• Leaf switches LEAF#1-#4 (SB7890), to which the compute nodes attach
• FBB#1-#3 (SB7890), giving Full bisection BW within the pod: IB-EDR x 72
• SPINE#1-#2 (CS7500), giving 1/3 Oversubscription BW between pods: IB-EDR x 24
• Per-edge link counts in the figure: InfiniBand EDR x1, x6, and x4
Hierarchical Storage Tiers
• Local Storage
– 1.6 TB NVMe SSD (Intel DC P4600 u.2) per Node
– Local Storage Aggregation w/ BeeOND
• Parallel Filesystem
– 22PB of GPFS
• DDN SFA14K (w/ SS8462 Enclosure x 10) x 3 sets
• Bare Metal NSD servers and Flash-based Metadata
Volumes for metadata operation acceleration
– Home and Shared Use
• Object Storage
– Part of GPFS using OpenStack Swift
– S3-like API Access, Global Shared Use
– Additional Secure Volumes w/ Encryption
(Planned)
9
Roles in the figure: Local Storage as Burst Buffers; Parallel Filesystem; Object Storage as Campaign Storage.
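BeeOND turns the per-node NVMe SSDs of a job's allocated nodes into a single on-demand parallel filesystem, used here as a burst buffer. A minimal sketch of that lifecycle, assuming BeeOND's standard start/stop tool; the nodefile variable and paths are illustrative:

# Aggregate every allocated node's local NVMe into one shared filesystem.
beeond start -n "$NODEFILE" -d /local/nvme -c /mnt/beeond
# ... stage training data into /mnt/beeond and run the job ...
# Tear down the burst buffer (and its data) when the job finishes.
beeond stop -n "$NODEFILE" -L -d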
Performance Reference for Distributed Deep Learning
10
(Results chart; only its "Better" arrow label survived extraction.)
• Environments
– ABCI 64 nodes (256 GPUs)
– Framework: ChainerMN v1.3.0
• Chainer 4.2.0, Cupy 4.2.3, mpi4py 3.0.0, Python 3.6.5
– Baremetal
• CentOS 7.4, gcc-4.8.5,
CUDA 9.2, CuDNN 7.1.4, NCCL 2.2, OpenMPI 2.1.3
• Settings
– Dataset: Imagenet-1K
– Model: ResNet-50
– Training:
• Batch size: 32 per GPU, 32 x 256 in total
• Learning Rate: Starting 0.1 and x0.1 at 30, 60, 80 epoch
w/ warm up scheduling
• Optimization: Momentum SGD (momentum=0.9)
• Weight Decay: 0.0001
• Training Epoch: 100
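For a concrete picture of how such a run is launched, here is a minimal sketch. It assumes ChainerMN's ImageNet example script and one MPI rank per GPU; the script name, file lists, and flags are illustrative, not taken from the slides:

# 64 nodes x 4 GPUs = 256 ranks, 4 ranks per node (one per V100).
mpirun -np 256 -npernode 4 \
    python train_imagenet.py train.txt val.txt \
    --arch resnet50 --batchsize 32 --epoch 100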
11
(The slide's Japanese bullets could not be recovered from the extraction; its figure shows the job submission flow below.)
• Users log in to the system via SSH
• A job script file is placed under /home (GPFS), shared with the compute nodes
• Jobs are submitted with: $ qsub <option> script_filename
• NQS schedules the submitted jobs onto compute nodes across the inter-connect
(High Throughput Computing)
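For illustration, a job script of the kind submitted above might look as follows; the #$ directives are generic Grid-Engine-style resource options, shown as an assumption rather than ABCI's actual syntax:

#!/bin/bash
#$ -cwd                 # run from the submission directory on /home (GPFS)
#$ -l h_rt=1:00:00      # illustrative wall-clock limit
python train.py

# Submit it with:
qsub run.sh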
12
(Slide 12: the original Japanese text could not be recovered from the extraction; it appears to introduce the pre-installed software versions continued on the next slide.)
13
Pre-installed software versions (cont'd):
• CUDA: 8.0 (8.0.44, 8.0.61.2), 9.0 (9.0.176), 9.1 (9.1.85, 9.1.85.1, 9.1.85.3), 9.2 (9.2.88.1)
• cuDNN: 5.1 (5.1.5, 5.1.10), 6.0 (6.0.21), 7.0 (7.0.5), 7.1 (7.1.1, 7.1.2, 7.1.3)
• NCCL: 1.3 (1.3.4, 1.3.40-1), 2.0 (2.0.5-3), 2.1 (2.1.4-1, 2.1.15-1), 2.2 (2.2.12)
• MPI: OpenMPI 2.1.3, 3.0.1, 3.1.0; MVAPICH2-GDR 2.3a
• Python: 2.7, 3.5, 3.6
• Python modules: mpi4py, Cython, matplotlib, Pillow, Jupyter
• DL frameworks: Caffe2, CNTK, ChainerMN, TensorFlow, MXNet, NNabla
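Assuming these packages are exposed through environment modules (the module names below are hypothetical), a user would load one mutually compatible combination, for example:

# Pick matching CUDA / cuDNN / NCCL / MPI versions from the table above.
module load cuda/9.2 cudnn/7.1 nccl/2.2 openmpi/2.1.3 python/3.6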
Software Stack for ABCI
• Batch Job Scheduler
– High throughput computing
• Minimum Pre-installed Software
– Users can deploy their environments
using anaconda, pip, python venv, etc.
– Reduce operational cost
• Container Support
– Singularity for multi-node jobs w/
user customized images
– Docker for single-node jobs w/
site-certified images
14
Software stack (from the slide figure, top to bottom):
• User Applications
• DL Frameworks; Hadoop/Spark; OSS; ISV Apps
• Python, Ruby, R, Java, Scala, Perl, Lua, etc.
• Compilers and MPI: GCC, PGI, Intel Parallel Studio XE Cluster Edition; OpenMPI, MVAPICH2
• CUDA/CUDNN/NCCL
• Storage: GPFS, BeeOND, OpenStack Swift
• Scheduler and containers: Univa Grid Engine; Singularity, Docker
• OS: CentOS/RHEL
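Following the "minimum pre-installed software" policy above, users build their own environments on top of this base stack; a minimal sketch with python venv and pip (the package choices are illustrative):

# Create and activate a personal environment, then install what the job needs.
python3 -m venv ~/my-env
source ~/my-env/bin/activate
pip install chainer chainermn mpi4py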
ABCI: Dynamic Container Deployment
with HPC Linux Containers
Deployment flow (from the slide figure):
• Register/copy container images to a container image repository (Dockerhub, private registry)
• Import/copy container images from the repository into GPFS/Object Storage
• Submit jobs with container images; the Job Scheduler dispatches each Job to a Compute Node, where it runs inside a Linux Container (Singularity, Docker) instantiated from the image
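From a user's point of view, that flow can be as simple as importing a public image and submitting a job against it; a hedged sketch in which the image and script names are illustrative:

# Import an image from Docker Hub as a local Singularity image file.
singularity pull docker://ubuntu:16.04
# Submit a job whose script runs inside the imported image.
qsub run_container.sh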
16
(Slide 16: comparison of Linux container runtimes along an Enterprise vs. HPC axis, mentioning CharlieCloud; the original Japanese text could not be recovered from the extraction.)
17
(Slide 17: container runtime details; the original Japanese text could not be recovered from the extraction.)
(Slide 18: Singularity image builds and runs; the Japanese annotations could not be recovered from the extraction.)
Building container.img (from a recipe file named Singularity, from Docker Hub, or from Singularity Hub):
sudo singularity build --sandbox tmpdir/ Singularity
sudo singularity build --writable container.img Singularity
sudo singularity build container.img Singularity
sudo singularity build container.img docker://ubuntu
sudo singularity build container.img shub://ubuntu
Opening a writable shell inside the image:
sudo singularity shell --writable container.img
Running the image:
singularity run container.img
singularity exec container.img …
singularity shell container.img
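The builds above consume a recipe file named Singularity; a minimal example of such a recipe (its contents are an illustrative assumption, not from the slides):

# Write a tiny Singularity recipe, then build an image from it.
cat > Singularity <<'EOF'
Bootstrap: docker
From: ubuntu:16.04

%post
    apt-get update && apt-get install -y python3

%runscript
    exec python3 "$@"
EOF
sudo singularity build container.img Singularity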
19
(Slide 19: the original text, seemingly a shell-session example, could not be recovered from the extraction.)
20
(Slide 20: container usage notes; the original Japanese text could not be recovered from the extraction.)
21
(Slide 21: the original Japanese text could not be recovered from the extraction; the legible fragment indicates that GPU access from Singularity containers uses the --nv option.)
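Concretely, --nv binds the host's NVIDIA driver and device files into the container at run time; an illustrative one-liner (image and script names are assumptions):

# Run a GPU workload inside the container using the host driver.
singularity exec --nv container.img python3 train.py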
22
(Slide 22: the original Japanese text could not be recovered from the extraction.)
23
(Slide 23: benchmark settings and a results chart (axis label "Better"); the original Japanese text could not be recovered from the extraction.)
24
(Slide 24: the original Japanese bullets could not be recovered from the extraction; its figure shows the host/container library split below.)
Base Drivers, Libraries on Host:
• CUDA Drivers
• Infiniband Drivers (ibverbs)
• Filesystem Libraries (GPFS, Lustre), mounted into the container
Userland Libraries on Container:
• CUDA, CuDNN, NCCL2
• MPI (mpi4py)
Distributed Deep Learning Frameworks:
• Caffe2, ChainerMN, Distributed TensorFlow, MXNet
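One common way this host/container split is realized (an illustrative pattern, not ABCI-specific configuration) is to launch MPI on the host and bind the parallel filesystem into the container, while --nv supplies the host GPU driver:

# 4 ranks per node; /gpfs is an illustrative mount point for the shared filesystem.
mpirun -np 256 -npernode 4 \
    singularity exec --nv -B /gpfs:/gpfs container.img python3 train.py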
25
26
(Slide 26: the original Japanese text could not be recovered from the extraction.)
27
(Slide 27: the original Japanese text could not be recovered from the extraction.)
28
(Slide 28: the original Japanese text could not be recovered from the extraction.)
29