3. OpenPOWER & AI Workshop at BSC, Barcelona
By OpenPOWER Academia
Day 1 is meant as an introduction for everyone interested in using AI.
Day 2 is meant to go deeper with those who have especially challenging projects.
18th and 19th June 2018
4. Agenda
Day 1 - June 18th 2018
9:00 am to 9:30 am: Welcome and OpenPOWER ADG features
9:30 am to 10:15 am: Introduction to POWER9 and PowerAI
10:15 am to 10:30 am: Break
10:30 am to 11:15 am: Large Model Support and Distributed Deep Learning
11:15 am to 12:00 noon: Use Case Demonstration with PowerAI
12:00 noon to 1:00 pm: Lunch
1:00 pm to 1:45 pm: Mellanox Feature Updates
1:45 pm to 2:45 pm: CFD Simulation on Power
2:45 pm to 3:00 pm: Break
3:00 pm to 3:45 pm: Introduction to Snap Machine Learning
3:45 pm to 4:45 pm: Snap Machine Learning Demos, Q&A
4:45 pm to 5:00 pm: Wrap-up and Q&A
5. Agenda
Day 2 - June 19th 2018
9:00 am to 9:30 am: Quick review of Day 1
9:30 am to 12:00 pm: Deep Learning Exercise II using Nimbix / other infra; industry-specific use cases (LMS)
12:00 pm to 1:00 pm: Lunch
1:00 pm to 4:30 pm: Deep Learning Exercise II using Nimbix / other infra; industry-specific use cases using P9 features (LMS and DDL)
13. What is CORAL?
The program through which Summit & Sierra are procured.
Several DOE labs have strong supercomputing programs and facilities.
To bring the next generation of leading supercomputers to these labs, DOE
created CORAL (the Collaboration of Oak Ridge, Argonne, and Livermore) to
jointly procure these systems, and in so doing, align strategy and resources
across the DOE enterprise.
The DOE labs were grouped into the collaboration based on common acquisition timelines; the collaboration is a win-win for all parties.
The “Summit” and “Sierra” systems are built on OpenPOWER technologies: IBM POWER CPUs, NVIDIA Tesla GPUs, and Mellanox EDR 100 Gb/s InfiniBand.
Paving The Road to Exascale Performance
14. Academic Membership
Currently more than 100 academic members in the OPF, including:
A*STAR, ASU, ASTRI, Moscow State University, Carnegie Mellon University, CDAC, Colorado School of Mines, CINECA, CFMS, Coimbatore Institute of Technology, Dalian University of Technology, GSIC, Hartree Centre, ICM, IIIT Bangalore, IIT Bombay, Indian Institute of Technology Roorkee, ICCS, INAF, FZ Jülich, LSU, BSC, Nanyang Technological University, National University of Singapore, NIT Mangalore, NIT Warangal, Northeastern University in China, ORNL, OSU, Rice, Rome HPC Center, LLNL, Sandia, SASTRA University, Seoul National University, Shanghai Jiao Tong University, SICSR, TEES, Tohoku University, Tsinghua University, University of Arkansas, SDSC, Unicamp, University of Central Florida, University of Florida, University of Hawaii, University of Hyderabad, University of Illinois, University of Michigan, University of Oregon, University of Patras, University of Southern California, TACC, Waseda University, IISc, Loyola, IIT Roorkee
15. Goals of the Academia Discussion Group
Provide training and exchange of experience and know-how
Provide a platform for networking among academic members
Work on engagement of the HPC community
Enable co-design/development activities
16. Conclusions
A growing number of academic organizations have become members of the OpenPOWER Foundation.
The Academia Discussion Group provides a platform for training, networking, engagement, and enablement of co-design.
Those who have not yet joined are welcome to do so:
https://members.openpowerfoundation.org/wg/AcademiaDG/mail/index
The OpenPOWER AI virtual University focuses on bringing together industry, government, and academic expertise to connect and help shape the future of AI.
https://www.youtube.com/channel/UCYLtbUp0AH0ZAv5mNut1Kcg
19. 1. CPU
- The POWER9 NX gzip accelerator has the potential, when working with fully compressed workloads, to reduce the memory footprint and I/O bottlenecks in the pre-processing stage; it is not available today, but hopefully will be soon.
- The CPU has direct access to GPU memory without the need for migration; this is not exploited today in the TensorFlow or Caffe packages of PowerAI.
- VSX-3 can accelerate media processing/pre-processing for computer vision:
http://www.eecg.utoronto.ca/~moshovos/ACA06/readings/altivec.pdf
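The benefit of keeping pre-processing inputs compressed can be sketched in plain Python, using the stdlib's software gzip as a stand-in for the hardware accelerator (the record contents below are illustrative):

```python
import gzip
import json

# Toy record batch standing in for a pre-processing input stream.
records = [{"id": i, "text": "sample " * 20} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")

# Keep the dataset compressed in memory; this is the footprint reduction.
compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")

# Decompress on demand in the pre-processing stage.
restored = json.loads(gzip.decompress(compressed))
assert restored == records
```

With a hardware gzip engine, the decompress-on-demand step would cost far fewer CPU cycles, which is the point of the bullet above.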
2. System Memory
- 8x DDR4 memory channels deliver higher memory bandwidth and help prevent memory contention in AI workloads.
- Managed memory is cache-coherent between the CPU and GPU; this is not exploited today in the TensorFlow or Caffe packages of PowerAI.
20. 3. GPU
- NVLink 2.0 between the CPU and GPU allows faster data movement from the CPU to the GPU when datasets grow into the terabyte range.
- GPUDirect RDMA to unified memory; likely not exploited today in the TensorFlow or Caffe packages of PowerAI.
- Technologies such as LMS are the best fit for large models such as deep residual networks (ResNet-152):
https://arxiv.org/pdf/1803.06333
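The idea behind Large Model Support can be illustrated with a toy simulation: when the working set exceeds the GPU memory budget, the least recently used tensors are swapped out to host memory. The budget and tensor sizes below are arbitrary illustrative units, not measurements of any real system:

```python
from collections import OrderedDict

GPU_MEM = 8  # illustrative GPU memory budget, in arbitrary units

gpu = OrderedDict()   # tensor name -> size, kept in LRU order
host = {}             # tensors evicted to host memory

def touch(name, size):
    """Bring a tensor into GPU memory, evicting LRU tensors if needed."""
    if name in gpu:
        gpu.move_to_end(name)  # mark as most recently used
        return
    host.pop(name, None)       # swap back in if previously evicted
    while gpu and sum(gpu.values()) + size > GPU_MEM:
        evicted, esize = gpu.popitem(last=False)  # evict least recently used
        host[evicted] = esize
    gpu[name] = size

# A forward pass whose activations total more than the GPU budget.
for layer, size in [("act0", 3), ("act1", 3), ("act2", 3), ("act3", 3)]:
    touch(layer, size)

print("on GPU:", list(gpu), "on host:", list(host))
```

Real LMS does this swapping at the framework level over NVLink; the fast CPU-GPU link is what makes the eviction traffic affordable.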
4. InfiniBand
- MPI / DDL / Horovod have the potential to exploit this unique multi-host Socket Direct adapter and provide the lowest possible latency between many learners during training, leading to lower training times. Possible improvements in training efficiency over existing research:
https://arxiv.org/pdf/1708.02188
21. 5. I/O:
- PCIe Gen4 offers more bandwidth for NVMe adapters (13.5 GB/s vs. 6.8 GB/s with PCIe Gen3), used for caching datasets on the compute nodes, closer to the GPUs; this helps greatly when pre-fetching data into system memory.
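A back-of-the-envelope comparison at the two bandwidths quoted above (the 1 TB dataset size is illustrative):

```python
def staging_time_s(dataset_gb, bandwidth_gbs):
    """Seconds to stage a dataset at a sustained bandwidth."""
    return dataset_gb / bandwidth_gbs

dataset_gb = 1000  # illustrative 1 TB dataset cached on node-local NVMe
gen3 = staging_time_s(dataset_gb, 6.8)   # PCIe Gen3
gen4 = staging_time_s(dataset_gb, 13.5)  # PCIe Gen4
print(f"Gen3: {gen3:.0f} s, Gen4: {gen4:.0f} s, speedup: {gen3 / gen4:.2f}x")
```

Roughly a 2x reduction in staging time, which is what makes aggressive pre-fetching into system memory practical.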
- OpenCAPI provides more bandwidth for other types of accelerators such as FPGAs, giving them the option of fast inference; it may also enable other kinds of DRAM in the future.
6. Others:
- Water-cooled systems, available in 4-GPU and 6-GPU configurations, make AI solutions much more efficient at scale, given the 300 W per-GPU power consumption.