HPC Transformation with AI
COGNITIVE SYSTEMS
Ing. Florin Manaila
Senior Architect and Inventor
Cognitive Systems (Distributed Deep Learning and HPC)
IBM Systems Hardware Europe
Member of the IBM Academy of Technology (AoT)
March 24, 2020
Disruption in technical R&D today
Cognitive Systems Europe / March 24 / © 2020 IBM Corporation
Knowledge Discovery Pipeline
Infrastructure Demands for AI
Inference
• Equipped for volumes of data
• Flexible storage for a range of data demands
• Versatile, power-efficient data center accelerators
• Advanced I/O for minimal latency
• Scalability and distributed data center capability
Training
• Equipped for volumes of data
• Powerful data center accelerators with coherence
• Advanced I/O for high bandwidth and low latency
• Proven scalability
*** IBM and Business Partner Internal Use Only ***
Next-Generation Infrastructure Stack
AI Workflow
Distributed Deep Learning
Common options
Single accelerator: 1x accelerator
Data parallel: 4x accelerators
Model parallel: 4x accelerators
Data and model parallel: 4x n accelerators (System 1, System 2, ..., System n)
Training time: longer with a single accelerator, shorter as accelerators are added
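The data-parallel option can be illustrated with a minimal sketch. It assumes a toy one-parameter linear model, made-up data, and four simulated accelerators, none of which come from the slides. Each worker computes a gradient on its own shard, and averaging those gradients plays the role that an all-reduce plays on a real cluster.

```python
# Toy data-parallel step: 4 simulated accelerators, one shared model weight.
# The model (y = w * x), the data, and the worker count are illustrative assumptions.

def gradient(w, shard):
    """Gradient d/dw of the mean squared error of (w*x - y) over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def split(data, num_workers):
    """Split data into equally sized contiguous shards, one per worker."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_gradient(w, data, num_workers=4):
    """Each worker computes a local gradient; averaging them stands in for all-reduce."""
    local = [gradient(w, shard) for shard in split(data, num_workers)]
    return sum(local) / num_workers

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples of y = 3x
w = 0.0
full = gradient(w, data)                     # single-accelerator gradient
parallel = data_parallel_gradient(w, data)   # averaged over 4 equal shards
```

With equally sized shards the averaged gradient matches the single-accelerator gradient, so the update is the same while each worker touches only a quarter of the batch.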
Data-Parallel Framework
Distributed Learning
Large Dataset
Node 0: Partition 0
• GPU 0: Partition (0,0)
• GPU 1: Partition (0,1)
• GPU 2: Partition (0,2)
• GPU 3: Partition (0,3)
Node 1: Partition 1
• GPU 0: Partition (1,0)
• GPU 1: Partition (1,1)
• GPU 2: Partition (1,2)
• GPU 3: Partition (1,3)
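The two-level split in this diagram can be sketched as follows. The node and GPU counts follow the slide (2 nodes, 4 GPUs each); the function name `partition`, the strided assignment of samples, and the 16-element stand-in dataset are assumptions made for illustration.

```python
# Two-level partitioning of a dataset: first across nodes, then across the GPUs in each node.
# The strided split is an assumption; the slide only shows the partition hierarchy.

def partition(samples, node, gpu, num_nodes=2, gpus_per_node=4):
    """Return Partition (node, gpu): the samples assigned to one GPU."""
    node_part = samples[node::num_nodes]     # Partition <node>
    return node_part[gpu::gpus_per_node]     # Partition (<node>, <gpu>)

dataset = list(range(16))  # stand-in for a large dataset
parts = {(n, g): partition(dataset, n, g) for n in range(2) for g in range(4)}
```

Every sample lands in exactly one GPU-level partition, which is the property a data-parallel framework needs so that no sample is skipped or processed twice within an epoch.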
Rearchitecting the hardware for AI
Experimentation Scaling Production
Architecture for large IBM HPC Cluster
Hardware overview for bare-metal / K8s
Architecture for large IBM HPC/AI Cluster
Hardware overview for bare-metal / K8s
Why Capable and High BW Interconnects?
Heterogeneous systems are attractive for efficient performance
• Each part of an application runs on the best compute location
o But there are performance and programmability challenges
o Desire a highly-capable interconnect between PEs
• Low-latency communication and high data bandwidth
• Fine-grained + bulk data transfers
• Consistent, unified view of memory
• Hardware cache coherence & atomic operations
PE Type A (e.g. CPU): Large, low-latency Memory
PE Type B (e.g. GPU): Small, High-bandwidth Memory
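A simple latency-plus-bandwidth estimate shows why an interconnect has to serve both fine-grained and bulk transfers well. This is a sketch under stated assumptions: the latency and bandwidth values are made-up placeholders, not measurements of any real link, and the function name `transfer_time` is hypothetical.

```python
# Latency-bandwidth ("alpha-beta") estimate of transfer time between two PEs.
# All numeric values below are placeholders, not measurements of any real link.

def transfer_time(num_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one transfer: fixed latency plus size over bandwidth."""
    return latency_s + num_bytes / bandwidth_bytes_per_s

LATENCY = 1e-6      # 1 microsecond per transfer (assumed)
BANDWIDTH = 50e9    # 50 GB/s (assumed)

fine_grained = transfer_time(64, LATENCY, BANDWIDTH)   # one small, cache-line-sized message
bulk = transfer_time(1 << 30, LATENCY, BANDWIDTH)      # one 1 GiB transfer

# Fraction of each transfer's time that is pure latency
latency_share_fine = LATENCY / fine_grained
latency_share_bulk = LATENCY / bulk
```

Under these assumed numbers the small message is almost entirely latency-bound while the bulk transfer is almost entirely bandwidth-bound, so improving only one of the two properties helps only one class of transfer.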
Enterprise AI Hardware Portfolio Expansion
TRAIN: IBM Power AC922 (Powering the Fastest Supercomputer)
• Best training platform with 4x faster model iteration
• ~6x data throughput with NVLink to GPUs
• Synergistic HW/SW offerings for ease of use and leadership performance
• NVIDIA V100 SXM2 GPUs
DATA: IBM Power IC922 (Storage Dense Server)
• Enterprise-ready cloud deployment with RH OpenShift and Power Systems reliability
• Superior I/O for data movement: PCIe Gen 4
• Superior price/performance
INFERENCE: IBM Power IC922 (Deploy AI into Production)
• Superior density and throughput to inference accelerators
• Open design for accelerator flexibility
• Deploy inference at scale with SW capabilities leveraging superior I/O
• NVIDIA T4 GPUs
• Upcoming: FPGAs and ASICs
Inference server details
Form Factor
• 19” Rack 2U Server
POWER9 Processor
• 2 dd2.3x P9 Nimbus chips (LaGrange pkg)
• TDP: 225W
• 12 (160W), 16, 20 cores (SMT <= 4)
Memory
• Direct Attach Memory
• 32 DDR4 ISDIMM Slots @2400 MHz (double drop)
• 16 DDR4 ISDIMMs @2667 MHz (single drop)
• 16, 32, 64 GB RDIMMs
• 2 TB Max memory
• 340 GB/s peak memory BW (with 16x DIMMs)
10 Integrated I/O Slots – Standard PCIe Riser
• 2 PCIe G3 x16 FHFL Slots (Supports double-wide accelerator)
• 2 PCIe G4 x16 LP Slots
• 2 PCIe G3 x8 FHFL Slots (physically x16)
• 2 PCIe G3 x8 FHHL Slots
• 2 PCIe G3 x16 LP Slots
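The memory and slot figures on this slide can be cross-checked with standard DDR4 arithmetic. The sketch assumes that "@2667 MHz" means 2667 MT/s on a 64-bit (8-byte) data bus and that the 16-DIMM single-drop configuration uses one DIMM per memory channel; the slide does not spell out either point.

```python
# Cross-check of figures on the slide using standard DDR4 arithmetic.
# Assumptions: 2667 MT/s per channel, 8 bytes per transfer, one DIMM per channel.

BYTES_PER_TRANSFER = 8  # 64-bit DDR4 data bus

def peak_bandwidth_gb_s(dimms, mega_transfers_per_s):
    """Peak bandwidth in decimal GB/s with one DIMM per channel."""
    return dimms * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9

peak_bw = peak_bandwidth_gb_s(16, 2667)   # about 341 GB/s, in line with the quoted 340 GB/s
max_memory_gb = 32 * 64                   # 32 slots x 64 GB RDIMMs = 2048 GB = 2 TB
pcie_slots = 2 + 2 + 2 + 2 + 2            # the five slot groups listed add up to 10
```

The quoted peak bandwidth applies to the 16-DIMM configuration; the 32-slot double-drop configuration trades speed (2400 instead of 2667) for the 2 TB maximum capacity.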
Inference server details
Internal Storage
• Integrated Storage Controller = None
• 24x 2.5” SAS/SATA
Native I/O
• 2x USB 3.0 in rear
• 2x 1G baseT (one shared mgmt) + 1x 1G dedicated IPMI
• Serial port, VGA port
• TPM2.0 via Nuvoton NPCT650ABAWX included (for Secure OS and trusted boot)
MTM (machine type – model)
• 9183-22X
Accelerators
• Nvidia T4 Accelerator 16GB PCIe3 x16 LP
• More to come
Networking
• Mellanox MCX555A-ECAT 1-PORT EDR 100Gb IB CONNECTX-5 GEN3 PCIe x16 CAPI CAPABLE
• Mellanox MCX556A-ECAT 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE
• Mellanox MCX516A-CDAT 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16
• Mellanox MCX4121A-XCAT 2-PORT 10Gb NIC&ROCE SR/Cu PCIe 3.0
• Mellanox MCX4121A-ACAT 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0
• Marvell BCM957810A1008ICDM 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8
• Marvell BCM957800A1006ICDM QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8
• Broadcom BCM5719-4P 1Gb E'NET(UTP) 4-PORT ADPTR, PCIE-x4
Fiber Channel
• Broadcom LPe16002B-M6 2-PORT FIBER CHANNEL(16Gb/s), PCIE3-8X
• Broadcom LPe32002-M2 2-PORT FIBER CHANNEL(32Gb/s), PCIE3-8X
Inference server details
9183-22X: 8x SAS/SATA, 8x SAS/SATA, 8x SAS/SATA
OS Support – LE
• RHEL 7.6-alt
RAS
• Concurrent Maintenance disks
• Redundant Hot plug Power
• Redundant Hot plug fans
• Customer Install and Repair
• Simplified Op Panel
• In-rack system service
BMC Service Processor
• Aspeed AST2500
• OpenBMC
Certifications
• FCC Class A
• ASHRAE A2 Environment (10-35C)
• Acoustics Datacenter 1A
HDD Drives
• HDD; 600GB; 2.5"; 10k; SAS; 12Gb/s; 4Kn/512e; SED
• HDD; 1200GB; 2.5"; 10k; SAS; 12Gb/s; 4Kn/512e; SED
• HDD; 2400GB; 2.5"; 10k; SAS; 12Gb/s; 4Kn/512e; Non-SED
SSD Drives
• SSD; 240GB; 2.5"; SATA; 6Gb/s; 1.4 DWPD; Non-SED
• SSD; 960GB; 2.5"; SATA; 6Gb/s; 2.5 DWPD; Non-SED
• SSD; 1920GB; 2.5"; SATA; 6Gb/s; 2.5 DWPD; Non-SED
• SSD; 3840GB; 2.5"; SATA; 6Gb/s; 2.5 DWPD; Non-SED
Controllers
• Broadcom (LSI) MegaRAID 9361-8i SAS3 Controller w/ 8 internal ports (2GB Cache) PCIe 3.0 x8 LP with cables
• Broadcom 9300-8i PCIe gen3 x8 LP with cables
• Broadcom 9305-16i PCIe gen3 x8 LP with cables
OpenBMC is a free open source management software Linux distribution
IBM is the OpenBMC Community Leader
• Facebook
• Google
• IBM
• Intel
• Microsoft
• OCP
Feature List:
• REST Management
• IPMI
• SSH based SOL
• Power and Cooling Management
• Event Logs
• Zeroconf discoverable
• Sensors
• Inventory
• LED Management
• Host Watchdog
• Simulation
• Code Update Support for multiple BMC/BIOS images
• POWER On Chip Controller (OCC) Support
Features In Progress:
• Full IPMI 2.0 Compliance with DCMI
• Verified Boot
• HTML5 Java Script Web User Interface
• BMC RAS
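As a hedged illustration of REST-style BMC management, the sketch below prepares (but does not send) a request for a Redfish service root. It assumes the BMC exposes the DMTF Redfish interface at `/redfish/v1/` and accepts HTTP Basic authentication, which the slide does not state; the host name, credentials, and the helper name `redfish_request` are placeholders.

```python
# Build (but do not send) an HTTPS request for a BMC's Redfish service root.
# Assumptions: Redfish at /redfish/v1/ and HTTP Basic auth; host and credentials are placeholders.
import base64
import urllib.request

def redfish_request(host, user, password, path="/redfish/v1/"):
    """Return a prepared GET request for a Redfish resource on the given BMC."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )

req = redfish_request("bmc.example.com", "admin", "secret")
# urllib.request.urlopen(req) would send it; omitted here because it needs a reachable BMC.
```

Keeping request construction separate from sending makes the call easy to inspect and test without hardware.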
Next-Generation Software Stack
What’s in the training of deep neural networks?
Neural network model
Billions of parameters
Gigabytes
Computation
Iterative gradient based search
Millions of iterations
Mainly matrix operations
Data
Millions of images, sentences
Terabytes
Workload characteristics: Both compute and data intensive!
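The three ingredients listed here (a parameterized model, iterative gradient-based computation built from matrix operations, and data) can be shown at toy scale. This is a sketch only: the linear model, the four-sample dataset, the learning rate, and the 4-bytes-per-parameter (float32) assumption are illustrative and not taken from the slides.

```python
# Toy illustration: model parameters, iterative gradient-based search, and data.
# Sizes, learning rate, and the float32 storage assumption are illustrative only.

def matvec(rows, vec):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vec)) for row in rows]

def train(X, y, steps=500, lr=0.05):
    """Gradient descent on mean squared error for a linear model y ~ X w."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(X, w), y)]
        # gradient = (2/n) * X^T residual
        grad = [2.0 * sum(r * row[j] for r, row in zip(residual, X)) / len(X)
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def parameter_bytes(num_params, bytes_per_param=4):
    """Memory needed just to store the parameters."""
    return num_params * bytes_per_param

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]   # generated from w = [2, -1]
w = train(X, y)
gigabytes = parameter_bytes(1_000_000_000) / 1e9   # 1 billion float32 parameters
```

Real training differs mainly in scale: the same loop runs for millions of iterations over terabytes of data, and a billion float32 parameters already occupy 4 GB before gradients and optimizer state are counted.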
AI Infrastructure Stack
ON-CLOUD and ON-PREM
• Transform & Prep Data (ETL)
• Micro-Services / Applications: Segment Specific (Finance, Retail, Healthcare, Automotive)
• Governance AI (Fairness, Explainable AI, Model Health, Accuracy)
• APIs (external and in-house): Speech, Vision, NLP, Sentiment
• Machine & Deep Learning Libraries & Frameworks: TensorFlow, Caffe, PyTorch, SparkML, Snap.ML
• Distributed Computing: Spark, MPI
• Data Lake & Data Stores: Hadoop HDFS, NoSQL DBs, Parallel File System
• Accelerated Infrastructure
Watson ML Community Edition (WMLCE)
IBM Watson Machine Learning Community Edition: Docker Containers
IBM Watson Machine Learning Community Edition: Universal Base Images (UBI)
Thank you
Florin Manaila
—
florin.manaila@de.ibm.com
ibm.com
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

IBM HPC Transformation with AI

  • 1. HPC Transformation with AI. Cognitive Systems. Ing. Florin Manaila, Senior Architect and Inventor, Cognitive Systems (Distributed Deep Learning and HPC), IBM Systems Hardware Europe; Member of the IBM Academy of Technology (AoT). March 24, 2020
  • 2. Technical R&D today disruption. Cognitive Systems Europe / March 24 / © 2020 IBM Corporation
  • 3. Knowledge Discovery Pipeline
  • 4. Infrastructure Demands for AI: Equipped for volumes of data; Flexible storage for a range of data demands; Versatile, power-efficient data center accelerators; Advanced I/O for minimal latency; Scalability and distributed data center capability. Inference. Powerful data center accelerators with coherence; Advanced I/O for high bandwidth and low latency; Proven scalability. Training. Equipped for volumes of data. *** IBM and Business Partner Internal Use Only ***
  • 6. AI Workflow
  • 7. Distributed Deep Learning: common options
      § Single accelerator: 1x accelerator
      § Data parallel: 4x accelerators
      § Model parallel: 4x accelerators
      § Data and model parallel: 4x n accelerators (System 1, System 2, ... System n)
      Scale shown on the slide: longer training time to shorter training time
  • 8. Data-Parallel Framework: Distributed Learning
      Large Dataset
      § Node 0 holds Partition 0: GPU 0 to GPU 3 hold Partition (0,0), Partition (0,1), Partition (0,2), Partition (0,3)
      § Node 1 holds Partition 1: GPU 0 to GPU 3 hold Partition (1,0), Partition (1,1), Partition (1,2), Partition (1,3)
  • 9. Rearchitecting the hardware for AI
  • 10. Experimentation, Scaling, Production. Architecture for large IBM HPC Cluster: hardware overview for bare-metal / K8s
  • 11. Architecture for large IBM HPC/AI Cluster: hardware overview for bare-metal / K8s
  • 12. Why Capable and High BW Interconnects?
      Heterogeneous systems are attractive for efficient performance
      § Each part of an application runs on the best compute location
          o But there are performance and programmability challenges
          o Desire a highly-capable interconnect between PEs
      § Low-latency communication and high data bandwidth
      § Fine-grained + bulk data transfers
      § Consistent, unified view of memory
      § Hardware cache coherence & atomic operations
      Diagram labels: PE Type A (e.g. CPU); PE Type B (e.g. GPU); Large, low-latency Memory; Small, High-bandwidth Memory
  • 13. Enterprise AI Hardware Portfolio Expansion
      DATA: IBM Power IC922, Storage Dense Server
        § Enterprise ready cloud deployment with RH OpenShift and Power Systems reliability
        § Superior I/O for data movement: PCIe Gen 4
        § Superior price/performance
      TRAIN: IBM Power AC922, Powering the Fastest Supercomputer
        § Best training platform with 4x faster model iteration
        § ~6x data throughput with NVLink to GPUs
        § Synergistic HW/SW offerings for ease of use and leadership performance
        § NVIDIA V100 SXM2 GPUs
      INFERENCE: IBM Power IC922, Deploy AI into Production
        § Superior density and throughput to inference accelerators
        § Open design for accelerator flexibility
        § Deploy inference at scale with SW capabilities leveraging superior IO
        § NVIDIA T4 GPUs
        § Upcoming: FPGAs and ASICs
  • 14. Inference server details
      Form Factor
        § 19” Rack 2U Server
      POWER9 Processor
        § 2 dd2.3x P9 Nimbus chips (LaGrange pkg)
        § TDP: 225W
        § 12 (160W), 16, 20 cores (SMT <= 4)
      Memory
        § Direct Attach Memory
        § 32 DDR4 ISDIMM Slots @2400 MHz (double drop)
        § 16 DDR4 ISDIMMs @2667 MHz (single drop)
        § 16, 32, 64 GB RDIMMs
        § 2 TB Max memory
        § 340 GB/s peak memory BW (with 16x DIMMs)
      10 Integrated I/O Slots – Standard PCIe Riser
        § 2 PCIe G3 x16 FHFL Slots (Supports double-wide accelerator)
        § 2 PCIe G4 x16 LP Slots
        § 2 PCIe G3 x8 FHFL Slots (physically x16)
        § 2 PCIe G3 x8 FHHL Slots
        § 2 PCIe G3 x16 LP Slots
  • 15. Inference server details
      Internal Storage
        § Integrated Storage Controller = None
        § 24x 2.5” SAS/SATA
      Native I/O
        § 2x USB 3.0 in rear
        § 2x 1G baseT (one shared mgmt) + 1x 1G dedicated IPMI
        § Serial port, VGA port
        § TPM2.0 via Nuvoton NPCT650ABAWX included (for Secure OS and trusted boot)
      MTM (machine type – model)
        § 9183-22X
      Accelerators
        § Nvidia T4 Accelerator 16GB PCIe3 x16 LP
        § More to come
      Networking
        § Mellanox MCX555A-ECAT 1-PORT EDR 100Gb IB CONNECTX-5 GEN3 PCIe x16 CAPI CAPABLE
        § Mellanox MCX556A-ECAT 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE
        § Mellanox MCX516A-CDAT 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16
        § Mellanox MCX4121A-XCAT 2-PORT 10Gb NIC&ROCE SR/Cu PCIe 3.0
        § Mellanox MCX4121A-ACAT 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0
        § Marvell BCM957810A1008ICDM 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8
        § Marvell BCM957800A1006ICDM QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8
        § Broadcom BCM5719-4P 1Gb E'NET(UTP) 4-PORT ADPTR, PCIE-x4
      Fiber Channel
        § Broadcom LPe16002B-M6 2-PORT FIBER CHANNEL(16Gb/s), PCIE3-8X
        § Broadcom LPe32002-M2 2-PORT FIBER CHANNEL(32Gb/s), PCIE3-8X
  • 16. Inference server details (9183-22X)
      Drive bays shown on the slide: 8x SAS/SATA, 8x SAS/SATA, 8x SAS/SATA
      OS Support – LE
        § RHEL 7.6-alt
      RAS
        § Concurrent Maintenance disks
        § Redundant Hot plug Power
        § Redundant Hot plug fans
        § Customer Install and Repair
        § Simplified Op Panel
        § In-rack system service
      BMC Service Processor
        § Aspeed AST2500
        § OpenBMC
      Certifications
        § FCC Class A
        § ASHRAE A2 Environment (10-35C)
        § Acoustics Datacenter 1A
      HDD Drives
        § HDD; 600GB; 2.5"; 10k; SAS; 12Gb/s; 4Kn/512e; SED
        § HDD; 1200GB; 2.5"; 10k; SAS; 12Gb/s; 4Kn/512e; SED
        § HDD; 2400GB; 2.5"; 10k; SAS; 12Gb/s; 4Kn/512e; Non-SED
      SSD Drives
        § SSD; 240GB; 2.5"; SATA; 6Gb/s; 1.4 DWPD; NonSED
        § SSD; 960GB; 2.5"; SATA; 6Gb/s; 2.5 DWPD; NonSED
        § SSD; 1920GB; 2.5"; SATA; 6Gb/s; 2.5 DWPD; NonSED
        § SSD; 3840GB; 2.5"; SATA; 6Gb/s; 2.5 DWPD; NonSED
      Controllers
        § Broadcom (LSI) MegaRAID 9361-8i SAS3 Controller w/ 8 internal ports (2GB Cache) PCIe 3.0 x8 LP with cables
        § Broadcom 9300-8i PCIe gen3 x8 LP with cables
        § Broadcom 9305-16i PCIe gen3 x8 LP with cables
  • 17. OpenBMC is a free open source management software Linux distribution
      Feature List:
        § REST Management
        § IPMI
        § SSH based SOL
        § Power and Cooling Management
        § Event Logs
        § Zeroconf discoverable
        § Sensors
        § Inventory
        § LED Management
        § Host Watchdog
        § Simulation
        § Code Update Support for multiple BMC/BIOS images
        § POWER On Chip Controller (OCC) Support
      Features In Progress:
        § Full IPMI 2.0 Compliance with DCMI
        § Verified Boot
        § HTML5 JavaScript Web User Interface
        § BMC RAS
      IBM is the OpenBMC Community Leader
        § Facebook
        § Google
        § IBM
        § Intel
        § Microsoft
        § OCP
  • 18. Next-Generation Software Stack
  • 19. What’s in the training of deep neural networks?
      § Neural network model: billions of parameters; gigabytes
      § Computation: iterative gradient based search; millions of iterations; mainly matrix operations
      § Data: millions of images, sentences; terabytes
      Workload characteristics: both compute and data intensive!
  • 20. AI Infrastructure Stack (ON-CLOUD and ON-PREM)
      § Micro-Services / Applications: Segment Specific (Finance, Retail, Healthcare, Automotive)
      § Governance AI (Fairness, Explainable AI, Model Health, Accuracy)
      § APIs (external and in-house): Speech, Vision, NLP, Sentiment
      § Machine & Deep Learning Libraries & Frameworks: TensorFlow, Caffe, PyTorch, SparkML, Snap.ML
      § Distributed Computing: Spark, MPI
      § Data Lake & Data Stores: Hadoop HDFS, NoSQL DBs, Parallel File System
      § Transform & Prep Data (ETL)
      § Accelerated Infrastructure
  • 21. Watson ML Community Edition (WMLCE)
  • 22. Watson ML Community Edition (WMLCE)
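
The two-level data partitioning on slide 8 (a large dataset split per node, then per GPU) can be sketched in a few lines of Python. This is a minimal illustration only: it assumes contiguous, near-equal splits of sample indices, and the names `split_evenly` and `partition_dataset` are hypothetical, not taken from the slides or from any IBM library.

```python
from typing import Dict, List, Sequence, Tuple


def split_evenly(items: Sequence[int], parts: int) -> List[List[int]]:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(items), parts)
    chunks: List[List[int]] = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def partition_dataset(
    num_samples: int, num_nodes: int, gpus_per_node: int
) -> Dict[Tuple[int, int], List[int]]:
    """Map (node, gpu) to the sample indices in Partition (node, gpu).

    Assumption: the dataset is first split across nodes (Partition 0,
    Partition 1, ...) and each node's share is then split across its GPUs.
    """
    indices = list(range(num_samples))
    layout: Dict[Tuple[int, int], List[int]] = {}
    for node, node_part in enumerate(split_evenly(indices, num_nodes)):
        for gpu, gpu_part in enumerate(split_evenly(node_part, gpus_per_node)):
            layout[(node, gpu)] = gpu_part
    return layout


if __name__ == "__main__":
    # Two nodes with four GPUs each, as drawn on the slide; 16 samples is arbitrary.
    layout = partition_dataset(num_samples=16, num_nodes=2, gpus_per_node=4)
    for (node, gpu), samples in sorted(layout.items()):
        print(f"Node {node}, GPU {gpu}: Partition ({node},{gpu}) -> {samples}")
```

In a real data-parallel run each GPU would train on its own partition and the gradients would be combined across GPUs and nodes; that communication step is outside the scope of this sketch.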