SlideShare a Scribd company logo
1 of 53
Download to read offline
The Parallel Computing Revolution Is Only Half Over
Rob Schreiber, Cerebras Systems, Inc.
ATPESC, August 1, 2019
The Past
What this is?
And this?
And this?
And this?
Computer History Museum Timeline: 1933
And this?
2.4 million cores
The pioneers
Controversy
Success!
Technology Triumphant
The Crisis
A Pressing Need
A New Era
The parallel computing era
Parallel computers were once controversial
If the parallel fraction is f then the
maximum possible speedup is
1/ f
Amdahl
Speedup, efficiency
S(p) = T(1) / T(p)
E(p) = S(p) / p
Mapreduce: Parallelism is easy. Performance is hard
It’s the memory and the network
And the algorithm
Dusty Decks
Vendor: “I can build you a machine that is a billion
times more powerful than the one you used earlier”
Customer: “But will I need to rewrite my code?”
Automatic parallelization
The compiler should create an optimized,
parallel implementation of the algorithm
in the code
Alan Perlis
Adapting old programs to fit new
machines usually means
adapting new machines to
behave like old ones.
Optimization hinders evolution.
http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Success
Almost all of HPC machine
performance is due to parallelism
• 1975 – 2019: 108 flops à 1017 flops
• 1 OOM from clock rate
• 8 OOM from parallelism
• 2 OOM more power required
• 1 OOM more $ required
Four eras, four exponentials
Pre-electronic
Vacuum tubes
Transistors
VLSI – Moore’s Law
Parallel
supercomputers
Sequential, vector
Electro-mechanical
calculators
ENIAC
The past is easy to predict. The future?
Ivan Sutherland, at Caltech,
1977
“The VLSI Revolution is
Only Half Over”
Was Sutherland right?
Pre-electronic
Vacuum tubes
Transistors
VLSI – Moore’s Law
Parallel
supercomputers
Sequential, vector
Ivan
VLSI: Technology
Triumphant
After Ivan, What Happened?
1975: HPC is based
on high-end
technology
Hardware was
customized for HPC
1971: Infancy of Silicon Valley, microprocessor
1980s:
An
era
of
exploration
Architectures – no one knew what to do
• SIMD / MIMD
• Shared memory / Distributed memory
• COMA
• Hypercubes
• Transputers
• Message-passing dialects
• Latency tolerance
• From workstations to supercomputers
The confused 80s
Pre-electronic
Vacuum tubes
Transistors
VLSI – Moore’s Law
Parallel
supercomputers
Sequential, vector
Custom,
specialized
VLSI
1971 – 1990: Micros catch up
• Transistor count doubles every two years.
• 1971 – 2000 transistors (i4004)
• 1990 – 1,000,000 transistors (i80486)
• It took 19 years of exponential growth to catch up to old style
computers for HPC
• But then…
Killer micros
“No one will survive The Attack of the Killer Micros”
Eugene Brooks, Panel talk at Supercomputing 90.
Cray Research says it is worried by "killer micros" –
compact, extremely fast work stations that sell
for less than $100,000.
John Markoff, in the New York Times, 1991
VLSI and supercomputers after 1990
Then the attack really happens
• A mass market drove investment and innovation
• Dennard scaling made micros faster, cheaper, same power
• Commodity pricing killed architectural specialization
• By 2000, all HPC machines are clusters of commodity designs
• AND – we knew how to program them (MPI)
From 1990 to 2010
7 OOM, from 0.1 teraflops to 0.1 exaflops
Greater than the gains from 1942 (pre-electronic) to 1990
A loss of computer design diversity; the one-size-fits-all processor
The Crisis
Dennard Scaling Ended
… in every technology generation the transistor density doubles,
the circuit becomes 40% faster, and power consumption (with
twice the number of transistors) stays the same.
• 2005-2007: leaky transistors. Vdd stops dropping. Clock rate hits a
wall.
Moore’s Law limit
“There’s no getting around the fact that we build these
things out of atoms.”
-- Gordon Moore
The transition, to the post-Moore era
A pressing need
AI --- for science too
• Compute demand for DNN training
has grown 300,000x since 2012
• Doubling time 3.5mo vs 18 mo (Moore)
• Training a DNN can take weeks
- see, ref https://blog.openai.com/ai-and-compute/
Processor Specialization
Area Transistors Teraflops
(mm^2) (billion)
CPU 800 20 1
GPU 500 10 10
Cerebras Systems Proprietary and Confidential
First GPGPU. Now, a new class of AI-optimized
accelerators
• Purpose-built compute engine
• More parallel compute on each chip
AI-optimized accelerators are here
Compare chips
Area Transistors Teraflops
(mm^2) (billion)
CPU 800 20 1
GPU 500 10 10
AI-optimized 800 20 100
(Volta, GC, TPU)
Still not enough! Days to train Resnet-50 on Volta
A New Era
Six eras, five exponentials
Pre-electronic
Vacuum tubes
Transistors
VLSI – Moore’s Law
Post-Moore
Parallel
supercomputers
Vector
supercomputers
Post-Moore
Mechanical
devices,
from ~100 BC
Scaling everything else
• Specialized architectures
• A heterogeneous Top 500 list
• Heterogeneous clusters of heterogeneous nodes
• Better algorithms
• The AI revolution
• In science as well as all other applications of computing
• New memory technology
• Photonics
• 2.5D and 3D interconnects
• Wafer scale
• Quantum
• Other stuff too weird to mention, yet
Chip scaling and feature scaling
When feature size scaling stops, use bigger chips
EE Times, May 24, 2019
Startups Cerebras, Habana, and UpMem will unveil new deep-learning
processors. Cerebras will describe a much-anticipated device using
wafer-scale integration.
Accelerating AI
Compute Core
Network fabric
Memory
• Distributed memory
• Tightly integrated, fine-grain,
active messages
• One-clock per hop network latency
Scaling the area too
Area Transistors Teraflops
(mm^2) (billion)
CPU 800 20 1
GPU 500 10 10
AI-optimized 800 20 100
Cerebras > 50X to be announced
The Parallel Computing Revolution Is Only Half Over

More Related Content

What's hot (20)

Super computer
Super computerSuper computer
Super computer
 
Super computer
Super computerSuper computer
Super computer
 
Brief Definition of Supercomputers
Brief Definition of SupercomputersBrief Definition of Supercomputers
Brief Definition of Supercomputers
 
invited speech at Ge2013, Udine 2013
invited speech at Ge2013, Udine 2013 invited speech at Ge2013, Udine 2013
invited speech at Ge2013, Udine 2013
 
Super computers
Super computersSuper computers
Super computers
 
Super computers
Super computersSuper computers
Super computers
 
Supercomputer @ manarat university by reza
Supercomputer  @ manarat university by rezaSupercomputer  @ manarat university by reza
Supercomputer @ manarat university by reza
 
Supercomputer
SupercomputerSupercomputer
Supercomputer
 
Super-Computer Architecture
Super-Computer Architecture Super-Computer Architecture
Super-Computer Architecture
 
Super computers by rachna
Super computers by  rachnaSuper computers by  rachna
Super computers by rachna
 
Tesla personal super computer
Tesla personal super computerTesla personal super computer
Tesla personal super computer
 
4'th Fastest Super Computer
4'th Fastest Super Computer4'th Fastest Super Computer
4'th Fastest Super Computer
 
Top 10 Supercomputer 2014
Top 10 Supercomputer 2014Top 10 Supercomputer 2014
Top 10 Supercomputer 2014
 
Cray-1 The First Supercomputer
Cray-1 The First SupercomputerCray-1 The First Supercomputer
Cray-1 The First Supercomputer
 
supercomputer
supercomputersupercomputer
supercomputer
 
World’s Fastest Supercomputer | Tianhe - 2
World’s Fastest Supercomputer |  Tianhe - 2World’s Fastest Supercomputer |  Tianhe - 2
World’s Fastest Supercomputer | Tianhe - 2
 
Four types of computers
Four types of computersFour types of computers
Four types of computers
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
Super Computer
Super ComputerSuper Computer
Super Computer
 
Supercomputers
SupercomputersSupercomputers
Supercomputers
 

Similar to The Parallel Computing Revolution Is Only Half Over

Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
Third generation computers (hardware and software)
Third generation computers (hardware and software)Third generation computers (hardware and software)
Third generation computers (hardware and software)La Laland
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010TELECOM I+D
 
Pres. on computers final
Pres. on computers finalPres. on computers final
Pres. on computers finalankur bhalla
 
Energy Efficiant Computing in the 21c
Energy Efficiant Computing in the 21cEnergy Efficiant Computing in the 21c
Energy Efficiant Computing in the 21cIan Phillips
 
IS 139 Lecture 1
IS 139 Lecture 1IS 139 Lecture 1
IS 139 Lecture 1wajanga
 
Episode 2(2): Electronic automation and computation - Meetup session 8
Episode 2(2): Electronic automation and computation - Meetup session 8Episode 2(2): Electronic automation and computation - Meetup session 8
Episode 2(2): Electronic automation and computation - Meetup session 8William Hall
 
Evolution of computer generation.
Evolution of computer generation. Evolution of computer generation.
Evolution of computer generation. Mauryasuraj98
 
TOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdf
TOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdfTOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdf
TOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdfJoel XBasxious
 
computer application in hospitality Industry, periyar university unit 1
computer application in hospitality Industry, periyar university  unit 1computer application in hospitality Industry, periyar university  unit 1
computer application in hospitality Industry, periyar university unit 1admin information
 
computer applicationin hospitality Industry1 periyar university unit1
computer applicationin hospitality Industry1 periyar university  unit1computer applicationin hospitality Industry1 periyar university  unit1
computer applicationin hospitality Industry1 periyar university unit1admin information
 
Generations of computer
Generations of computerGenerations of computer
Generations of computerBESOR ACADEMY
 

Similar to The Parallel Computing Revolution Is Only Half Over (20)

Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
Computer Generations
Computer GenerationsComputer Generations
Computer Generations
 
Ita unit i
Ita unit iIta unit i
Ita unit i
 
Ita unit i
Ita unit iIta unit i
Ita unit i
 
Third generation computers (hardware and software)
Third generation computers (hardware and software)Third generation computers (hardware and software)
Third generation computers (hardware and software)
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010
 
Pres. on computers final
Pres. on computers finalPres. on computers final
Pres. on computers final
 
Ch 1
Ch 1Ch 1
Ch 1
 
Energy Efficiant Computing in the 21c
Energy Efficiant Computing in the 21cEnergy Efficiant Computing in the 21c
Energy Efficiant Computing in the 21c
 
IS 139 Lecture 1
IS 139 Lecture 1IS 139 Lecture 1
IS 139 Lecture 1
 
Episode 2(2): Electronic automation and computation - Meetup session 8
Episode 2(2): Electronic automation and computation - Meetup session 8Episode 2(2): Electronic automation and computation - Meetup session 8
Episode 2(2): Electronic automation and computation - Meetup session 8
 
Evolution of computer generation.
Evolution of computer generation. Evolution of computer generation.
Evolution of computer generation.
 
TOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdf
TOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdfTOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdf
TOPIC 1 LECTURE- DEVELOPMENT OF COMPUTERS.pdf
 
Unit i
Unit  iUnit  i
Unit i
 
computer application in hospitality Industry, periyar university unit 1
computer application in hospitality Industry, periyar university  unit 1computer application in hospitality Industry, periyar university  unit 1
computer application in hospitality Industry, periyar university unit 1
 
Unit i
Unit  iUnit  i
Unit i
 
computer applicationin hospitality Industry1 periyar university unit1
computer applicationin hospitality Industry1 periyar university  unit1computer applicationin hospitality Industry1 periyar university  unit1
computer applicationin hospitality Industry1 periyar university unit1
 
Unit I
Unit  IUnit  I
Unit I
 
Supercomputers
SupercomputersSupercomputers
Supercomputers
 
Generations of computer
Generations of computerGenerations of computer
Generations of computer
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 

Recently uploaded (20)

Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 

The Parallel Computing Revolution Is Only Half Over

  • 1. The Parallel Computing Revolution Is Only Half Over Rob Schreiber, Cerebras Systems, Inc. ATPESC, August 1, 2019
  • 7. Computer History Museum Timeline: 1933
  • 10. Controversy Success! Technology Triumphant The Crisis A Pressing Need A New Era The parallel computing era
  • 11. Parallel computers were once controversial
  • 12. If the parallel fraction is f then the maximum possible speedup is 1/ f Amdahl
  • 13. Speedup, efficiency S(p) = T(1) / T(p) E(p) = S(p) / p
  • 14. Mapreduce: Parallelism is easy. Performance is hard
  • 15. It’s the memory and the network And the algorithm
  • 16. Dusty Decks Vendor: “I can build you a machine that is a billion times more powerful than the one you used earlier” Customer: “But will I need to rewrite my code?”
  • 17. Automatic parallelization The compiler should create an optimized, parallel implementation of the algorithm in the code
  • 18. Alan Perlis Adapting old programs to fit new machines usually means adapting new machines to behave like old ones. Optimization hinders evolution. http://www.cs.yale.edu/homes/perlis-alan/quotes.html
  • 20. Almost all of HPC machine performance is due to parallelism • 1975 – 2019: 108 flops à 1017 flops • 1 OOM from clock rate • 8 OOM from parallelism • 2 OOM more power required • 1 OOM more $ required
  • 21. Four eras, four exponentials Pre-electronic Vacuum tubes Transistors VLSI – Moore’s Law Parallel supercomputers Sequential, vector Electro-mechanical calculators ENIAC
  • 22. The past is easy to predict. The future?
  • 23. Ivan Sutherland, at Caltech, 1977 “The VLSI Revolution is Only Half Over”
  • 24. Was Sutherland right? Pre-electronic Vacuum tubes Transistors VLSI – Moore’s Law Parallel supercomputers Sequential, vector Ivan
  • 26. After Ivan, What Happened?
  • 27. 1975: HPC is based on high-end technology Hardware was customized for HPC
  • 28. 1971: Infancy of Silicon Valley, microprocessor
  • 29.
  • 31. Architectures – no one knew what to do • SIMD / MIMD • Shared memory / Distributed memory • COMA • Hypercubes • Transputers • Message-passing dialects • Latency tolerance • From workstations to supercomputers
  • 32. The confused 80s Pre-electronic Vacuum tubes Transistors VLSI – Moore’s Law Parallel supercomputers Sequential, vector Custom, specialized VLSI
  • 33. 1971 – 1990: Micros catch up • Transistor count doubles every two years. • 1971 – 2000 transistors (i4004) • 1990 – 1,000,000 transistors (i80486) • It took 19 years of exponential growth to catch up to old style computers for HPC • But then…
  • 34. Killer micros “No one will survive The Attack of the Killer Micros” Eugene Brooks, Panel talk at Supercomputing 90. Cray Research says it is worried by "killer micros" – compact, extremely fast work stations that sell for less than $100,000. John Markoff, in the New York Times, 1991
  • 35. VLSI and supercomputers after 1990 Then the attack really happens • A mass market drove investment and innovation • Dennard scaling made micros faster, cheaper, same power • Commodity pricing killed architectural specialization • By 2000, all HPC machines are clusters of commodity designs • AND – we knew how to program them (MPI)
  • 36. From 1990 to 2010 7 OOM, from 0.1 teraflops to 0.1 exaflops Greater than the gains from 1942 (pre-electronic) to 1990 A loss of computer design diversity; the one-size-fits-all processor
  • 38. Dennard Scaling Ended … in every technology generation the transistor density doubles, the circuit becomes 40% faster, and power consumption (with twice the number of transistors) stays the same. • 2005-2007: leaky transistors. Vdd stops dropping. Clock rate hits a wall.
  • 39. Moore’s Law limit “There’s no getting around the fact that we build these things out of atoms.” -- Gordon Moore
  • 40. The transition, to the post-Moore era
  • 42. AI --- for science too • Compute demand for DNN training has grown 300,000x since 2012 • Doubling time 3.5mo vs 18 mo (Moore) • Training a DNN can take weeks - see, ref https://blog.openai.com/ai-and-compute/
  • 43. Processor Specialization Area Transistors Teraflops (mm^2) (billion) CPU 800 20 1 GPU 500 10 10 Cerebras Systems Proprietary and Confidential
  • 44. First GPGPU. Now, a new class of AI-optimized accelerators • Purpose-built compute engine • More parallel compute on each chip
  • 45. AI-optimized accelerators are here Compare chips Area Transistors Teraflops (mm^2) (billion) CPU 800 20 1 GPU 500 10 10 AI-optimized 800 20 100 (Volta, GC, TPU) Still not enough! Days to train Resnet-50 on Volta
  • 47. Six eras, five exponentials Pre-electronic Vacuum tubes Transistors VLSI – Moore’s Law Post-Moore Parallel supercomputers Vector supercomputers Post-Moore Mechanical devices, from ~100 BC
  • 48. Scaling everything else • Specialized architectures • A heterogeneous Top 500 list • Heterogeneous clusters of heterogeneous nodes • Better algorithms • The AI revolution • In science as well as all other applications of computing • New memory technology • Photonics • 2.5D and 3D interconnects • Wafer scale • Quantum • Other stuff too weird to mention, yet
  • 49. Chip scaling and feature scaling When feature size scaling stops, use bigger chips
  • 50. EE Times, May 24, 2019 Startups Cerebras, Habana, and UpMem will unveil new deep-learning processors. Cerebras will describe a much-anticipated device using wafer-scale integration.
  • 51. Accelerating AI Compute Core Network fabric Memory • Distributed memory • Tightly integrated, fine-grain, active messages • One-clock per hop network latency
  • 52. Scaling the area too Area Transistors Teraflops (mm^2) (billion) CPU 800 20 1 GPU 500 10 10 AI-optimized 800 20 100 Cerebras > 50X to be announced