AI Chip Trends and Forecast
Joo-Young Kim
2019. 11. 6
ICT Industry Outlook Conference (ICT 산업전망컨퍼런스)
Outline
• Introduction
- Brief history & deep neural network models
- AI stack and new computing paradigm
• Trends in AI chips
- ??
• Looking forward
- ???
Motivation
Artificial Intelligence is pervasive in our everyday life.
Brief History of Neural Networks
• 1943, W. S. McCulloch – W. Pitts: adjustable but not learnable weights
• 1958, F. Rosenblatt; 1960, B. Widrow – M. Hoff: learnable weights and threshold
• 1969, M. Minsky – S. Papert: the XOR problem → First Winter
• 1986, D. Rumelhart – G. Hinton – R. Williams: nonlinear problems solved, but high computation cost, local optima, and overfitting → Second Winter
• 2006, G. Hinton – R. Salakhutdinov: hierarchical feature learning → Deep Learning!
- ImageNet
- AlphaGo
- Speech translation
- Video synthesis
- Smart factory
- …
Deep Learning ≠ AI
• AI: any technique that enables computers to mimic human behavior (e.g., searching, planning, knowledge representation, fuzzy logic, natural language processing, genetic algorithms)
• Machine learning (ML): AI techniques that have computers learn without being explicitly programmed
• Deep learning: a subset of ML that makes the computation of multi-layer neural networks feasible
Deep Learning Revolution
[Figure: ImageNet (ILSVRC) top-5 error over the years; human-level error: ~5%]
Deep learning starts to surpass human-level recognition on specific tasks.
* F. Veen, The Asimov Institute, 2016
What Has Changed?
• Traditional pattern recognition: hand-crafted features (HoG, SIFT, Haar-like) + simple trainable classifiers (SVM, K-Means) → "Dog" / "Ship" / "Car"
• Deep learning (model + data): trainable features & classifiers (CNN, DNN) → "Dog" / "Ship" / "Car"
[Figure: performance vs. amount of data for traditional algorithms and deep learning (Andrew Ng, Stanford CS 229 class): deep learning keeps improving as the amount of data grows, while traditional algorithms plateau]
Popular Types of DNNs
• MLP (Multi-Layer Perceptron)
- Characteristic: fully connected layers (input → hidden → output)
- Major application: speech recognition
- Number of layers: 3~10
- Main computation: matrix-vector multiplication
• CNN (Convolutional)
- Characteristic: convolutional layers (input → convolution → pooling → fully connected → output)
- Major application: image recognition
- Number of layers: max ~100
- Main computation: 3D convolution
• RNN (Recurrent)
- Characteristic: sequential data, feedback path
- Major application: speech / action recognition
- Number of layers: 3~5
- Main computation: matrix-vector multiplication
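To make the "main computation" entries concrete, here is a minimal pure-Python sketch (not from the slides) of the three kernels: a fully connected layer as a matrix-vector product, a single-filter convolution over an H×W×C volume, and one RNN step whose feedback path is simply a second matrix-vector product. The function names, the stride-1/no-padding convolution, and the ReLU in the RNN step are illustrative assumptions.

```python
# Illustrative sketch: hypothetical helpers, plain Python lists, no framework.

def dense_layer(W, x, b):
    """MLP/RNN core: y = W x + b (matrix-vector multiplication)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def conv3d_single_filter(volume, kernel):
    """CNN core: slide a K x K x C kernel over an H x W x C volume (stride 1, no padding).

    volume[h][w][c] and kernel[kh][kw][c]; returns one (H-K+1) x (W-K+1) output map.
    """
    H, W_, C = len(volume), len(volume[0]), len(volume[0][0])
    K = len(kernel)
    out = []
    for h in range(H - K + 1):
        row = []
        for w in range(W_ - K + 1):
            acc = 0.0
            for kh in range(K):
                for kw in range(K):
                    for c in range(C):  # the "3D" part: sum over channels too
                        acc += volume[h + kh][w + kw][c] * kernel[kh][kw][c]
            row.append(acc)
        out.append(row)
    return out

def rnn_step(W_x, W_h, x_t, h_prev, b):
    """RNN step: the feedback path feeds h_prev back in via a second matrix-vector product."""
    from_input = dense_layer(W_x, x_t, b)
    from_state = dense_layer(W_h, h_prev, [0.0] * len(b))
    # ReLU is used here for simplicity; tanh is more common in practice.
    return [max(0.0, a + c) for a, c in zip(from_input, from_state)]

print(dense_layer([[1, 2], [3, 4]], [1, 1], [0, 0]))  # [3, 7]
```

All three reduce to long chains of multiply-accumulates, which is the operation AI chips are built to accelerate.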
And Many More Models…
[Figure: timeline of neural network models across the 1970s, 1980s, 1990s, 2012~2014, 2015, 2016, and 2017~]
Models shown: MLP, Cognitron/CNN, LSTM, LeNet, AlexNet, VGGNet, GoogLeNet, R-CNN, GRU, Fast R-CNN, Faster R-CNN, YOLO, ResNet, FCN, SegNet, CNN+RNN, DeepLab, ENet, DenseNet, YOLO v2, PointNet, WaveNet, attention-only network (Transformer), Tacotron, YOLO v3, BERT, DeepLab v3+, VoxelNet, PointNet++, WGAN, CycleGAN, StarGAN, DiscoGAN
DNN Characteristics
• Requires big data & big computation
• Modern hardware (e.g., GPUs) enabled the deep learning revolution
• Face recognition example:
- Local-feature-based: ~0.1 billion operations/face, ~10 MB memory access/face
- Deep-learning-based: ~2 billion operations/face, ~1 GB memory access/face
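A rough cost model shows how quickly the operation and memory-access counts grow. The sketch below counts multiply-accumulates (MACs) and weight bytes for one convolution layer and one fully connected layer; the helper names and the layer shapes in the usage lines are hypothetical and unrelated to the face-recognition figures above.

```python
# Hypothetical cost-model helpers; 4 bytes per value assumes 32-bit weights.

def conv_layer_cost(h_out, w_out, c_in, c_out, k, bytes_per_value=4):
    """One stride-1 convolution layer: returns (MACs, weight count, weight bytes)."""
    macs = h_out * w_out * c_out * c_in * k * k
    weights = c_out * c_in * k * k
    return macs, weights, weights * bytes_per_value

def dense_layer_cost(n_in, n_out, bytes_per_value=4):
    """One fully connected layer: every weight is used exactly once per input vector."""
    macs = n_in * n_out
    return macs, macs, macs * bytes_per_value

print(conv_layer_cost(56, 56, 64, 64, 3))  # (115605504, 36864, 147456)
print(dense_layer_cost(4096, 4096))        # (16777216, 16777216, 67108864)
```

The convolution reuses each of its 36,864 weights thousands of times, whereas the fully connected layer touches each of its ~16.8 million weights once per input, which is why fully connected and recurrent layers tend to be limited by memory bandwidth rather than compute.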
AI Stack
• Application
- Video/Image: face recognition, image generation, video analysis, …
- Sound and Speech: speech recognition, language synthesis, music generation, …
- NLP: text analysis, language translation, human-machine communication, …
- Robotics: autopilot, UAV, industrial automation, …
• Algorithm
- Neural network topology: MLP, CNN, RNN, LSTM, SNN, …
- Deep neural networks: AlexNet, ResNet, GoogLeNet, …
- Neural network algorithms: reinforcement learning, adversarial learning, …
- Machine learning algorithms: SVM, K-NN, decision tree, Markov chain, …
• Chip
- Neuromorphic chip: brain-inspired computing, biological brain simulation, …
- Programmable chip: GPU, ASIC, FPGA, DSP, …
- System-on-Chip: multi-core, many-core, SIMD, systolic array, …
- Development tool-chain: frameworks, compiler, simulator, optimizer, …
• Device
- High bandwidth off-chip memory: HBM, DRAM, GDDR, STT-MRAM, …
- High speed interface: SerDes, optical communication
- CMOS 3D stacking
- Emerging computing device: analog computing, memristors, …
- Emerging memory device: ReRAM, PCRAM, …
New Computational Paradigm
• Handling big data
- Huge storage capacity, high bandwidth, and low-latency memory access
- The “memory wall” problem
• Large amount of computation
- Mainly linear algebra operations, while control is relatively simple
- Large number of parameters
• Training vs. inference
- Training: accuracy, data capacity (~10^18 bytes), weight synchronization
- Inference: speed, energy, hardware cost, efficient reading of weights
• Data precision / model compression / pruning
- High precision is not always required
• High configurability
- Trade-off between energy efficiency and adaptability to new algorithms
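The point that high precision is not always required can be illustrated with two common compression steps: symmetric per-tensor int8 quantization and magnitude pruning. This is a minimal sketch under simple assumptions (one scale per tensor, unstructured pruning, hypothetical helper names), not the scheme of any particular chip in this deck.

```python
# Hypothetical helpers illustrating low precision and pruning on a flat weight list.

def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: q = round(v / scale), scale = max|v| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate real values."""
    return [qi * scale for qi in q]

def magnitude_prune(values, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the values."""
    n_prune = int(len(values) * sparsity)
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    pruned = list(values)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.42, -1.27, 0.03, 0.9, -0.05, 0.6]
q, s = quantize_int8(w)              # 8-bit codes plus one float scale
print(magnitude_prune(w, 0.5))       # [0.0, -1.27, 0.0, 0.9, 0.0, 0.6]
```

Each dequantized weight is within half a quantization step of the original, yet storage drops from 32 to 8 bits per weight; pruning then lets hardware skip the zeroed weights entirely.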
AI Chip Landscape
https://basicmi.github.io/AI-Chip/
DNN Hardware
• Mobile based
- Specific AI
- Real-time
- Limited resources
- Low power
• Cloud based
- General AI
- High computing
- Huge memory
- Fast & accurate learning
[Figure: cloud server and mobile/edge terminals exchanging "control & control model" and "data & learned model"; axes: real-time operation and global data sharing, each from low to high]
Cloud based AI Computing
[Figure: training data (dataset) → learning → pre-trained network, with inference on the cloud/server; e.g., a voice assistant on the device sends a question to the cloud/server and receives the answer]
DNN Chips for Cloud Server
• Nvidia (GPU)
• Google (TPU)
• Microsoft (BrainWave)
• Amazon (Inferentia)
• Facebook
• Alibaba, Baidu
Cloud server:
• Control based on overall conditions
• Learning with data collected from edge devices
• Stand-alone AI
[Images: NVIDIA Volta, Google Cloud TPU]
Mobile/Edge based AI Inference
Self-driving vehicles, intelligent cameras/speakers, IoT devices
[Figure: a network pre-trained on the cloud/server (training data → learning) is loaded onto the device/edge, which runs inference with the pretrained model on local data from its sensors (camera, mic, GPS, gyro, touch) and serves the user interface & apps platform]
Mobile/Edge DNN Applications
• Apple
• Huawei
• Qualcomm
• ARM
• CEVA
• Cambricon
• Horizon Robotics
• Mobileye
• Tesla
[Figure: power consumption (low to high) vs. inference speed (slow to fast) for IoT, wearable, smartphone, drone, automotive, and mobile robot]
Cloud vs Edge Summary
• Cloud/datacenter training: high performance, high precision, high flexibility, distributed, scalable
• Cloud/datacenter inference: high throughput, low latency, power efficiency, distributed, scalable
• Edge/mobile inference: diverse requirements (car, wearable, IoT), low-moderate throughput, low latency, power efficiency, low cost
• Edge/mobile training: ?
Functional Integration
• Early stage: classic hardware (Intel CPU, NVIDIA GPU, Xilinx FPGA); domain: cloud; target workload: training oriented
• 1st stage: domain-specific hardware (MIT Eyeriss, KAIST LNPU, Google TPU, Microsoft BrainWave, …); domain: cloud/edge; target workload: inference
• 2nd stage: reconfigurable hardware (Wave DPU, Tsinghua Thinker, …); domain: cloud/edge; target workload: inference & training
• Next: ?
Courtesy of GTIC 2019
Two Different Directions
• Be more flexible: dedicated (DianNao, 2014) → RS (row-stationary) dataflow (MIT Eyeriss, 2016) → systolic array (Google TPU, 2017.1) → sparse-aware (NVIDIA SCNN, 2017.6) → flexible bitwidth (KAIST UNPU, 2018) → …
• Be more compact: compression/pruning (EIE, 2016.2) → BWN, TWN → low-bit training (DoReFa-Net) → low-bit quantization (LQ-Nets) → … (milestones dated 2016.8, 2016.11, 2018.2, 2018.9)
Courtesy of GTIC 2019
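The systolic array behind the TPU entry can be sketched at the cycle level. The simulation below uses an output-stationary dataflow for simplicity: each processing element (PE) keeps one output sum while operands are pumped right and down through the grid. The first-generation TPU instead keeps weights stationary in the array, so treat this as an illustration of the systolic principle, not of the TPU's exact design; the function name is hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary n x m systolic array computing A @ B.

    PE(i, j) accumulates C[i][j]. Row i of A enters from the left edge and column j
    of B enters from the top edge, each skewed by one cycle per row/column, so
    A[i][k] and B[k][j] meet inside PE(i, j) at cycle t = i + j + k.
    Returns (C, number_of_cycles).
    """
    n, K, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # operand held by each PE, moving right
    b_reg = [[0] * m for _ in range(n)]  # operand held by each PE, moving down
    cycles = n + m + K - 2               # last pair meets at t = (n-1) + (m-1) + (K-1)
    for t in range(cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:               # left edge: inject skewed row i of A
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:                    # otherwise take the left neighbour's operand
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:               # top edge: inject skewed column j of B
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:                    # otherwise take the upper neighbour's operand
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):               # every PE does one multiply-accumulate per cycle
            for j in range(m):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C, cycles

C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)  # [[19, 22], [43, 50]] 4
```

Each operand is read from memory once at the array edge and then reused as it flows past neighbouring PEs, which is how systolic arrays cut memory traffic for matrix multiplication.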
Von Neumann Bottleneck for AI
• The von Neumann architecture serially fetches data from storage
• AI applications need to access tremendous amounts of data
[Figure: AI processor connected to memory over a bus (the bottleneck); the memory wall across the hierarchy of processor, SRAM (cache), DRAM, and NVM]
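The memory wall can be quantified with arithmetic intensity (FLOPs per byte moved) and the roofline model. The sketch below, with hypothetical helper names and a made-up accelerator (100 TFLOP/s peak, 1 TB/s memory bandwidth), compares a matrix-vector product, typical of MLP/RNN inference, with a matrix-matrix product, assuming 4-byte values and that each operand crosses the memory bus exactly once.

```python
# Hypothetical helpers; the accelerator numbers below are made up for illustration.

def matvec_intensity(n, bytes_per_value=4):
    """FLOPs per byte for y = W x with an n x n matrix (each weight read once)."""
    flops = 2 * n * n                                # one multiply + one add per weight
    bytes_moved = (n * n + 2 * n) * bytes_per_value  # W, x, and y
    return flops / bytes_moved

def matmul_intensity(n, bytes_per_value=4):
    """FLOPs per byte for C = A B with n x n matrices, each matrix moved once."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * bytes_per_value
    return flops / bytes_moved

def attainable_flops(peak_flops, mem_bw_bytes, intensity):
    """Roofline: performance is capped by compute or by bandwidth x intensity."""
    return min(peak_flops, mem_bw_bytes * intensity)

peak, bw = 100e12, 1e12  # hypothetical: 100 TFLOP/s peak, 1 TB/s memory bandwidth
print(round(matvec_intensity(4096), 3))   # 0.5
print(round(matmul_intensity(4096), 1))   # 682.7
print(attainable_flops(peak, bw, matvec_intensity(4096)) / peak)  # about 0.5% of peak
```

With under one FLOP per byte, the matrix-vector case leaves almost all of the compute idle waiting for memory, which is exactly the bottleneck that higher-bandwidth memory and processing-in-memory try to remove.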
Increasing Memory Bandwidth
How can we increase the bandwidth between processor and memory?
• Near-memory processing
[Figure: processor and DRAM placed side by side on a PCB vs. a processor with 3D-stacked memory (High Bandwidth Memory)]
Advantage of HBM
Item | GDDR5 | HBM (High Bandwidth Memory)
System configuration | 12 × 8Gb GDDR5 DRAM | 4 × 4GB HBM
Size | 3120 mm² | 792 mm²
Density | 12GB | 16GB
Bandwidth | 384 GB/s | 1024 GB/s
Power | 18.3 W (1.5 W × 12 GDDR5) | 9.1 W (2.3 W × 4 HBM)
Pin (ball) speed | 8 Gbps | 2 Gbps
# I/O | 32 per chip (384 total) | 1024 per cube (4096 total)
16GFX projected specifications:
• HBM 4~6 cubes
• 4~8GB, 512GB/s~1TB/s
• 10 TFLOPS
[Figure: processor with 12 GDDR5 (G5) chips on a 60 mm × 52 mm footprint vs. processor with 4 HBM cubes on a 33 mm × 24 mm footprint; callouts: -75%, 1.3x, 3.6x, +18%]
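The bandwidth row of the table follows directly from the pin rows: per-pin data rate times the number of I/Os, divided by 8 bits per byte. This small check (the helper name is illustrative) also divides by the table's power figures to get bandwidth per watt; HBM wins with a wide, slow interface rather than a narrow, fast one.

```python
def peak_bandwidth_gbytes(pin_speed_gbps, io_count):
    """Aggregate bandwidth in GB/s = per-pin rate (Gb/s) x number of I/Os / 8 bits per byte."""
    return pin_speed_gbps * io_count / 8

gddr5 = peak_bandwidth_gbytes(8, 32 * 12)   # 12 chips x 32 I/Os at 8 Gb/s
hbm = peak_bandwidth_gbytes(2, 1024 * 4)    # 4 cubes x 1024 I/Os at 2 Gb/s
print(gddr5, hbm)                           # 384.0 1024.0
# Bandwidth per watt, using the table's total power figures (18.3 W and 9.1 W)
print(round(gddr5 / 18.3, 1), round(hbm / 9.1, 1))  # 21.0 112.5
```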
Emerging Non-Volatile Memories
DRAM-like speed, flash-like capacity, and non-volatility
(White Paper on AI Chip Technologies, 2018)
Moving Toward In-Memory Computing
[Figure: three memory hierarchies: traditional (processor, SRAM cache, DRAM, NVM, with the von Neumann bottleneck between processor and memory), near-memory/emerging memory, and in-memory/memory-centric (many processors "P" distributed within the memory)]
Processing-In-Memory (PIM)
[Figure: von Neumann system (AI processor and memory joined by a bus, the bottleneck) vs. an array of interleaved memory + logic blocks]
✓ Non-von Neumann
✓ Converged logic + memory (high bandwidth)
✓ Suitable for data-intensive workloads
✓ Little data movement (energy efficient)
PIM Chip
Renesas’s ternary SRAM PIM for AI inference
S. Okumura, et al., “A Ternary Based Bit Scalable, 8.80 TOPS/W CNN accelerator with Many-core Processing-in-memory Architecture with 896K synapses/mm²”, Symposium on VLSI Technology 2019
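Ternary weights suit in-memory arrays because every multiplication collapses into add, subtract, or skip. The sketch below ternarizes weights with a threshold of 0.7 × mean(|w|), the heuristic suggested for Ternary Weight Networks (TWN), plus one shared scale. It is a generic illustration with hypothetical helper names, not the circuit or quantizer used in the Renesas chip.

```python
def ternarize(weights, delta_ratio=0.7):
    """TWN-style ternarization (assumed heuristic): threshold = delta_ratio * mean(|w|).

    Returns ternary codes in {-1, 0, +1} and one shared scale alpha.
    """
    delta = delta_ratio * sum(abs(w) for w in weights) / len(weights)
    t = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w, ti in zip(weights, t) if ti != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0  # mean magnitude of surviving weights
    return t, alpha

def ternary_dot(t, x, alpha):
    """Multiply-free dot product: add where t=+1, subtract where t=-1, skip zeros."""
    acc = 0.0
    for ti, xi in zip(t, x):
        if ti == 1:
            acc += xi
        elif ti == -1:
            acc -= xi
    return alpha * acc  # a single scaling multiply at the very end

t, alpha = ternarize([0.9, -0.05, -0.8, 0.1])
print(t)                                              # [1, 0, -1, 0]
print(round(ternary_dot(t, [1, 2, 3, 4], alpha), 3))  # -1.7
```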
AI Framework
Provides a higher-level abstraction to developers/users:
• Convolution on volumes (1 line)
• Max pooling (1 line)
• Non-linear ReLU (1 line)
Hyper-Scale AI Accelerators
• TPU v3 (2018)
• Cerebras Wafer Scale Engine (2019): 1.2T transistors, 46,225 mm², 400,000 cores, 18GB SRAM, 100 Pb/s interconnect
Usually hundreds of processing units in an array structure. How do we program this?
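One small piece of "how do we program this?" is deciding which processing unit computes which part of the result. The sketch below is a deliberately naive, hypothetical helper: it splits an M × N output into square tiles and deals them round-robin across a 2D grid of cores. Real compilers for such chips must also plan data movement, on-chip memory, and communication, which this ignores.

```python
def map_output_tiles(M, N, tile, grid_rows, grid_cols):
    """Hypothetical mapping: split an M x N output into tile x tile blocks
    (identified by their top-left corner) and deal them round-robin onto a
    grid_rows x grid_cols array of cores."""
    tiles = [(r, c) for r in range(0, M, tile) for c in range(0, N, tile)]
    n_cores = grid_rows * grid_cols
    assignment = {(i, j): [] for i in range(grid_rows) for j in range(grid_cols)}
    for idx, (r, c) in enumerate(tiles):
        core = idx % n_cores
        assignment[(core // grid_cols, core % grid_cols)].append((r, c))
    return assignment

a = map_output_tiles(1024, 1024, 128, 4, 4)  # 64 tiles over 16 cores
print(a[(0, 0)])  # [(0, 0), (256, 0), (512, 0), (768, 0)]
```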
Who Fills this Gap?
[Figure: Cerebras WSE core array]
AI Software Tool-Chain
• Xilinx AI Edge Platform
[Figure: many SW developers/users vs. a few hardware vendors]
Problem: No de facto SW tool & hardware!
Software → toolchain → hardware:
• C / Java → compiler toolchain → CPU
• OpenGL / CUDA → compiler toolchain → GPU
• Verilog / VHDL → synthesis toolchain → FPGA
• ? (no equivalent yet for AI chips)
Neuromorphic Chip
• 1st generation
- Perceptron based
- No non-linear functions
- Binary output
• 2nd generation
- Non-linear activation functions
- Continuous output
- Functional modeling of our brain
- Working real-life applications
- We are here (FF, CNN, RNN, …)
• 3rd generation: “spiking neuron”
- Closely models biological neurons’ activity
- Incorporates the concept of time: integrate and fire
- Computationally expensive
- Difficult to train → not practical at the moment
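The integrate-and-fire behavior of third-generation neurons can be sketched as a discrete-time leaky integrate-and-fire (LIF) model. The leak factor, threshold, reset value, and function name below are arbitrary illustrative choices, not those of any chip in this deck.

```python
def lif_neuron(input_current, threshold=1.0, leak=0.9, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron.

    Each time step the membrane potential decays by `leak`, integrates the input,
    and, if it reaches `threshold`, emits a spike (1) and resets to `v_reset`.
    """
    v = v_reset
    spikes = []
    for i_t in input_current:
        v = leak * v + i_t       # leak, then integrate
        if v >= threshold:       # fire
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

print(lif_neuron([0.4] * 6))    # [0, 0, 1, 0, 0, 1]
print(sum(lif_neuron([0.05] * 50)))  # 0: the leak keeps a weak input below threshold
```

Because the output depends on the timing of inputs, simulating many such neurons step by step is costly, and the non-differentiable spike makes gradient-based training difficult, matching the drawbacks listed above.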
IBM TrueNorth
• 5.4 billion transistors in a 28nm CMOS process
• 64 x 64 neurosynaptic cores, 256 neurons each
Paul A. Merolla, et al., "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, 2014
IBM TrueNorth
• Mimics synapses with SRAM
• However, SRAM is not made for this (large area, high cost)
A synapse is a structure that permits a neuron to pass an electrical signal to another.
[Figure: input spikes from pre-neurons (Tx) drive an SRAM synapse array built from 8T SRAM cells (word lines WL, bit lines BL); each column sums (Σ) the selected cells and the post-neuron (Rx) produces an output spike (voltage)]
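At a very abstract level, the SRAM synapse array behaves like a binary crossbar: each stored bit says whether an input axon connects to a post-neuron, and each column counts the spikes arriving through its connected cells. This is an illustrative simplification with a hypothetical function name and plain thresholding, not TrueNorth's actual neuron model or 8T SRAM circuit.

```python
def synapse_array_step(input_spikes, synapses, thresholds):
    """One time step of a binary synapse crossbar (illustrative only).

    synapses[i][j] = 1 if input axon i connects to neuron j (one stored bit per cell).
    Each neuron j sums the spikes of connected axons and fires if the sum reaches
    thresholds[j]. Returns (column sums, output spikes).
    """
    n_neurons = len(synapses[0])
    sums = [sum(s_i * row[j] for s_i, row in zip(input_spikes, synapses))
            for j in range(n_neurons)]
    out = [1 if sums[j] >= thresholds[j] else 0 for j in range(n_neurons)]
    return sums, out

S = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
print(synapse_array_step([1, 0, 1], S, [1, 2, 2]))  # ([1, 1, 2], [1, 0, 1])
```

Storing only one bit per synapse is what makes a dense SRAM array usable here, but each bit still costs a full multi-transistor SRAM cell, which is the area and cost concern noted above.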
Neuromorphic Chip with Emerging Device
• New models require devices with new physics
• FeFET: better at storing/transferring analog signals
M. Jerry, et al., "Ferroelectric FET analog synapse for acceleration of deep neural network training," IEEE IEDM 2017
Neuromorphic Chip with Emerging NV RAM
• ReRAM (memristor)
Z. Wang, et al., "Fully memristive neural networks for pattern classification with unsupervised learning", Nature Electronics 2018
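A ReRAM (memristor) crossbar computes a matrix-vector product in the analog domain: each row voltage V_i drives a current V_i·G_ij through the cell conductance G_ij, and each column wire sums those currents (Kirchhoff's current law). The ideal sketch below ignores wire resistance, device variation, and ADC effects. Storing signed weights as a differential pair of conductances is one common approach, assumed here rather than taken from the cited paper; helper names are hypothetical.

```python
def crossbar_mvm(voltages, conductances):
    """Ideal resistive crossbar: column current I_j = sum_i V_i * G_ij (Ohm + Kirchhoff)."""
    cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances)) for j in range(cols)]

def signed_weights_to_pair(weights, g_max):
    """Map signed weights in [-1, 1] onto two non-negative conductance arrays (G+ and G-)."""
    g_pos = [[max(w, 0.0) * g_max for w in row] for row in weights]
    g_neg = [[max(-w, 0.0) * g_max for w in row] for row in weights]
    return g_pos, g_neg

W = [[0.5, -1.0],
     [1.0, 0.25]]                      # weights[i][j]: row i (input), column j (output)
g_pos, g_neg = signed_weights_to_pair(W, g_max=1e-4)  # 100 uS max conductance (assumed)
v = [0.2, 0.1]                         # input voltages on the rows
i_out = [p - n for p, n in zip(crossbar_mvm(v, g_pos), crossbar_mvm(v, g_neg))]
# i_out is proportional to W^T v: about [2e-5, -1.75e-5] amperes
```

The whole product happens where the weights are stored, in one read operation, which is the data-movement saving that motivates memristive and other in-memory designs.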
1. Cloud and Edge Will be Closer
• Edge inference & learning will become more important due to privacy concerns, real-time operation, and power constraints
• Federated learning: leverages the cloud’s big data advantage on edge devices
[Figure: each mobile device performs local learning to produce custom weights and sends encrypted & compressed data to the cloud servers; the servers aggregate the encrypted data into an updated shared model and broadcast the shared model back to the devices]
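The federated learning loop can be sketched with federated averaging (FedAvg, McMahan et al.): each device trains on its own data, and the server averages the resulting models weighted by local dataset size. This toy version fits a one-parameter linear model with hypothetical helpers and omits the encryption and compression steps shown on the slide.

```python
def local_update(w, data, lr=0.1, epochs=1):
    """Hypothetical on-device step: SGD on y = w * x using only this device's data."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # derivative of squared error w.r.t. w
            w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average client models weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

# Two devices whose private data both follow y = 3x; raw data never leaves a device.
clients = [[(1.0, 3.0), (0.5, 1.5)], [(2.0, 6.0)]]
sizes = [len(c) for c in clients]
w = 0.0                                   # shared model broadcast by the server
for _ in range(30):                       # communication rounds
    local_models = [local_update(w, d) for d in clients]
    w = federated_average(local_models, sizes)
print(round(w, 3))  # 3.0
```

Only model parameters travel to the server, which is what lets the edge keep its data private while still benefiting from what all devices have learned.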
2. AI Chips Will Support More Algorithms
• State-of-the-art algorithms are moving from traditional MLP, CNN, and RNN to GAN, reinforcement learning, and unsupervised learning
[Figure: progression of chip support: inference only (MLP/RNN or CNN) → inference only (MLP/CNN/RNN) → inference + training (MLP/CNN/RNN) → inference + training (GAN/RL/unsupervised/MLP/CNN/RNN)]
3. AI Security Will be Essential
• It is easy to break DNN-based recognition
• New cyberattack: imperceptible noise injection
[Examples: breaking state-of-the-art face recognition; physical attacks on autonomous vehicles]
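Imperceptible noise injection can be illustrated with the fast gradient sign method (FGSM, Goodfellow et al.) applied to a toy linear classifier, where the gradient of the score with respect to the input is simply the weight vector. The model, weights, ε, and helper names below are made up for illustration; real attacks on face recognition or autonomous vehicles target deep networks and physical inputs.

```python
def linear_score(w, x, b):
    """Score of a toy linear classifier: class 1 if positive, class 0 otherwise."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_linear(w, x, b, eps):
    """FGSM-style perturbation for a linear model.

    The input gradient of the score is w, so moving every input by eps against
    sign(w) (in the direction that pushes the score toward the other class)
    changes the score by eps * sum(|w_i|) while no input moves by more than eps.
    """
    direction = sign(linear_score(w, x, b))
    return [xi - eps * direction * sign(wi) for wi, xi in zip(w, x)]

w = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]  # made-up weights
x = [0.5] * 100                                        # made-up "image"
b = 1.0
x_adv = fgsm_linear(w, x, b, eps=0.02)                 # 4% change per input
print(linear_score(w, x, b))          # 1.0  -> class 1
print(linear_score(w, x_adv, b) < 0)  # True -> flipped to class 0
```

The effect grows with input dimension: many tiny, coordinated changes add up, which is why high-dimensional inputs such as images are so easy to attack.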
4. For Success of AI Chips, SW is the Key
• How did ARM dominate the mobile processor market?
- Low power consumption with reasonable performance
- ARM’s competent compiler toolchain & licensing strategy
• Why was the GPU so successful in the early DNN revolution?
- Because of CUDA, a general-purpose programming model for data-intensive workloads like matrix-vector multiplication
- CUDA was refined over several years until developers actually adopted it
AI Chip Research at KAIST
[Figure: timeline of KAIST AI chips with die photos, 2008~2019. Labels include: multi-core OR processor; dual-layered 3-stage pipeline; simultaneous multi-threading; multi-classifier system; multi-core MIMD; visual attention; heterogeneous many-SIMD; multi-modal UI/UX; deep learning core; convolution clusters with an FC/LSTM processor and aggregation core; stereo matching processor; face recognition & CNN–RNN; 16 NN PIM units; variable-bit DNN & 3D HGR]
Chip specifications:
• Process: 65nm 1P8M logic CMOS
• Area: 4mm × 4mm
• SRAM: 448 KB
• Supply: 0.67V – 1.1V
• Power: 196 mW @ 200MHz, 1.1V; 2.4 mW @ 10MHz, 0.67V
• Precision: feature – bfloat16; weight – 16/8/4-bit FXP
• Peak performance: 204 GFLOPS @ 16b weight
[Die photo: top controller, cores 1–3, external interfaces IF 0/IF 1, UMEM, BMEM, PE arrays, exponent compressor, 1-D SIMD]
Supervised & Reinforcement Learning: hand tracking demo
[Figure: input image, hand depth, and 3D tracking results plotted on X/Y/depth axes in cm; setup with two VGA cameras separated by 5 cm (dimensions shown: 22.5 cm, 40.5 cm)]
Hand tracking accuracy: 2.6mm@20cm, 4.6mm@30cm, 3.4mm@40cm

 
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...marjmae69
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxaryanv1753
 

Último (20)

Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Early Modern Spain. All about this period
Early Modern Spain. All about this periodEarly Modern Spain. All about this period
Early Modern Spain. All about this period
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptx
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEM
 
Chizaram's Women Tech Makers Deck. .pptx
Chizaram's Women Tech Makers Deck.  .pptxChizaram's Women Tech Makers Deck.  .pptx
Chizaram's Women Tech Makers Deck. .pptx
 
Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170
 
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRRINDIAN GCP GUIDELINE. for Regulatory  affair 1st sem CRR
INDIAN GCP GUIDELINE. for Regulatory affair 1st sem CRR
 
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
 
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
THE COUNTRY WHO SOLVED THE WORLD_HOW CHINA LAUNCHED THE CIVILIZATION REVOLUTI...
 
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
 
Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptx
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptx
 

AI Chip Trends and Forecast

  • 1. AI Chip Trends and Forecast. Joo-Young Kim, 2019. 11. 6, ICT Industry Outlook Conference
  • 2. Outline • Introduction - Brief history & deep neural network models - AI stack and new computing paradigm • Trends in AI chips - ?? • Looking forward - ???
  • 3. Motivation Artificial Intelligence is pervasive in our everyday life.
  • 4. Brief History of Neural Networks: 1943, W. S. McCulloch and W. Pitts (adjustable but not learnable weights); 1958, F. Rosenblatt (learnable weights and threshold); 1960, B. Widrow and M. Hoff; 1969, M. Minsky and S. Papert (the XOR problem), followed by the first winter; 1986, D. Rumelhart, G. Hinton and R. Williams (nonlinear problems solved, but high computation, local optima and overfitting), followed by the second winter; 2006, G. Hinton and R. Salakhutdinov (hierarchical feature learning), leading to deep learning: ImageNet, AlphaGo, speech translation, video synthesis, smart factories, …
  • 5. Deep Learning ≠ AI: AI is any technique that enables computers to mimic human behavior, including searching, planning, knowledge representation, fuzzy logic, natural language processing, and genetic algorithms. Machine learning (ML) is the subset of AI techniques that have computers learn without being explicitly programmed. Deep learning is a subset of ML that makes the computation of multi-layer neural networks feasible.
  • 6. Deep Learning Revolution: on the ImageNet (ILSVRC) top-5 error chart, deep learning starts to surpass human-level recognition (about 5% error) on specific tasks. (* F. Veen, The Asimov Institute, 2016)
  • 7. What Has Changed? • Traditional pattern recognition: hand-crafted features (HoG, SIFT, Haar-like) feed simple trainable classifiers (SVM, K-means) that output labels such as "Dog", "Ship", "Car" • Deep learning (model + data): both the features and the classifier are trainable (CNN, DNN) • Chart of performance versus amount of data (Andrew Ng, Stanford CS 229 class): deep learning keeps improving as data grows, while traditional algorithms level off
  • 8. Popular Types of DNNs • MLP (multi-layer perceptron): fully connected; 3–10 layers; main computation: matrix-vector multiplication; major application: speech recognition • CNN (convolutional): convolutional layers (convolution and pooling, then a fully connected output); up to ~100 layers; main computation: 3D convolution; major application: image recognition • RNN (recurrent): sequential data with a feedback path; 3–5 layers; main computation: matrix-vector multiplication; major application: speech/action recognition
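To make the "main computation" entries concrete, here is a minimal pure-Python sketch of an MLP layer and a vanilla RNN cell. Both reduce to matrix-vector products, and the RNN adds the feedback term `W_hh · h_prev`. The function names, the ReLU/tanh choices, and the single-cell RNN form are illustrative assumptions, not taken from the slides.

```python
import math

def matvec(W, x):
    """Dense matrix-vector product: y[i] = sum_j W[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def mlp_layer(W, b, x):
    """Fully connected (MLP) layer: ReLU(W x + b)."""
    return [max(0.0, y_i + b_i) for y_i, b_i in zip(matvec(W, x), b)]

def rnn_step(W_xh, W_hh, b, x_t, h_prev):
    """Vanilla RNN cell: h_t = tanh(W_xh x_t + W_hh h_prev + b).
    The W_hh term is the feedback path that carries state between time steps."""
    a = matvec(W_xh, x_t)
    r = matvec(W_hh, h_prev)
    return [math.tanh(a_i + r_i + b_i) for a_i, r_i, b_i in zip(a, r, b)]

def rnn_forward(W_xh, W_hh, b, xs, h0):
    """Run the cell over a sequence, reusing the same weights at every step."""
    h = h0
    for x_t in xs:
        h = rnn_step(W_xh, W_hh, b, x_t, h)
    return h

if __name__ == "__main__":
    print(mlp_layer([[1.0, 2.0], [3.0, -4.0]], [0.0, 0.0], [1.0, 1.0]))
```

Because the RNN reuses one small weight matrix at every time step, its per-step work is the same matrix-vector product as an MLP layer. This matches the table's "main computation" row.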
  • 9. And Many More Models… A timeline spanning the 1970s, 1980s, 1990s, 2012–2014, 2015, 2016 and 2017 onward lists MLP, Cognitron/CNN, LSTM, LeNet, GRU, AlexNet, VGGNet, GoogLeNet, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, YOLO v2, YOLO v3, SegNet, FCN, DeepLab, DeepLab v3+, ENet, ResNet, DenseNet, CNN+RNN, WaveNet, Tacotron, attention-only network, BERT, PointNet, PointNet++, VoxelNet, WGAN, CycleGAN, StarGAN, and DiscoGAN.
  • 10. DNN Characteristics • Requires big data and big computation • Modern hardware (e.g., GPUs) enabled the deep learning revolution • Face recognition example: local-feature-based: ~0.1 billion operations and ~10 MB of memory access per face; deep learning-based: ~2 billion operations and ~1 GB of memory access per face
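Per-inference costs like "~2 billion operations per face" are usually estimated by adding up the multiply-accumulates (MACs) of every layer. The sketch below shows that bookkeeping. The layer shapes are hypothetical and do not come from the face recognition network on the slide. The slide also does not say whether it counts a MAC as one or two operations, so the example counts multiply and add separately.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """MACs of a standard k x k convolution: each of the h_out * w_out * c_out
    outputs needs c_in * k * k multiply-accumulates."""
    return h_out * w_out * c_out * c_in * k * k

def fc_macs(n_in, n_out):
    """MACs of a fully connected layer: one per weight."""
    return n_in * n_out

def weight_bytes(num_weights, bytes_per_weight=4):
    """Bytes needed to read every weight once (4 bytes = 32-bit float)."""
    return num_weights * bytes_per_weight

# Hypothetical two-layer example (shapes chosen only for illustration).
conv = conv2d_macs(56, 56, 64, 64, 3)   # 115,605,504 MACs
fc = fc_macs(4096, 4096)                # 16,777,216 MACs
total_ops = 2 * (conv + fc)             # multiply and add counted separately
```

The convolution dominates the arithmetic, while the fully connected layer dominates weight traffic: it has 16.7 M weights (about 67 MB in fp32), versus only 64 × 64 × 9 weights for the convolution.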
  • 11. AI Stack • Application: video/image (face recognition, image generation, video analysis, …); sound and speech (speech recognition, language synthesis, music generation, …); NLP (text analysis, language translation, human-machine communication, …); robotics (autopilot, UAV, industrial automation, …) • Algorithm: neural network topology (MLP, CNN, RNN, LSTM, SNN, …); deep neural networks (AlexNet, ResNet, GoogLeNet, …); neural network algorithms (reinforcement learning, adversarial learning, …); machine learning algorithms (SVM, K-NN, decision tree, Markov chain, …) • Chip: neuromorphic chips (brain-inspired computing, biological brain simulation, …); programmable chips (GPU, ASIC, FPGA, DSP, …); system-on-chip (multi-core, many-core, SIMD, systolic array, …); development toolchain (frameworks, compiler, simulator, optimizer, …); high-bandwidth off-chip memory (HBM, DRAM, GDDR, STT-MRAM, …); high-speed interfaces (SerDes, optical communication) • Device: CMOS 3D stacking; emerging computing devices (analog computing, memristors, …); emerging memory devices (ReRAM, PCRAM, …)
  • 12. New Computational Paradigm • Handling big data: huge storage capacity, high bandwidth, and low-latency memory access (the "memory wall" problem) • Large amounts of computation: mainly linear-algebra operations with relatively simple control; parameters are large • Training vs. inference: training needs accuracy, data capacity (~10^18 bytes), and weight synchronization; inference needs speed, low energy, low hardware cost, and efficient reading of weights • Data precision / model compression / pruning: high precision is not always required • High configurability: a tradeoff between energy efficiency and adaptability to new algorithms
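"High precision is not always required" is often put into practice with uniform quantization. Below is a minimal sketch of symmetric per-tensor quantization. It is a swapped-in textbook scheme, not a method named on the slide: floats are mapped to signed integers with a single scale, so 8-bit weights need a quarter of the storage and bandwidth of fp32.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax], qmax = 2^(b-1) - 1,
    using one scale so that the largest magnitude maps to qmax."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; rounding error per value is at most scale / 2."""
    return [qi * scale for qi in q]

if __name__ == "__main__":
    w = [0.3, -0.7, 0.05, 1.2]
    q, s = quantize_symmetric(w)
    print(q, dequantize(q, s))
```

Lowering `num_bits` to 4 shrinks the integer range to [-7, 7]. Storage halves again, but the rounding error grows.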
  • 14. DNN Hardware • Mobile-based: specific AI, real-time, limited resources, low power • Cloud-based: general AI, high computing power, huge memory, fast and accurate learning • Diagram (axes: real-time operation and global data sharing, low to high): cloud servers and mobile/edge terminals exchange data and learned models, and each performs control with its own control model
  • 15. Cloud-based AI Computing: training data (a dataset) is used for learning on the cloud/server to produce a pre-trained network, and inference also runs on the cloud/server; a device/edge client such as a voice assistant sends a question and receives the answer
  • 16. DNN Chips for Cloud Servers • NVIDIA (GPU) • Google (TPU) • Microsoft (BrainWave) • Amazon (Inferentia) • Facebook • Alibaba, Baidu • Cloud servers: control based on overall conditions; learning with data collected from edge devices (diagram: real-time operation vs. global data sharing; stand-alone AI) • Pictured: NVIDIA Volta, Google Cloud TPU
  • 17. Mobile/Edge-based AI Inference (self-driving vehicles, intelligent cameras and speakers, IoT devices): learning on the cloud/server with training data (a dataset) produces a pretrained network; the device/edge loads the pretrained model and runs inference on local data from its sensors (camera, microphone, GPS, gyroscope, touch), serving the user interface and app platform
  • 18. Mobile/Edge DNN Applications • Apple • Huawei • Qualcomm • ARM • CEVA • Cambricon • Horizon Robotics • Mobileye • Tesla • Chart of power consumption (low to high) versus inference speed (slow to fast) covering IoT, wearables, smartphones, drones, automotive, and mobile robots
  • 19. Cloud vs Edge Summary • Training on cloud/datacenter: high performance, high precision, high flexibility, distributed, scalable • Training on edge/mobile: ? • Inference on cloud/datacenter: high throughput, low latency, power efficiency, distributed, scalable • Inference on edge/mobile: diverse requirements (car, wearable, IoT), low-to-moderate throughput, low latency, power efficiency, low cost
  • 20. Functional Integration (courtesy of GTIC 2019) • Early: classic hardware (Intel CPU, NVIDIA GPU, Xilinx FPGA); domain: cloud; target workload: training-oriented • 1st stage: domain-specific hardware (MIT Eyeriss, KAIST LNPU, Google TPU, Microsoft BrainWave, …); domain: cloud/edge; target workload: inference • 2nd stage: reconfigurable hardware (Wave DPU, Tsinghua Thinker, …); domain: cloud/edge; target workload: inference and training • Next stage: ?
  • 21. Two Different Directions (courtesy of GTIC 2019) • Be more flexible (2014–2018): dedicated accelerators (DianNao, 2014), row-stationary (RS) dataflow (MIT Eyeriss), systolic arrays (Google TPU), sparsity-aware designs (NVIDIA SCNN), flexible bitwidth (KAIST UNPU), … • Be more compact (2016–2018): compression and pruning (EIE, 2016.2), binary and ternary weights (BWN, TWN), low-bit training (DoReFa-Net), low-bit quantization (LQ-Nets), …
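The "be more compact" direction includes binary (BWN) and ternary (TWN) weights. The sketch below follows the approximations commonly attributed to those papers. BWN: W ≈ α·sign(W) with α = mean|W|. TWN: threshold Δ ≈ 0.7·mean|W|, weights inside ±Δ set to 0, and α equal to the mean magnitude of the surviving weights. The 0.7 factor and the per-tensor scaling are the usual published heuristics. This is an illustration, not a faithful reproduction of either training procedure.

```python
def binarize(weights):
    """BWN-style: W ~= alpha * sign(W), alpha = mean(|W|); sign(0) is taken as +1."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [1 if w >= 0 else -1 for w in weights], alpha

def ternarize(weights, delta_factor=0.7):
    """TWN-style: delta = delta_factor * mean(|W|); |w| <= delta becomes 0,
    alpha = mean |w| over the weights above the threshold."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    t = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return t, alpha

if __name__ == "__main__":
    w = [0.9, -0.1, 0.05, -0.8]
    t, alpha = ternarize(w)
    print(t, [alpha * ti for ti in t])   # 2-bit codes plus one float scale
```

With ternary weights, every multiply in a dot product becomes an add, a subtract, or a skip. This is what makes such compact models attractive for small, low-power hardware.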
  • 22. Von Neumann Bottleneck for AI • The von Neumann architecture fetches data serially from storage • AI applications need to access a tremendous amount of data • As a result, the bus between the AI processor and memory becomes the bottleneck: the memory wall
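One way to quantify the memory wall is the roofline model, a standard analysis swapped in here that is not named on the slide. Attainable throughput is the minimum of peak compute and (arithmetic intensity × memory bandwidth). A matrix-vector product reads each weight once and performs only 2 FLOPs with it, so its intensity is below 0.5 FLOP/byte in fp32. The 10 TFLOPS peak and 1 TB/s bandwidth below are assumed round numbers for illustration.

```python
def gemv_intensity(m, n, bytes_per_elem=4):
    """FLOPs per byte for y = W x with an m x n matrix: 2 FLOPs per weight,
    and every weight, input, and output element moved once."""
    flops = 2 * m * n
    bytes_moved = bytes_per_elem * (m * n + n + m)
    return flops / bytes_moved

def attainable_flops(intensity, peak_flops, bandwidth_bytes):
    """Roofline: performance is capped by compute or by memory traffic."""
    return min(peak_flops, intensity * bandwidth_bytes)

if __name__ == "__main__":
    peak, bw = 10e12, 1e12          # assumed: 10 TFLOPS, 1 TB/s
    ai = gemv_intensity(4096, 4096)
    print(f"intensity {ai:.3f} FLOP/B -> {attainable_flops(ai, peak, bw) / peak:.1%} of peak")
```

Under these assumptions a large matrix-vector product reaches only about 5% of peak. The processor mostly waits on memory, which is why bandwidth, not arithmetic, sets the limit.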
  • 23. Increasing Memory Bandwidth: in the memory hierarchy (processor, SRAM cache, DRAM, NVM), the von Neumann bottleneck lies between the processor and memory. How can we increase the bandwidth between processor and memory?
  • 25. Advantage of HBM (High Bandwidth Memory) over GDDR5 • System configuration: 12 × 8Gb GDDR5 DRAM vs. 4 × 4GB HBM • Size: 3120 mm² (60 mm × 52 mm) vs. 792 mm² (33 mm × 24 mm), -75% • Density: 12GB vs. 16GB, 1.3x • Bandwidth: 384GB/s vs. 1024GB/s • Power: 18.3W (1.5W × 12 GDDR5) vs. 9.1W (2.3W × 4 HBM) • Pin (ball) speed: 8 Gbps vs. 2 Gbps • I/Os: 32 per chip (384 total) vs. 1024 per cube (4096 total) • Further annotations on the slide: 3.6x, +18% • 2016 GFX projected specifications: 4–6 HBM cubes, 4–8GB, 512GB/s–1TB/s, 10 TFLOPS
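The bandwidth figures in the comparison follow directly from I/O count × per-pin rate. HBM runs each pin four times slower than GDDR5 but has roughly ten times as many I/Os. The sketch checks that arithmetic and the area reduction using only numbers from the slide. It computes peak interface bandwidth and ignores protocol overhead.

```python
def peak_bandwidth_gbytes(num_io_pins, gbps_per_pin):
    """Aggregate peak bandwidth in GB/s = pins * per-pin rate (Gb/s) / 8 bits per byte."""
    return num_io_pins * gbps_per_pin / 8

gddr5 = peak_bandwidth_gbytes(32 * 12, 8)   # 12 chips x 32 I/Os at 8 Gb/s
hbm = peak_bandwidth_gbytes(1024 * 4, 2)    # 4 cubes x 1024 I/Os at 2 Gb/s
area_reduction = 1 - 792 / 3120             # footprint: 792 mm^2 vs. 3120 mm^2

if __name__ == "__main__":
    print(gddr5, hbm, f"{hbm / gddr5:.2f}x bandwidth", f"{area_reduction:.0%} smaller")
```

The wide, slow interface is only practical because the HBM stacks sit right next to the processor on an interposer. There, thousands of short wires are affordable, whereas 4096 board-level pins would not be.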
  • 26. Emerging Non-Volatile Memories: DRAM-like speed, flash-like capacity, and non-volatility (source: White Paper on AI Chip Technologies, 2018)
  • 27. Moving Into Memory: traditional (processor, SRAM cache, DRAM, NVM, limited by the von Neumann bottleneck) → near-memory / emerging memory → in-memory / memory-centric (many processing elements distributed throughout the memory)
  • 28. Processing-In-Memory (PIM): instead of the von Neumann arrangement, in which the bus between the AI processor and memory is the bottleneck, PIM interleaves memory and logic: ✓ Non-von Neumann ✓ Converged logic + memory (high bandwidth) ✓ Suitable for data-intensive workloads ✓ Little data movement (energy efficient)
  • 29. PIM Chip Renesas’s ternary SRAM PIM for AI inference S. Okumura, et al., “A Ternary Based Bit Scalable, 8.80 TOPS/W CNN accelerator with Many-core Processing-in-memory Architecture with 896K synapses/mm2”, Symposium on VLSI Technology 2019
  • 30. AI Framework: provides a higher-level abstraction to developers and users; for example, convolution on volumes, max pooling, and a non-linear ReLU each take a single line of code
  • 31. Hyper-Scale AI Accelerators: TPU v3 (2018) and the Cerebras Wafer Scale Engine (2019; 1.2T transistors, 46,225 mm², 400,000 cores, 18GB SRAM, 100 Pb/s interconnect). These usually have hundreds of processing units in an array structure. How do we program this?
  • 32. Who Fills This Gap? (figure: the array of cores in the Cerebras WSE)
  • 33. AI Software Toolchain: e.g., the Xilinx AI Edge Platform, which sits between software developers and users on one side and a few hardware vendors on the other
  • 34. Problem: No De Facto SW Tool & Hardware! CPUs have C/Java with a compiler toolchain; GPUs have OpenGL/CUDA with a compiler toolchain; FPGAs have Verilog/VHDL with a synthesis toolchain; for AI chips, the equivalent is still an open question (?)
  • 35. Neuromorphic Chip • 1st generation: perceptron-based, no non-linear functions, binary output • 2nd generation: non-linear activation functions, continuous output, functional modeling of our brain, working real-life applications (FF, CNN, RNN, …); we are here • 3rd generation: "spiking neurons" that closely model biological neurons' activity and incorporate the concept of time (integrate and fire); computationally expensive and difficult to train → not practical at the moment
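The "integrate and fire" idea can be shown with a discrete-time leaky integrate-and-fire (LIF) neuron. LIF is one common spiking model, chosen here for illustration. The leak, threshold, and reset values are assumptions, and real neuromorphic chips use their own variants. The output is a spike train over time rather than a single continuous activation.

```python
def lif_neuron(input_current, leak=0.9, threshold=1.0, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron.
    Each step: v = leak * v + input; if v >= threshold, emit a spike (1)
    and reset v, otherwise emit 0."""
    v = v_reset
    spikes = []
    for i_t in input_current:
        v = leak * v + i_t          # integrate (with leak)
        if v >= threshold:          # fire
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

if __name__ == "__main__":
    print(lif_neuron([0.4] * 6))
```

Information is carried by when and how often spikes occur. The spike threshold is non-differentiable, which is one reason such networks are hard to train with ordinary backpropagation.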
  • 36. IBM TrueNorth • 5.4 billion transistors in a 28nm CMOS process • 64 × 64 neurosynaptic cores, 256 neurons each (Paul A. Merolla, et al., "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, 2014)
  • 37. IBM TrueNorth • Synapses are mimicked with SRAM (a synapse is a structure that permits a neuron to pass an electrical signal to another) • However, SRAM is not made for this (large area, high cost) • Diagram: input spikes from pre-neurons (Tx) drive an SRAM synapse array built from 8T SRAM cells; each column sums (Σ) its connected contributions into a voltage that determines the post-neuron (Rx) output spike
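The SRAM synapse array can be modeled functionally as a binary crossbar. Each stored bit says whether an input axon connects to a neuron. On each tick, every neuron adds up the contributions of the axons that spiked and are connected to it, then fires if its potential crosses a threshold. This is a simplified, hypothetical model. In particular, the one-signed-weight-per-axon scheme is a simplification, and TrueNorth's actual neuron model has more parameters.

```python
def crossbar_step(spikes_in, synapses, weights, thresholds, potentials):
    """One tick of a simplified neurosynaptic core.
    spikes_in[a]   : 1 if input axon a spiked this tick
    synapses[a][n] : 1 if the stored synapse bit connects axon a to neuron n
    weights[a]     : signed weight applied by axon a (simplification)
    Returns (output spikes, updated membrane potentials)."""
    n_neurons = len(thresholds)
    out = [0] * n_neurons
    new_v = list(potentials)
    for n in range(n_neurons):
        for a, s in enumerate(spikes_in):
            if s and synapses[a][n]:       # column-wise accumulation (the Σ)
                new_v[n] += weights[a]
        if new_v[n] >= thresholds[n]:      # fire and reset
            out[n] = 1
            new_v[n] = 0
    return out, new_v

if __name__ == "__main__":
    syn = [[1, 0], [1, 1], [0, 1]]         # 3 axons x 2 neurons of synapse bits
    print(crossbar_step([1, 0, 1], syn, [1, 1, 1], [1, 2], [0, 0]))
```

Because each synapse is just one stored bit read on every spike, a dense array of SRAM cells is a natural fit, even though a 6T/8T cell is large compared with a single emerging-memory device.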
  • 38. Neuromorphic Chip with Emerging Device • New models require devices with new physics • FeFET: better at storing and transferring analog signals (M. Jerry, et al., "Ferroelectric FET analog synapse for acceleration of deep neural network training," IEEE IEDM 2017)
  • 39. Neuromorphic Chip with Emerging NV RAM • ReRAM (memristor) (Z. Wang, et al., "Fully memristive neural networks for pattern classification with unsupervised learning," Nature Electronics, 2018)
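A key reason memristors (ReRAM) attract attention is that a crossbar of them performs a matrix-vector product in the analog domain. By Ohm's law each cell passes current G·V. By Kirchhoff's current law each column sums those currents: I_j = Σ_i G_ij·V_i. The sketch below is an idealized model that ignores wire resistance, device nonlinearity, and ADC quantization. The differential pair (G+, G−) shown for negative weights is one common mapping, assumed here for illustration.

```python
def crossbar_currents(voltages, conductances):
    """Ideal resistive crossbar read: column current I_j = sum_i G[i][j] * V_i."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]

def differential_mvm(weights, voltages, g_max=1.0):
    """Represent each signed weight as two non-negative conductances (G+, G-)
    and subtract the two column currents."""
    g_pos = [[max(w, 0.0) * g_max for w in row] for row in weights]
    g_neg = [[max(-w, 0.0) * g_max for w in row] for row in weights]
    i_pos = crossbar_currents(voltages, g_pos)
    i_neg = crossbar_currents(voltages, g_neg)
    return [p - n for p, n in zip(i_pos, i_neg)]

if __name__ == "__main__":
    print(differential_mvm([[1.0, -1.0], [0.5, 0.0]], [1.0, 1.0]))
```

The weights never leave the array, and the whole product is computed in one read step. This is the "little data movement" advantage that in-memory computing aims for.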
  • 40. 1. Cloud and Edge Will Be Closer • Edge inference and learning will become more important due to privacy concerns, real-time operation, and power constraints • Federated learning: leverage the cloud's big-data advantage on edge devices. Each mobile device learns locally to produce custom weights and sends encrypted, compressed data; the cloud servers aggregate the encrypted data into a shared model and broadcast the updated model back to the devices
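The federated loop described above can be sketched in the style of FedAvg: each client trains locally, and the server averages the client models weighted by local dataset size. This sketch is hypothetical. The 1-D linear model, learning rate, and epoch count are illustrative choices, and it omits the encryption and compression steps shown on the slide.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """On-device training: gradient descent on y = w0 + w1 * x with mean squared error."""
    w0, w1 = weights
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = (w0 + w1 * x) - y
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / len(data)
        w1 -= lr * g1 / len(data)
    return [w0, w1]

def federated_average(client_weights, client_sizes):
    """Server-side aggregation: average client models weighted by dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
            for k in range(dim)]

def federated_round(global_w, client_datasets):
    """Broadcast the shared model, train locally on each client, aggregate."""
    updates = [local_update(list(global_w), d) for d in client_datasets]
    return federated_average(updates, [len(d) for d in client_datasets])

if __name__ == "__main__":
    clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]  # y = 1 + 2x
    w = [0.0, 0.0]
    for _ in range(200):
        w = federated_round(w, clients)
    print(w)
```

Only model parameters travel between devices and server. The raw local data stays on each device, which is the privacy benefit the slide refers to.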
  • 41. 2. AI Chips Will Support More Algorithms • State-of-the-art algorithms are moving from traditional MLP, CNN, and RNN to GAN, reinforcement learning, and unsupervised learning • Progression: inference only (MLP/RNN or CNN) → inference only (MLP/CNN/RNN) → inference + training (MLP/CNN/RNN) → inference + training (GAN/RL/unsupervised/MLP/CNN/RNN)
  • 42. 3. AI Security Will Be Essential • It is easy to break DNN-based recognition • New cyberattack: imperceptible noise injection • Breaking state-of-the-art face recognition • Physical attacks on autonomous vehicles
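Why small, imperceptible noise can flip a prediction is easiest to see on a linear classifier, using the fast gradient sign method (FGSM). FGSM is a well-known attack, swapped in here as an example; the slide does not say which attacks it shows. Each input feature moves by at most ε, but the score shifts by ε·Σ|w_i|, which grows with the input dimension. The weights and inputs below are made up for illustration.

```python
def score(w, b, x):
    """Linear classifier score; predicted label is +1 if score > 0 else -1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fgsm_linear(w, x, y, eps):
    """FGSM against a linear classifier with label y in {+1, -1} and logistic loss
    L = log(1 + exp(-y * score)). dL/dx_i = -y * sigmoid(-y * score) * w_i, whose
    sign is sign(-y * w_i), so each feature moves by eps in that direction."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [xi + eps * sign(-y * wi) for xi, wi in zip(x, w)]

if __name__ == "__main__":
    w = [1.0, -1.0] * 50                # 100 hypothetical features
    x = [0.51, 0.50] * 50               # correctly classified as +1 (score ~ 0.5)
    x_adv = fgsm_linear(w, x, y=1, eps=0.01)
    print(score(w, 0.0, x), score(w, 0.0, x_adv))   # score drops by eps * sum|w| = 1.0
```

A per-feature change of 0.01 is tiny, yet across 100 features it shifts the score by 1.0 and flips the label. Deep networks with millions of input pixels are exposed to the same effect at a much larger scale.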
  • 43. 4. For the Success of AI Chips, SW Is the Key • How did ARM dominate the mobile processor market? - Low power consumption with reasonable performance - ARM's competent compiler toolchain and licensing strategy • Why was the GPU so successful early in the DNN revolution? - Because of CUDA, a general-purpose programming language for data-intensive workloads such as matrix-vector multiplication - CUDA had matured for several years before developers actually adopted it
  • 44. AI Chip Research at KAIST: a timeline of chips and architectural features from 2008 to 2019, including a multi-core OR processor, a dual-layered 3-stage pipeline, simultaneous multi-threading, a multi-classifier system, multi-core MIMD, visual attention, heterogeneous many-SIMD, multi-modal UI/UX, a deep learning core, a stereo matching processor, face recognition and CNN–RNN, NN PIM, a variable-bit DNN and 3D HGR core, and supervised and reinforcement learning. Spec table of one chip: process 65nm 1P8M logic CMOS; area 4mm × 4mm; SRAM 448 KB; supply 0.67–1.1V; power 196 mW @ 200MHz, 1.1V and 2.4 mW @ 10MHz, 0.67V; precision: features in bfloat16, weights in 16/8/4-bit fixed point; peak performance 204 GFLOPS @ 16-bit weights. Hand tracking accuracy with VGA cameras separated by 5cm: 2.6mm @ 20cm, 4.6mm @ 30cm, 3.4mm @ 40cm.