An Experimental Study of Reduced-Voltage
Operation in Modern FPGAs
for Neural Network Acceleration
50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 30th June, 2020
Behzad Salami, Baturay Onural, Ismail Yuksel,
Fahrettin Koc, Oguz Ergin, Adrian Cristal,
Osman Unsal, Hamid Sarbazi-Azad, Onur Mutlu
Executive Summary
• Motivation: Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
• Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs
• Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by undervolting (operating below the nominal voltage level)
• Evaluation Setup
 5 Image classification workloads
 3 Xilinx UltraScale+ ZCU102 platforms
 2 On-chip voltage rails
• Main Results
 Large voltage guardband (i.e., 33%)
 >3X power-efficiency gain
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Frequency Underscaling
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
Motivation and Background
• Motivation
 Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
 FPGAs: Getting popular but less power-efficient than equivalent ASICs
 Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs
 Any potential of “Undervolting FPGAs” for power-efficiency of neural networks?
• Background
 Neural Networks: Widely deployed with an inherent resilience to errors
 FPGAs: Higher throughput than GPUs and better flexibility than ASICs
 Undervolting: Reduces power consumption, but may incur reliability or performance issues
Our Goal
• Primary Goal
 Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by:
 Undervolting (i.e., underscaling voltage below nominal level)
• Secondary Goals
 Study the voltage behavior of real FPGAs (e.g., guardband)
 Study the power-efficiency gain of undervolting for neural networks
 Study the reliability overhead
 Study the frequency underscaling to prevent the accuracy loss
 Study the effect of environmental temperature
Overall Methodology
• 5 CNN image classification workloads: VGGNet, GoogleNet, AlexNet, ResNet50, Inception
• Xilinx DNNDK to map CNNs onto the FPGA
 By default optimized for INT8
• 3 identical samples of Xilinx ZCU102
 Zynq UltraScale+ architecture
 Hard-core ARM for data orchestration
 FPGA for CNN acceleration
• 2 on-chip voltage rails, accessed via PMBus
 VCCINT: DSPs, LUTs, buffers, …
 VCCBRAM: BRAMs
 Vnom = 850 mV (set by manufacturer)
• Vast majority (>99.9%) of the power is dissipated on VCCINT
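The slides say only that the rails are accessed via PMBus. As a rough illustration of what a voltage setpoint looks like at that level, the sketch below encodes a target voltage in the PMBus LINEAR16 format used by the standard VOUT_COMMAND (command code 0x21). The exponent of -12 and the helper names are assumptions made for illustration; on real hardware the exponent is read from the regulator's VOUT_MODE register, and the slides do not describe the software the authors used.

```python
# Sketch only: PMBus LINEAR16 encoding of a voltage setpoint.
# The exponent (-12) and the helper names are illustrative assumptions.
VOUT_COMMAND = 0x21  # standard PMBus command code for the output-voltage setpoint


def encode_linear16(volts: float, exponent: int = -12) -> int:
    """Return the unsigned 16-bit mantissa m such that volts ~= m * 2**exponent."""
    mantissa = round(volts / (2.0 ** exponent))
    if not 0 <= mantissa <= 0xFFFF:
        raise ValueError("voltage does not fit in 16 bits for this exponent")
    return mantissa


def decode_linear16(mantissa: int, exponent: int = -12) -> float:
    """Inverse of encode_linear16: convert a mantissa back to volts."""
    return mantissa * (2.0 ** exponent)


V_NOM_MV = 850  # nominal VCCINT from the slide
V_MIN_MV = 570  # minimum safe voltage reported later in the deck

for mv in (V_NOM_MV, V_MIN_MV):
    word = encode_linear16(mv / 1000.0)
    back_mv = decode_linear16(word) * 1000.0
    print(f"{mv} mV -> 0x{word:04X} (decodes to {back_mv:.2f} mV)")
```

The word would then be written to the regulator with VOUT_COMMAND over the I2C/SMBus link; that transport step is omitted here because it depends on the board and driver in use.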
Overall Voltage Behavior
• Guardband: Large region below nominal level (Vnom = 850 mV)
 No performance or reliability loss
 Added by the vendor to ensure correct operation under worst-case conditions
 Large guardband, average of 33%
• Critical: Narrower region below guardband (Vmin = 570 mV)
 A narrow voltage region
 Neural network accuracy collapse
• Crash: FPGA crashes below critical region (Vcrash = 540 mV)
 FPGA stops operating
• Slight variation of voltage behavior across platforms and benchmarks
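The three regions and the 33% guardband figure follow directly from the voltages on this slide. The short sketch below recomputes the guardband and classifies a supply voltage into a region. The function names are hypothetical, and the boundary handling (570 mV counted as guardband, 540 mV counted as critical) is an assumption based on the frequency-underscaling table, which still lists an operating point at 540 mV.

```python
# Sketch only: voltage regions from the averages reported on the slide.
V_NOM_MV = 850    # nominal VCCINT
V_MIN_MV = 570    # lowest voltage with no accuracy loss at default frequency
V_CRASH_MV = 540  # lowest voltage at which the FPGA still operates (assumed inclusive)


def guardband_fraction(v_nom_mv: float, v_min_mv: float) -> float:
    """Guardband as a fraction of the nominal voltage."""
    return (v_nom_mv - v_min_mv) / v_nom_mv


def classify(v_mv: float) -> str:
    """Map a voltage at or below nominal to 'guardband', 'critical', or 'crash'."""
    if v_mv >= V_MIN_MV:
        return "guardband"
    if v_mv >= V_CRASH_MV:
        return "critical"
    return "crash"


print(f"guardband = {guardband_fraction(V_NOM_MV, V_MIN_MV):.1%}")
for v in (850, 700, 570, 555, 540, 535):
    print(v, "mV ->", classify(v))
```

These thresholds are the averages quoted in the deck; the slide notes slight variation across platforms and benchmarks, so a real deployment would measure them per board.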
Power-Reliability Trade-off
• Power-efficiency (GOPs/W) gain
 >3X power-efficiency gain (2.6X by eliminating guardband and further 43% in critical region)
 Slight variation across 3 platforms and 5 workloads
• Reliability overhead (i.e., CNN accuracy loss)
 Plotted per workload: VGGNet, GoogleNet, AlexNet, ResNet, Inception
 No accuracy loss in the guardband, accuracy collapse in the critical region
 Slight variation across 3 platforms and 5 workloads
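The ">3X" headline can be checked against its two components. The sketch below assumes the "further 43%" is a relative gain applied on top of the 2.6X obtained by eliminating the guardband; the slides do not state this explicitly, so the multiplicative reading is an assumption.

```python
# Sketch only: combine the two reported gains, assuming they compose multiplicatively.
GUARDBAND_GAIN = 2.6        # power-efficiency gain from eliminating the guardband
CRITICAL_EXTRA_GAIN = 0.43  # further relative gain in the critical region (assumed relative)

total_gain = GUARDBAND_GAIN * (1.0 + CRITICAL_EXTRA_GAIN)
print(f"combined power-efficiency gain ~= {total_gain:.2f}X")
```

Under that reading the combined figure is about 3.7X, which is consistent with the ">3X" claim.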
Frequency Underscaling
• Simultaneous frequency underscaling to prevent CNN accuracy collapse in
the critical voltage region
• For each voltage level below Vmin, we found Fmax, the maximum
operating frequency at which there is no accuracy loss
• Leads to performance and energy-efficiency loss
• Shown for GoogleNet (voltage steps = 5 mV, frequency steps = 50 MHz)

VCCINT (mV)  Fmax (MHz)  GOPs (Norm)  Power (Norm)  GOPs/W (Norm)  GOPs/J (Norm)
570          333         1            1             1              1
565          300         0.94         0.97          0.97           0.87
560          250         0.83         0.84          0.99           0.75
555          250         0.83         0.78          1.06           0.8
550          250         0.83         0.75          1.1            0.83
545          250         0.83         0.74          1.12           0.84
540          200         0.7          0.56          1.25           0.75

• Best setting for high performance and energy efficiency: 570 mV at 333 MHz
• Best setting for power efficiency: 540 mV at 200 MHz
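The "best setting" annotations can be reproduced from the table itself. The sketch below loads the GoogleNet rows as reported, picks the row that maximizes each metric, and checks that the GOPs/W column agrees with GOPs divided by power to within rounding. The `Row` type and variable names are illustrative and not part of the original work.

```python
# Sketch only: select the best operating points from the reported GoogleNet table.
from typing import NamedTuple


class Row(NamedTuple):
    vccint_mv: int
    fmax_mhz: int
    gops: float        # normalized throughput
    power: float       # normalized power
    gops_per_w: float  # normalized power efficiency
    gops_per_j: float  # normalized energy efficiency


ROWS = [
    Row(570, 333, 1.00, 1.00, 1.00, 1.00),
    Row(565, 300, 0.94, 0.97, 0.97, 0.87),
    Row(560, 250, 0.83, 0.84, 0.99, 0.75),
    Row(555, 250, 0.83, 0.78, 1.06, 0.80),
    Row(550, 250, 0.83, 0.75, 1.10, 0.83),
    Row(545, 250, 0.83, 0.74, 1.12, 0.84),
    Row(540, 200, 0.70, 0.56, 1.25, 0.75),
]

best_performance = max(ROWS, key=lambda r: r.gops)
best_energy_eff = max(ROWS, key=lambda r: r.gops_per_j)
best_power_eff = max(ROWS, key=lambda r: r.gops_per_w)

# The GOPs/W column should equal GOPs / Power up to two-decimal rounding.
columns_consistent = all(abs(r.gops / r.power - r.gops_per_w) < 0.02 for r in ROWS)

print("best performance:      ", best_performance.vccint_mv, "mV @", best_performance.fmax_mhz, "MHz")
print("best energy efficiency:", best_energy_eff.vccint_mv, "mV @", best_energy_eff.fmax_mhz, "MHz")
print("best power efficiency: ", best_power_eff.vccint_mv, "mV @", best_power_eff.fmax_mhz, "MHz")
print("GOPs/W consistent with GOPs/Power:", columns_consistent)
```

Performance and energy efficiency both peak at 570 mV and 333 MHz, while power efficiency peaks at 540 mV and 200 MHz, so the choice of operating point depends on which metric matters for the deployment.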
Environmental Temperature
• Effects of environmental temperature on the power-reliability trade-off
 Fan speed is varied to cover temperatures in [34 °C, 50 °C]
 On-board temperature monitored via PMBus
• Temperature effects on power consumption
 ↓ Temp → ↓ Power (power rises and falls with temperature)
 Under undervolting, the impact of temperature on power consumption shrinks.
• Temperature effects on reliability
 ↓ Temp → ↑ Accuracy loss (accuracy loss is inversely related to temperature)
 In our temperature range, Vmin and Vcrash do not change significantly.
• Results shown for GoogleNet
Prior Works
• Undervolting
 Studies for off-the-shelf real CPUs, GPUs, ASICs, DRAMs
 Large voltage guardband (from 12% to 35%) for many devices
 This work extends such studies for off-the-shelf FPGAs especially for
neural network acceleration and confirms large guardbands (i.e., 33%)
• Power-Efficient Neural Networks
 Studies on architectural-, hardware-, and software-level techniques
 Undervolting in neural network ASIC accelerator (e.g., GreenTPU-DAC’19)
 This work proposes a hardware-level undervolting for further
power-saving (>3X) in FPGAs.
• Reliability in Neural Networks
 Analytical and simulation-based studies (e.g., Thundervolt-DAC’18)
 Some studies on real hardware (e.g., EDEN-MICRO’19)
 This work studies the reliability of neural networks on real FPGAs
when operating at reduced voltage levels.
Summary, Conclusion, and Future Works
• Summary
 We improve the power-efficiency (>3X) of off-the-shelf
FPGAs via undervolting for neural network accelerators:
 2.6X by eliminating the guardband (i.e., 33%) without any cost
 43% by further undervolting below the guardband with the cost of
 either accuracy loss, when the frequency is not underscaled
 or performance loss, when the frequency is underscaled
• Conclusion
 Undervolting is an effective way to achieve significant
power-saving for FPGA-based neural network accelerators
• Future Works
 HW & SW extension of our undervolting for FPGA clusters
and other neural network models and tools