Identifying Critical Neurons in
ANN Architectures using Mixed
Integer Programming
Mostafa ElAraby Guy Wolf Margarida Carvalho
[OPTML, NeurIPS 2020]
Motivation
Over-parameterized ANNs contain efficient
sub-networks that offer faster inference with
only a marginal loss in accuracy compared to
the original network.
Frankle and Carbin (2018) introduced the
lottery ticket conjecture and empirically
showed the existence of a lucky pruned
subnetwork, a winning ticket.
Contents
● Preliminaries on Mixed-Integer Programming (MIP)
● Neuron Importance Score introduction
● MIP formulation
● Proposed Algorithm
● Scalability
● Experiments
● Conclusion
MIP Preliminaries
Linear Programming (LP)
A powerful framework used to solve optimization problems in the following form:
Linear Objective
A linear function of the
decision variables that we
want to minimize or
maximize.
Decision Variables
Variables whose values the LP
solver optimizes; when it
finishes, the solver returns
their optimal values.
Linear Constraints
A set of linear constraints on
the decision variables that
every solution must satisfy,
narrowing the search space.
The solver reports the problem
as infeasible if no solution
satisfies all of the
constraints.
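As a concrete illustration of these three ingredients, the sketch below solves a tiny two-variable LP in pure Python by enumerating the vertices of the feasible region (an LP with a bounded, non-empty feasible region attains its optimum at a vertex). The problem data and the function name `solve_lp_2d` are made up for illustration; real LP solvers use simplex or interior-point methods rather than enumeration.

```python
from itertools import combinations

def solve_lp_2d(c, constraints, eps=1e-9):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    Returns (best_value, (x, y)), or None if the problem is infeasible.
    Assumes the feasible region is bounded.
    """
    best = None
    # Each vertex is the intersection of two constraint boundaries.
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundaries never intersect
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        # Keep the point only if it satisfies every linear constraint.
        if all(a * x + b * y <= r + eps for a, b, r in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best

# maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x <= 3,  x >= 0,  y >= 0
constraints = [(1, 1, 4), (1, 3, 6), (1, 0, 3), (-1, 0, 0), (0, -1, 0)]
result = solve_lp_2d((3, 2), constraints)
```

Here the objective, the variables (x, y), and the constraint list map directly onto the three components described above; a `None` return corresponds to the infeasible case.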
Mixed-Integer Programming (MIP)
Similar to linear programming, but some decision variables are restricted to
integer values, alongside the continuous variables used in linear programming.
MIP is a harder problem (NP-hard in general); dropping the integrality
requirement relaxes it into a linear program.
Branch and Bound algorithm
We first relax the MIP into an LP and solve it.
If we are lucky, the LP solution is already
integral and is therefore optimal for the MIP.
Otherwise, which is the usual case, we pick an
integer variable that took a fractional value
(the branching variable) and add linear
constraints that exclude that value,
resulting in 2 new MIPs.
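The branching scheme above can be sketched on a 0/1 knapsack problem, whose LP relaxation has a simple closed-form solution (fill greedily by value-to-weight ratio, leaving at most one item fractional). This is a minimal illustration rather than how industrial solvers work; the function names and the toy instance are hypothetical, and weights are assumed to be positive.

```python
def lp_relaxation(values, weights, capacity, fixed):
    """Solve the LP relaxation of a 0/1 knapsack with some variables fixed.

    `fixed` maps item index -> 0 or 1. Returns (bound, solution), where the
    solution may contain one fractional entry, or None if infeasible.
    """
    x = [0.0] * len(values)
    remaining = capacity
    for i, v in fixed.items():
        x[i] = float(v)
        remaining -= weights[i] * v
    if remaining < 0:
        return None  # the fixed items alone violate the capacity constraint
    free = [i for i in range(len(values)) if i not in fixed]
    # Greedy by value density is optimal for the relaxed (fractional) problem.
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= remaining:
            x[i] = 1.0
            remaining -= weights[i]
        else:
            x[i] = remaining / weights[i]  # at most one fractional item
            break
    return sum(v * xi for v, xi in zip(values, x)), x

def branch_and_bound(values, weights, capacity):
    best_value, best_x = 0.0, [0] * len(values)
    stack = [{}]  # each node is a dict of fixed variables
    while stack:
        fixed = stack.pop()
        relaxed = lp_relaxation(values, weights, capacity, fixed)
        if relaxed is None:
            continue  # infeasible subproblem
        bound, x = relaxed
        if bound <= best_value:
            continue  # the relaxation cannot beat the incumbent: prune
        fractional = [i for i, xi in enumerate(x) if 0.0 < xi < 1.0]
        if not fractional:
            # The LP solution is integral, so it is optimal for this node.
            best_value, best_x = bound, [int(xi) for xi in x]
            continue
        j = fractional[0]  # branching variable
        # Two new subproblems exclude the fractional value of x[j].
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best_value, best_x

best_value, best_x = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

The bound check is what keeps the search tractable: a subproblem whose relaxed optimum cannot beat the best integral solution found so far is discarded without further branching.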
Neuron Importance Score
Introduction
The MIP solver computes an importance
score in [0, 1] for each neuron in the
convolutional and fully connected layers.
Neurons with a small importance score
can be safely pruned with negligible
loss in accuracy.
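A minimal sketch of how such scores could drive pruning, assuming a fixed cutoff: neurons whose score falls below a threshold are masked out. The threshold value, the function names, and the example scores are hypothetical and not taken from the slides.

```python
def prune_mask(scores, threshold=0.1):
    """Return a 0/1 mask per layer: 1 keeps the neuron, 0 prunes it.

    `scores` maps a layer name to a list of importance scores in [0, 1].
    The threshold is an illustrative choice, not a recommended value.
    """
    return {
        layer: [1 if s >= threshold else 0 for s in layer_scores]
        for layer, layer_scores in scores.items()
    }

def pruned_fraction(mask):
    """Fraction of neurons removed across all layers."""
    total = sum(len(m) for m in mask.values())
    kept = sum(sum(m) for m in mask.values())
    return 1 - kept / total

scores = {"fc1": [0.92, 0.03, 0.51, 0.00], "fc2": [0.08, 0.77]}
mask = prune_mask(scores)
```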
MIP Formulation
Linear layers with no activation
Let h be a decision variable representing the input to layer l, which has
weights W and bias b
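Written out, this constraint presumably takes the usual affine form, where the layer superscripts are notation assumed here for illustration:

```latex
h^{l+1} = W^{l} h^{l} + b^{l}
```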
ReLU activated layers
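A ReLU unit is commonly encoded in a MIP with a binary variable z and big-M style bounds. The sketch below uses one such textbook encoding, which may differ in detail from the constraints used in this work. For a pre-activation a with known bounds L <= a <= U (with L < 0 < U), the constraints are h >= 0, h >= a, h <= a - L(1 - z), and h <= U z. The function names are hypothetical.

```python
def relu_bigm_feasible(a, h, z, lower, upper, eps=1e-9):
    """Check the big-M ReLU constraints for pre-activation a, output h, gate z.

    Assumes lower <= a <= upper and lower < 0 < upper.
    """
    return (
        h >= -eps                               # h >= 0
        and h >= a - eps                        # h >= a
        and h <= a - lower * (1 - z) + eps      # tight (h <= a) when z = 1
        and h <= upper * z + eps                # forces h = 0 when z = 0
    )

def feasible_outputs(a, lower, upper, candidates):
    """All candidate h values that are feasible for some binary z."""
    return sorted(
        {h for h in candidates for z in (0, 1)
         if relu_bigm_feasible(a, h, z, lower, upper)}
    )

candidates = [x / 2 for x in range(-6, 7)]   # -3.0, -2.5, ..., 3.0
active = feasible_outputs(2.0, -3.0, 3.0, candidates)
inactive = feasible_outputs(-1.5, -3.0, 3.0, candidates)
```

With z binary, the only feasible output on this grid is max(0, a) in both cases, which is exactly the ReLU of the pre-activation.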
Relaxing z decision variable
For faster solving, we replace the binary decision variable z with a
continuous relaxation
Proposed Constraints with Importance Score S
Representing Convolutional layers
We convert convolutional layers to flattened Toeplitz matrices, which turns the
convolution into a plain matrix multiplication. The same constraints introduced
for the fully connected layers then apply, with one importance score for
each filter
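To illustrate the idea in one dimension, the sketch below builds the Toeplitz matrix of a 1-D kernel so that a "valid" convolution (cross-correlation, as used in deep learning frameworks) becomes a matrix-vector product. The 2-D case uses a doubly block Toeplitz matrix built along the same lines. Function names and data are hypothetical.

```python
def toeplitz_matrix(kernel, input_len):
    """Rows are shifted copies of the kernel; shape (input_len - k + 1, input_len)."""
    k = len(kernel)
    rows = []
    for shift in range(input_len - k + 1):
        row = [0] * input_len
        row[shift:shift + k] = kernel
        rows.append(row)
    return rows

def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def conv1d_valid(signal, kernel):
    """Direct sliding-window cross-correlation, for comparison."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
T = toeplitz_matrix(kernel, len(signal))
via_matrix = matvec(T, signal)
direct = conv1d_valid(signal, kernel)
```

Once the layer is a matrix product, it can be constrained exactly like a fully connected layer, which is what allows the fully connected formulation to be reused.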
Objective Function: Softmax
Softmax: the marginal softmax, which penalizes wrong predictions
regardless of the logit value. Y is the one-hot encoded true label.
Objective Function: Sparsity
I represents the scaled-down importance score (s - 2), which was shown
empirically to give non-important neurons a lower score.
As λ increases, more neurons receive a score near zero.
Proposed Algorithm
Scalability
MIP Solvers are slow
A deep neural network represented as a MIP is hard to solve even for commercial
solvers, which makes it difficult for our algorithm to scale to large models.
To address this, we propose 2 solutions:
- Parallelizing the computation layer-wise
- Parallelizing the computation class-wise
Parallel layers using decoupled greedy learning
Class-wise decoupling
In this experiment, we show that the neuron importance scores can be
approximated by 1) solving for each class the MIP with only one data point
from it, and then 2) taking the average of the computed scores for each neuron
as the final score estimate. Such a procedure would speed up our methodology
for problems with numerous classes.
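The two steps above can be sketched as follows. The per-class solver is replaced by a hypothetical stand-in passed in as `solve_mip_for_sample`, since the actual MIP is not reproduced here; only the averaging logic is meant to be illustrative, and the scores are made up.

```python
def average_scores(per_class_scores):
    """Average neuron importance scores over classes.

    `per_class_scores` maps a class label to a list with one score per neuron,
    each obtained by solving the MIP on a single data point of that class.
    """
    score_lists = list(per_class_scores.values())
    n_classes = len(score_lists)
    return [sum(neuron) / n_classes for neuron in zip(*score_lists)]

def estimate_scores(samples_by_class, solve_mip_for_sample):
    """Step 1: one MIP solve per class; step 2: average the scores."""
    per_class = {
        label: solve_mip_for_sample(sample)
        for label, sample in samples_by_class.items()
    }
    return average_scores(per_class)

# Stand-in solver for illustration only: returns fixed, made-up scores.
fake_scores = {"img_cat": [1.0, 0.0, 0.5], "img_dog": [0.5, 0.0, 1.0]}
final = estimate_scores(
    {"cat": "img_cat", "dog": "img_dog"},
    lambda sample: fake_scores[sample],
)
```

Because the per-class solves do not depend on each other, they could be dispatched to separate workers and only the final averaging done centrally.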
Experiments
Pruning Experiments
Robustness Experiments
We show empirically that our framework is robust across different convergence
levels of the trained neural network, as shown in the following figure.
Generalization Experiments
Cross-dataset generalization: the sub-network mask is computed on a source
dataset (d1) and then applied to a target dataset (d2) by retraining from the
same early initialization. Test accuracies are presented for masked and
unmasked (REF.) networks on d2, as well as the pruning percentage.
Conclusion
We proposed a mixed integer program to compute neuron importance scores in
ReLU-based deep neural networks. Our contributions focus on providing
scalable computation of importance scores in fully connected and
convolutional layers.
