Melding the Data-Decision Pipeline: Decision-Focused Learning for Combinatorial Optimization
Bryan Wilder, Bistra Dilkina and Milind Tambe
University of Southern California
AAAI 2019
Abstract
• Introduce a general framework for decision-focused learning, where the machine learning model is trained directly in conjunction with the optimization algorithm.
• Instantiate the framework for two broad classes of combinatorial problems: linear programming and submodular maximization.
• Experiments show that the proposed method outperforms the traditional two-stage method in terms of solution quality.
Introduction
• Machine learning: use data to predict unknown quantities with the help of a loss function.
• Optimization algorithm: use the predictions to arrive at a decision that maximizes some objective.
• Training the model entirely separately from the optimization may result in bad decisions.
• Focusing on combinatorial optimization, the paper proposes a decision-focused learning framework that integrates the prediction and optimization algorithms.
Background
Matrix Calculus
• scalar by vector
• vector by scalar
• vector by vector
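The slide names only the three derivative shapes; their standard definitions (our addition, stated in the common numerator layout) are:
• scalar by vector: $\frac{\partial y}{\partial \mathbf{x}} = \left[ \frac{\partial y}{\partial x_1}, \dots, \frac{\partial y}{\partial x_n} \right]$
• vector by scalar: $\frac{\partial \mathbf{y}}{\partial x} = \left[ \frac{\partial y_1}{\partial x}, \dots, \frac{\partial y_m}{\partial x} \right]^T$
• vector by vector: $\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = J$ with $J_{ij} = \frac{\partial y_i}{\partial x_j}$ (the Jacobian)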
Implicit differentiation
• Example:
• We want to find the slope of the tangent line to the circle $x^2 + y^2 = 25$ at the point $(3, -4)$.
• One way is to solve for $y$ explicitly and differentiate:
• $y = -\sqrt{25 - x^2}$ (the point $(3, -4)$ lies on the bottom semicircle)
• $\Rightarrow y' = -\frac{1}{2}(25 - x^2)^{-1/2} \cdot (-2x) = \frac{x}{\sqrt{25 - x^2}}$
• $m = y'\big|_{x=3} = \frac{3}{\sqrt{25 - 3^2}} = \frac{3}{4}$
Source: https://www.math.ucdavis.edu/~kouba/CalcOneDIRECTORY/implicitdiffdirectory/ImplicitDiff.html
Implicit differentiation (cont’d)
• However, not every relation can be written explicitly as a function of another variable.
• In implicit differentiation, we differentiate each side of an equation with two variables, treating one of the variables as a function of the other.
• Using implicit differentiation, we treat $y$ as an implicit function of $x$:
• $x^2 + y^2 = 25$
• $\Rightarrow 2x + 2y \frac{dy}{dx} = 0$
• $\Rightarrow y' = \frac{dy}{dx} = \frac{-2x}{2y} = \frac{-x}{y}$
• $m = y' = \frac{-x}{y} = \frac{-3}{-4} = \frac{3}{4}$
Source: https://www.khanacademy.org/math/ap-calculus-ab/ab-differentiation-2-new/ab-3-2/a/implicit-differentiation-review
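A quick numeric check of both derivations (a minimal SymPy sketch; the circle and the point are from the slides, all code is ours):

```python
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")(x)  # treat y as an implicit function of x

# Differentiate both sides of x^2 + y^2 = 25 with respect to x.
eq = sp.diff(x**2 + y**2 - 25, x)         # 2*x + 2*y*y'
dydx = sp.solve(eq, sp.diff(y, x))[0]     # -x/y

# Slope of the tangent line at (3, -4).
print(dydx.subs(y, -4).subs(x, 3))        # 3/4

# The explicit bottom-semicircle derivation agrees.
y_explicit = -sp.sqrt(25 - x**2)
print(sp.diff(y_explicit, x).subs(x, 3))  # 3/4
```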
Lagrange Multiplier
• Consider the optimization problem
  $\max f(x, y)$ subject to $g(x, y) = 0$
• From the graph, observe that at a constrained optimum the contour of $f$ is tangent to the constraint curve, so the gradients are parallel:
  $\nabla_{x,y} f(x, y) = -\lambda \nabla_{x,y} g(x, y)$ and $g(x, y) = 0$  (1)
  where $\nabla_{x,y} f(x, y) = \left( \frac{\partial f(x, y)}{\partial x}, \frac{\partial f(x, y)}{\partial y} \right)^T$
• Let $\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda g(x, y)$
• Solving $\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = \mathbf{0}$ is equivalent to solving equation (1).
(figure: blue contours of $f(x, y)$ with $d_1 > d_2 > d_3$; red: constraint $g(x, y) = c$)
Source: https://en.m.wikipedia.org/wiki/Lagrange_multiplier
Lagrange Multiplier (cont’d)
• Generalize to $n$ variables
  • $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^T$
  • Solve $\nabla_{x_1, \dots, x_n, \lambda} \mathcal{L}(x_1, \dots, x_n, \lambda) = \mathbf{0}$
• Generalize to $M$ constraints
  • $\mathcal{L}(x_1, \dots, x_n, \lambda_1, \dots, \lambda_M) = f(x_1, \dots, x_n) + \sum_{k=1}^{M} \lambda_k g_k(x_1, \dots, x_n)$
  • Solve $\nabla_{x_1, \dots, x_n, \lambda_1, \dots, \lambda_M} \mathcal{L}(x_1, \dots, x_n, \lambda_1, \dots, \lambda_M) = \mathbf{0}$
KKT condition
• Consider the optimization problem
  $\max f(\mathbf{x})$
  subject to $g_i(\mathbf{x}) \le 0$ for $i = 1, \dots, m$, and $h_j(\mathbf{x}) = 0$ for $j = 1, \dots, l$.
• If $\mathbf{x}^*$ is a local optimum, then there exist $\mu_i$ ($i = 1, \dots, m$) and $\lambda_j$ ($j = 1, \dots, l$) such that
• Stationarity
  $\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$
• Primal feasibility
  $g_i(\mathbf{x}^*) \le 0$, for $i = 1, \dots, m$
  $h_j(\mathbf{x}^*) = 0$, for $j = 1, \dots, l$
• Dual feasibility
  $\mu_i \ge 0$, for $i = 1, \dots, m$
• Complementary slackness
  $\mu_i g_i(\mathbf{x}^*) = 0$, for $i = 1, \dots, m$
Source: https://en.m.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions
Source: https://www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf
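A minimal numeric check of these conditions on a toy problem (the instance and all names below are ours, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: max f(x) = -(x1-2)^2 - (x2-2)^2  s.t.  g(x) = x1 + x2 - 2 <= 0.
neg_f = lambda x: (x[0] - 2) ** 2 + (x[1] - 2) ** 2   # minimize -f
res = minimize(neg_f, x0=[0.0, 0.0],
               constraints=[{"type": "ineq", "fun": lambda x: 2 - x[0] - x[1]}])
xs = res.x                                            # -> [1, 1]

# Stationarity: grad f(x*) = mu * grad g(x*), with grad g = (1, 1).
grad_f = np.array([-2 * (xs[0] - 2), -2 * (xs[1] - 2)])
mu = grad_f[0]                       # -> 2, dual feasible (mu >= 0)
# Complementary slackness holds: g(x*) = 0, so mu > 0 is allowed.
print(xs, mu, xs[0] + xs[1] - 2)
```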
Linear programming relaxation
• Example:
• In a 0-1 integer program, all variables are
  • $x_i \in \{0, 1\}$
• After the relaxation,
  • $x_i \in [0, 1]$
• The relaxation transforms an NP-hard optimization problem into one that can be solved in polynomial time.
Source: https://en.wikipedia.org/wiki/Linear_programming_relaxation
Source: https://en.wikipedia.org/wiki/Convex_hull
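For instance, relaxing a tiny 0-1 knapsack and solving the resulting LP (a sketch on a made-up instance):

```python
import numpy as np
from scipy.optimize import linprog

# Toy 0-1 knapsack: max v @ x  s.t.  w @ x <= 5,  x_i in {0, 1}.
# The LP relaxation replaces x_i in {0, 1} with 0 <= x_i <= 1.
v = np.array([4.0, 3.0, 5.0])    # item values
w = np.array([2.0, 3.0, 4.0])    # item weights

res = linprog(c=-v,              # linprog minimizes, so negate the objective
              A_ub=w[None, :], b_ub=[5.0],
              bounds=[(0, 1)] * 3)
print(res.x)                     # fractional optimum, here [1, 0, 0.75]
```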
Method
Problem description
• Consider the combinatorial optimization problem
  $\max_{x \in \mathcal{X}} f(x, \theta)$
  where $\mathcal{X}$ is the discrete set of all feasible decisions.
• Without loss of generality, $\mathcal{X} \subseteq \{0, 1\}^n$, so $x$ is a binary decision vector.
• The objective $f$ depends on a parameter $\theta \in \Theta$; $\theta$ is unknown and must be inferred from data.
• We observe a feature vector $y \in \mathcal{Y}$ which is correlated with $\theta$.
• Let $m: \mathcal{Y} \mapsto \Theta$ denote a model mapping observed features to parameters.
Problem description (cont’d)
• Use the training data $(y_1, \theta_1), \dots, (y_N, \theta_N)$ drawn from a distribution $P$ to find the model $m$ (in a supervised manner).
• Define $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$ to be the optimal $x$ for a given $\theta$.
• Objective:
  $\max \mathbb{E}_{(y, \theta) \sim P}[f(x^*(m(y)), \theta)]$
• Example:
  • $y$: user ratings of movies
  • $\theta$: movie-actor assignments
  • Predict which actors are associated with each movie.
• Classical solution (two-stage method); see the sketch below:
  1. Learn a model $m$ with parameters $\omega$ using a standard loss function:
     $\min_{\omega} \mathbb{E}_{(y, \theta) \sim P}[\mathcal{L}(\theta, m(y, \omega))]$
  2. Use the learned model's predictions to solve the optimization problem.
• Possible cons:
  • The loss function does not consider how $\omega$ will affect the decision making.
  • Is it possible to do better?
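A minimal two-stage training sketch (all names, dimensions, and the `solve_combinatorial` call are our hypothetical placeholders):

```python
import torch

# Stage 1: fit m to predict theta from y with an accuracy loss (MSE here),
# with no knowledge of the downstream decision problem.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 8))   # m: y -> theta_hat
opt = torch.optim.Adam(model.parameters())
loss_fn = torch.nn.MSELoss()

y = torch.randn(100, 16)      # synthetic (y_i, theta_i) pairs drawn from P
theta = torch.randn(100, 8)

for _ in range(200):
    theta_hat = model(y)
    loss = loss_fn(theta_hat, theta)   # accuracy loss, blind to decisions
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: hand the prediction to any off-the-shelf solver, outside training.
# x_star = solve_combinatorial(model(y_new).detach())  # hypothetical solver
```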
General framework
• $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$
• $x^*$ is a decision from a binary set, which renders the output non-differentiable with respect to $\omega$.
• Consider the continuous relaxation of the original problem,
  $x(\theta) = \arg\max_{x \in conv(\mathcal{X})} f(x, \theta)$
  where $conv$ denotes the convex hull, so the training objective becomes
  $\max_{x \in conv(\mathcal{X})} f(x(\hat{\theta}), \theta) = \max_{x \in conv(\mathcal{X})} f(x(m(y, \omega)), \theta)$
• Obtain a gradient by sampling a single $(y, \theta)$ from the training data and applying the chain rule:
  $\frac{df(x(\hat{\theta}), \theta)}{d\omega} = \frac{df(x(\hat{\theta}), \theta)}{dx(\hat{\theta})} \cdot \frac{dx(\hat{\theta})}{d\hat{\theta}} \cdot \frac{d\hat{\theta}}{d\omega}$
  where $\hat{\theta} = m(y, \omega)$
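To make the chain rule concrete, here is a toy end-to-end training loop (our illustration, not the authors' code) where the relaxed argmax happens to have a differentiable closed form:

```python
import torch

# Toy decision-focused loop. Relaxed problem over the box conv(X) = [0,1]^n:
#   x(theta_hat) = argmax_{x in [0,1]^n} theta_hat @ x - gamma * ||x||^2
# has the closed form clamp(theta_hat / (2*gamma), 0, 1), so autograd supplies
# dx/dtheta_hat and the whole chain df/dx * dx/dtheta_hat * dtheta_hat/domega.
gamma = 0.1
def relaxed_argmax(theta_hat):
    return torch.clamp(theta_hat / (2 * gamma), 0.0, 1.0)

model = torch.nn.Linear(16, 8)                  # m: y -> theta_hat
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
y, theta = torch.randn(100, 16), torch.randn(100, 8)

for _ in range(200):
    i = torch.randint(0, 100, (1,)).item()      # sample a single (y, theta)
    theta_hat = model(y[i])                     # d theta_hat / d omega
    x = relaxed_argmax(theta_hat)               # d x / d theta_hat
    loss = -(theta[i] * x).sum()                # -f(x, true theta)
    opt.zero_grad(); loss.backward(); opt.step()
```

For general polytope constraints the argmax has no closed form; that is where the KKT-based differentiation derived next comes in.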
General framework (cont’d)
• $\frac{dx(\hat{\theta})}{d\hat{\theta}}$ measures how the optimal decision changes with respect to $\hat{\theta}$.
• For continuous problems, the optimal continuous decision must satisfy the KKT conditions.
• The constraint set is a convex hull, which can be represented as $\{x : Ax \le b\}$.
• Let $(x, \lambda)$ be the pair of primal and dual variables; differentiating the KKT conditions yields the linear system derived below.
Derivation-Stationarity
• Recall stationarity: $\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$
• With only the inequality constraints $\{x : Ax \le b\}$, this reads $\nabla_x f(x, \theta) = A^T \lambda$.
• By implicit differentiation (viewing $x$ and $\lambda$ as implicit functions of $\theta$),
  $\nabla_x^2 f(x, \theta) \frac{dx}{d\theta} + \nabla_{\theta} \nabla_x f(x, \theta) = A^T \frac{d\lambda}{d\theta}$
Derivation-Complementary slackness
• Recall complementary slackness: $\mu_i g_i(\mathbf{x}^*) = 0$, for $i = 1, \dots, m$.
• Define $g_i(x) = A_i x - b_i$, so complementary slackness reads $\mathrm{diag}(\lambda)(Ax - b) = 0$.
• By implicit differentiation,
  $\mathrm{diag}(\lambda) A \frac{dx}{d\theta} + \mathrm{diag}(Ax - b) \frac{d\lambda}{d\theta} = 0$
• Stacking the two differentiated conditions gives a linear system; solving it yields the desired $\frac{dx}{d\theta}$.
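A small sketch of that linear solve (our toy implementation; it uses the regularized objective $\theta^T x - \gamma \|x\|_2^2$ introduced on the next slide, for which $\nabla_x^2 f = -2\gamma I$ and $\nabla_\theta \nabla_x f = I$):

```python
import numpy as np

# Solve the stacked system from the derivation above for dx/dtheta, for
# f(x, theta) = theta @ x - gamma * ||x||^2 with constraints A x <= b.
def dx_dtheta(x, lam, A, b, gamma):
    n, m = x.size, lam.size
    H = -2.0 * gamma * np.eye(n)                # nabla_x^2 f(x, theta)
    lhs = np.block([[H, -A.T],                  # stationarity, differentiated
                    [np.diag(lam) @ A,          # compl. slackness, differentiated
                     np.diag(A @ x - b)]])
    rhs = np.vstack([-np.eye(n),                # -nabla_theta nabla_x f = -I
                     np.zeros((m, n))])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n]                              # the n-by-n Jacobian dx/dtheta
```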
Linear programming
• Consider a linear program with equality and inequality constraints:
  $\max \theta^T x$ s.t. $Ax = b$, $Gx \le h$
• Problem: $\nabla_x^2 f(x, \theta)$ is always zero, so the left-hand-side matrix of the linear system becomes singular.
• Solve the regularized problem instead:
  $\max \theta^T x - \gamma \|x\|_2^2$ s.t. $Ax = b$, $Gx \le h$
• The regularization transforms the LP into a quadratic program (QP).
• All other terms can be derived from $(x, \lambda)$, which is output by QP solvers.
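For example, the regularized problem can be posed directly in CVXPY, which also exposes the dual variables (a sketch on random toy data; the equality constraints are omitted here):

```python
import cvxpy as cp
import numpy as np

# Regularized LP from the slide: max theta_hat @ x - gamma * ||x||^2
# subject to G x <= h, on a random toy instance.
n, m = 5, 3
theta_hat, gamma = np.random.randn(n), 0.1
G, h = np.random.randn(m, n), np.ones(m)

x = cp.Variable(n)
constraints = [G @ x <= h]
prob = cp.Problem(cp.Maximize(theta_hat @ x - gamma * cp.sum_squares(x)),
                  constraints)
prob.solve()
x_star = x.value                       # primal solution
lam = constraints[0].dual_value        # dual variables for G x <= h
```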
Submodular maximization
• Consider the problem of maximizing a set function $f: 2^V \mapsto \mathbb{R}$, where $V$ is a ground set of items.
• A set function is submodular if it satisfies either of the following equivalent conditions:
  • For every $A, B \subseteq V$ with $A \subseteq B$ and any $v \in V \setminus B$, we have
    $f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B)$.
  • For every $A, B \subseteq V$, we have $f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$.
• Focus on the cardinality-constrained optimization $\max_{|S| \le k} f(S)$.
Submodular maximization (cont’d)
• View a set function as defined on the domain $\{0, 1\}^V$ (indicator view).
• The multilinear extension $F$ is defined on $[0, 1]^V$ (probability view):
  $F(x) = \mathbb{E}[f(S)] = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$
  where $x_i$ denotes the probability that item $i$ is included in $S$, independently.
• Instead of solving $\max_{|S| \le k} f(S)$, we can solve
  $\max_{x \in conv(\mathcal{X})} F(x)$
  where $\mathcal{X} = \{x \in \{0, 1\}^V : \sum_i x_i \le k\}$.
• The multilinear extension has a closed form for coverage functions:
  • There is a set of items $U$, and each item $j \in U$ has a weight $w_j$.
  • We choose from a set of actions $V$, and each action $a_i$ covers each item $j$ independently with probability $\theta_{ij}$.
  $F(x, \theta) = \sum_{j \in U} w_j \left( 1 - \prod_{i \in V} (1 - x_i \theta_{ij}) \right)$
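The coverage closed form is easy to evaluate and differentiate with autograd (a sketch with toy sizes of our choosing):

```python
import torch

# Multilinear extension of a coverage function, per the closed form above.
# x[i]: probability of selecting action i; theta[i, j]: probability that
# action i covers item j; w[j]: weight of item j.
def coverage_F(x, theta, w):
    uncovered = torch.prod(1 - x[:, None] * theta, dim=0)  # P(item j uncovered)
    return (w * (1 - uncovered)).sum()

V, U = 4, 6                         # toy sizes
theta = torch.rand(V, U)
w = torch.ones(U)
x = torch.full((V,), 0.5, requires_grad=True)

F = coverage_F(x, theta, w)
F.backward()                        # dF/dx for gradient ascent over conv(X)
print(F.item(), x.grad)
```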
Experiments
• For linear programming:
  • Bipartite matching
    • Feature vector: whether each word appears in the paper.
    • Objective: reconstruct the citation network.
• For submodular maximization:
  • Budget allocation
    • Models an advertiser's choice of how to divide a finite budget $k$ between a set of channels.
    • Feature vector: ground truth $\theta$ passed through a DNN.
    • Objective: expected number of customers reached.
  • Diverse recommendation
    • Feature vector: user ratings of movies.
    • Objective: predict which actors are associated with each movie.
Solution quality
• Quality: the objective value of the decision, evaluated using the true $\theta$.
  NN2: two-layer NN; RF: random forest
Accuracy
  MSE: mean squared error; CE: cross entropy
Conclusion
• Focus on combinatorial optimization and introduce a general framework for decision-focused learning.
• Instantiate the framework for linear programming and submodular maximization.
• Experiments show that the proposed method leads to better solution quality, although it may lose some predictive accuracy.