SlideShare uma empresa Scribd logo
1 de 26
Baixar para ler offline
ICML2012読み会:
Scaling Up Coordinate Descent Algorithms for
       Large L1 regularization Problems
               2012-07-28
             Yoshihiko Suhara
              @sleepy_yoshi
読む論文
• Scaling Up Coordinate Descent Algorithms for
  Large L1 regularization Problems
  – by C. Scherrer, M. Halappanavar, A. Tewari, D.
    Haglin


• Coordinate Descent の並列計算
  – [Bradley+ 11] Parallel Coordinate Descent for L1-
    Regularized Loss Minimization (ICML2011) とか

                                                        2
概要
• 共有メモリマルチコア環境におけるParallel
  Coordinate Descentの一般化フレームワークを紹
  介

• 以下の2つの手法を提案
 – Thread-Greedy Coordinate Descent
 – Coloring-Based Coordinate Descent

• Parallel CDの4手法を実験で比較
 – Thread-Greedy が思いのほかよかった

                                       3
L1正則化損失関数の最適化
• L1正則化損失関数
              𝑛
            1
        min     ℓ 𝒚 𝑖 , 𝑿𝒘   𝑖   + 𝜆 𝒘   1
         𝒘 𝑛
                𝑖=1
• ここで
  – 𝑿 ∈ ℝ 𝑛×𝑘 : 計画行列
  – 𝒘 ∈ ℝ 𝑘 : 重みベクトル
  – ℓ(𝑦,⋅): 微分可能な凸関数

• たとえば
  – Lasso (L1 + 二乗誤差)
  – L1正則化ロジスティック回帰


                                             4
記法
𝑿 = 𝑿1 , 𝑿2 , … , 𝑿 𝑗 , … 𝑿 𝑘
  𝒆 𝑗 = 0, 0, … , 1, … , 0 𝑇
                𝒙1𝑇
                𝒙2𝑇
        𝑿=       ⋮
                𝒙 𝑖𝑇
                 ⋮
                𝒙𝑇 𝑛


                                5
補足: Coordinate Descent
• 座標降下法とも呼ばれる (?)
• 選択された次元に対して直線探索
• いろんな次元の選び方
 – 例) Cyclic Coordinate Descent
• 並列計算する場合には全次元の部分集合を選択して更新




                                  6
GenCD: A Generic Framework for
  Parallel Coordinate Descent


                   なぜかここから英語



                                 7
Generic Coordinate Descent (GenCD)




                                     8
Step 1: Select
• Selecting 𝐽 coordinates
• The selection criteria differs for variations of CD
  techniques
   – cyclic CD (CCD)
   – stochastic CD (SCD)
      • selection of a singlton
   – fully greedy CD
      • 𝐽 = {1, … , 𝑘}
   – Shotgun [Bradley+ 11]
      • selects a random subset of a given size

                                                        9
Step 2: Propose
• Propose step computes a proposed increment 𝛿 𝑗 for
  each 𝑗 ∈ 𝐽.
   – this step does not actually change the weights

• In Step 2, we maintain a vector 𝝋 ∈ ℝ 𝑘 , where 𝝋 𝑗 is a
  proxy for the objective function evaluated at 𝒘 + 𝜹 𝑗 𝒆 𝑗
   – update 𝝋 𝑗 whenever a new proposal is calculated for j
   – 𝝋 is not necessary if the algorithm will accepts all
     proposals



                                                              10
Step 3: Accept
• In Accept step, the algorithm accepts 𝐽′ ⊆ 𝐽
   –  [Bradley+ 11] show correlations among features can
     lead to divergence if too many coordinates are updated at
     once (see below figure)

• In CCD, SCD, Shotgun, the algorithm allows all
  proposals to be accepted
   – No need to calculate 𝝋




                                                             11
Step 4: Update
• In Update step, the algorithm updates
  according to the set 𝐽′




 𝑿𝒘 を保持



                                          12
Approximate Minimization (1/2)
• Propose step calculates a proposed increment
   𝜹 𝑗 for each 𝑗 ∈ 𝐽
        𝛿 = argmin 𝛿 𝐹 𝒘 + 𝛿𝒆 𝑗 + 𝜆|𝒘 𝑗 + 𝛿|
                       1    𝑛
      where, 𝐹 𝒘 =         𝑖=1 ℓ   𝒚 𝑖 , 𝑿𝒘   𝑖
                       𝑛


•  For a general loss function, there is no
  closed-form solution along a given coordinate.
  – Thus, consider approximate minimization

                                                  13
Approximate Minimization (2/2)
• Well known minimizer (e.g., [Yuan and Lin 10])

                    𝛻𝑗 𝐹 𝒘 − 𝜆 𝛻𝑗 𝐹 𝒘 + 𝜆
       𝛿 = −𝜓   𝒘𝑗;           ,
                         𝛽          𝛽
                           𝑎         if 𝑥 < 𝑎
        where, 𝜓 𝑥; 𝑎, 𝑏 = 𝑏         if 𝑥 > 𝑏
                           𝑥        otherwise

                for squared loss 𝛽 = 1, logistic loss 𝛽 = 1/4.


                                                           14
Step 2: Propose (Approximated)
                                                     ′
                                                𝑖ℓ       𝒚 𝑖 ,𝒛 𝑖 ,𝑿 𝑗
                                                                         ?
                                                          𝑛




       Decrease in the approximated objective
                                                                             15
Experiments



              16
Algorithms (conventional)
• SHOTGUN [Bradley+ 11]
   – Select step: random subset of the columns
   – Accept step: accepts every proposal
       • No need to compute a proxy for the objective
   – convergence is guaranteed only if the # of coordinates selected
     is at most 𝑃 ∗ = 𝑘 (*1)
                       2𝜌


• GREEDY
   – Select step: all coordinates
   – Propose step: each thread generating proposals for some subset
     of the coordinates using approximation
   – Accept step: Only accepts the single best among the all threads.


                                     (*1) 𝜌 is the matrix eigenvalue of 𝑿 𝑇 𝑿   17
Comparisons of the Algorithms




                                18
Algorithms (proposed)
• THREAD-GREEDY
   – Select step: random set of coordinates (?)
   – Propose step: each thread generating proposals for some subset of the
     coordinates using approximation
   – Accept step: Each thread accepts the best of the proposals
   – no proof for convergence (however, empirical results are encouraging)

• COLORING
   – Preprocessing: structurally independent features are identified via
     partial distance-2 coloring
   – Select step: a random color is selected
   – Accept step: accepts every proposal
       • since the features are disjoint.




                                                                           19
Implementation and Platform
• Implementation
  – gcc with OpenMP
     • -O3 -fopenmp flags
     • parallel for pragma
     • static scheduling
         – Given n iterations and p threads, each thread gets n/p iterations


• Platform
  – AMD Opteron (Magny-Cours)
     • with 48 cores (12 cores x 4 sockets)
  – 256GB Memory

                                                                               20
Datasets




(Number of Non-Zero)
                        21
Convergence rates




ナゼカワカラナイ




                               22
Scalability




              23
Summary
• Presented GenCD, a generic framework for
  expressing parallel coordinate descent
  – Select, Propose, Accept, Upadte

• Performs convergence and scalability tests for the
  four algorithms
  – but the authors do not favor any of these algorithms
    over the others

• The condition for convergence of the THREAD-
  GREEDY algorithm is an open question
                                                           24
References
• [Yuan and Lin 10] G. Yuan, C. Lin, “A Comparison of Opitmization Methods
  and Software for Large-scale L1-regularized Linear Classification”, Journal
  of Machine Learning Research, vol.11, pp.3183-3234, 2010.
• [Bradley+ 11] J. K. Bradley, A. Kyrola, D. Bickson, C. Guestrin, “Parallel
  Coordinate Descent for L1-Regularized Loss Minimization”, In Proc. ICML
  ‘11, 2011.




                                                                            25
おわり




      26

Mais conteúdo relacionado

Mais procurados

오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
Multiclass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark ExamplesMulticlass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark ExamplesMarjan Sterjev
 
Cheatsheet deep-learning-tips-tricks
Cheatsheet deep-learning-tips-tricksCheatsheet deep-learning-tips-tricks
Cheatsheet deep-learning-tips-tricksSteve Nouri
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture modelsVu Pham
 
Brief intro : Invariance and Equivariance
Brief intro : Invariance and EquivarianceBrief intro : Invariance and Equivariance
Brief intro : Invariance and Equivariance홍배 김
 
Cheatsheet convolutional-neural-networks
Cheatsheet convolutional-neural-networksCheatsheet convolutional-neural-networks
Cheatsheet convolutional-neural-networksSteve Nouri
 
Julia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq Learning
Julia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq LearningJulia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq Learning
Julia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq LearningAssociation for Computational Linguistics
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkMila, Université de Montréal
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorVivian S. Zhang
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationJason Anderson
 
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismDS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismParameswaran Raman
 
2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache SparkDB Tsai
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing홍배 김
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3arogozhnikov
 
Image transforms 2
Image transforms 2Image transforms 2
Image transforms 2Ali Baig
 
Introduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi ChenIntroduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi ChenZhuyi Xue
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkDB Tsai
 

Mais procurados (20)

PMF BPMF and BPTF
PMF BPMF and BPTFPMF BPMF and BPTF
PMF BPMF and BPTF
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Multiclass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark ExamplesMulticlass Logistic Regression: Derivation and Apache Spark Examples
Multiclass Logistic Regression: Derivation and Apache Spark Examples
 
Cheatsheet deep-learning-tips-tricks
Cheatsheet deep-learning-tips-tricksCheatsheet deep-learning-tips-tricks
Cheatsheet deep-learning-tips-tricks
 
K-means, EM and Mixture models
K-means, EM and Mixture modelsK-means, EM and Mixture models
K-means, EM and Mixture models
 
Brief intro : Invariance and Equivariance
Brief intro : Invariance and EquivarianceBrief intro : Invariance and Equivariance
Brief intro : Invariance and Equivariance
 
Cheatsheet convolutional-neural-networks
Cheatsheet convolutional-neural-networksCheatsheet convolutional-neural-networks
Cheatsheet convolutional-neural-networks
 
Julia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq Learning
Julia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq LearningJulia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq Learning
Julia Kreutzer - 2017 - Bandit Structured Prediction for Neural Seq2Seq Learning
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using spark
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
 
Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image Generation
 
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid ParallelismDS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
DS-MLR: Scaling Multinomial Logistic Regression via Hybrid Parallelism
 
2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3
 
Image transforms 2
Image transforms 2Image transforms 2
Image transforms 2
 
Introduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi ChenIntroduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi Chen
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 

Destaque

ICML2013読み会: Distributed training of Large-scale Logistic models
ICML2013読み会: Distributed training of Large-scale Logistic modelsICML2013読み会: Distributed training of Large-scale Logistic models
ICML2013読み会: Distributed training of Large-scale Logistic modelssleepy_yoshi
 
PRML復々習レーン#2 2.3.6 - 2.3.7
PRML復々習レーン#2 2.3.6 - 2.3.7PRML復々習レーン#2 2.3.6 - 2.3.7
PRML復々習レーン#2 2.3.6 - 2.3.7sleepy_yoshi
 
PRML復々習レーン#3 3.1.3-3.1.5
PRML復々習レーン#3 3.1.3-3.1.5PRML復々習レーン#3 3.1.3-3.1.5
PRML復々習レーン#3 3.1.3-3.1.5sleepy_yoshi
 
PRML復々習レーン#3 前回までのあらすじ
PRML復々習レーン#3 前回までのあらすじPRML復々習レーン#3 前回までのあらすじ
PRML復々習レーン#3 前回までのあらすじsleepy_yoshi
 
WSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical Search
WSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical SearchWSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical Search
WSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical Searchsleepy_yoshi
 
SVM実践ガイド (A Practical Guide to Support Vector Classification)
SVM実践ガイド (A Practical Guide to Support Vector Classification)SVM実践ガイド (A Practical Guide to Support Vector Classification)
SVM実践ガイド (A Practical Guide to Support Vector Classification)sleepy_yoshi
 

Destaque (6)

ICML2013読み会: Distributed training of Large-scale Logistic models
ICML2013読み会: Distributed training of Large-scale Logistic modelsICML2013読み会: Distributed training of Large-scale Logistic models
ICML2013読み会: Distributed training of Large-scale Logistic models
 
PRML復々習レーン#2 2.3.6 - 2.3.7
PRML復々習レーン#2 2.3.6 - 2.3.7PRML復々習レーン#2 2.3.6 - 2.3.7
PRML復々習レーン#2 2.3.6 - 2.3.7
 
PRML復々習レーン#3 3.1.3-3.1.5
PRML復々習レーン#3 3.1.3-3.1.5PRML復々習レーン#3 3.1.3-3.1.5
PRML復々習レーン#3 3.1.3-3.1.5
 
PRML復々習レーン#3 前回までのあらすじ
PRML復々習レーン#3 前回までのあらすじPRML復々習レーン#3 前回までのあらすじ
PRML復々習レーン#3 前回までのあらすじ
 
WSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical Search
WSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical SearchWSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical Search
WSDM2012読み会: Learning to Rank with Multi-Aspect Relevance for Vertical Search
 
SVM実践ガイド (A Practical Guide to Support Vector Classification)
SVM実践ガイド (A Practical Guide to Support Vector Classification)SVM実践ガイド (A Practical Guide to Support Vector Classification)
SVM実践ガイド (A Practical Guide to Support Vector Classification)
 

Semelhante a ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularization Problems

Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
cos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.pptcos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.pptdevesh604174
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Distributional RL via Moment Matching
Distributional RL via Moment MatchingDistributional RL via Moment Matching
Distributional RL via Moment Matchingtaeseon ryu
 
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...Atsushi Nitanda
 
Coordinate Descent method
Coordinate Descent methodCoordinate Descent method
Coordinate Descent methodSanghyuk Chun
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine LearningFabian Pedregosa
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptxPrabhuSelvaraj15
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraJason Riedy
 
Exploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal WabbitExploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal WabbitShiladitya Sen
 
Reinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural NetsReinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural NetsPierre de Lacaze
 
Parallel Algorithms for Geometric Graph Problems (at Stanford)
Parallel Algorithms for Geometric Graph Problems (at Stanford)Parallel Algorithms for Geometric Graph Problems (at Stanford)
Parallel Algorithms for Geometric Graph Problems (at Stanford)Grigory Yaroslavtsev
 
adversarial robustness lecture
adversarial robustness lectureadversarial robustness lecture
adversarial robustness lectureMuhammadAhmedShah2
 
Lecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptxLecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptxgnans Kgnanshek
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!ChenYiHuang5
 

Semelhante a ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularization Problems (20)

Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
cos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.pptcos323_s06_lecture03_optimization.ppt
cos323_s06_lecture03_optimization.ppt
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Distributional RL via Moment Matching
Distributional RL via Moment MatchingDistributional RL via Moment Matching
Distributional RL via Moment Matching
 
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Cl...
 
Coordinate Descent method
Coordinate Descent methodCoordinate Descent method
Coordinate Descent method
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine Learning
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
Exploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal WabbitExploring Optimization in Vowpal Wabbit
Exploring Optimization in Vowpal Wabbit
 
Oh2423312334
Oh2423312334Oh2423312334
Oh2423312334
 
Reinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural NetsReinforcement Learning and Artificial Neural Nets
Reinforcement Learning and Artificial Neural Nets
 
Parallel Algorithms for Geometric Graph Problems (at Stanford)
Parallel Algorithms for Geometric Graph Problems (at Stanford)Parallel Algorithms for Geometric Graph Problems (at Stanford)
Parallel Algorithms for Geometric Graph Problems (at Stanford)
 
adversarial robustness lecture
adversarial robustness lectureadversarial robustness lecture
adversarial robustness lecture
 
Deep Learning for Computer Vision: Optimization (UPC 2016)
Deep Learning for Computer Vision: Optimization (UPC 2016)Deep Learning for Computer Vision: Optimization (UPC 2016)
Deep Learning for Computer Vision: Optimization (UPC 2016)
 
Lecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptxLecture_3_Gradient_Descent.pptx
Lecture_3_Gradient_Descent.pptx
 
Regression.pptx
Regression.pptxRegression.pptx
Regression.pptx
 
Regression.pptx
Regression.pptxRegression.pptx
Regression.pptx
 
Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!Paper study: Attention, learn to solve routing problems!
Paper study: Attention, learn to solve routing problems!
 

Mais de sleepy_yoshi

KDD2014勉強会: Large-Scale High-Precision Topic Modeling on Twitter
KDD2014勉強会: Large-Scale High-Precision Topic Modeling on TwitterKDD2014勉強会: Large-Scale High-Precision Topic Modeling on Twitter
KDD2014勉強会: Large-Scale High-Precision Topic Modeling on Twittersleepy_yoshi
 
KDD2013読み会: Direct Optimization of Ranking Measures
KDD2013読み会: Direct Optimization of Ranking MeasuresKDD2013読み会: Direct Optimization of Ranking Measures
KDD2013読み会: Direct Optimization of Ranking Measuressleepy_yoshi
 
PRML復々習レーン#15 前回までのあらすじ
PRML復々習レーン#15 前回までのあらすじPRML復々習レーン#15 前回までのあらすじ
PRML復々習レーン#15 前回までのあらすじsleepy_yoshi
 
PRML復々習レーン#14 前回までのあらすじ
PRML復々習レーン#14 前回までのあらすじPRML復々習レーン#14 前回までのあらすじ
PRML復々習レーン#14 前回までのあらすじsleepy_yoshi
 
PRML復々習レーン#13 前回までのあらすじ
PRML復々習レーン#13 前回までのあらすじPRML復々習レーン#13 前回までのあらすじ
PRML復々習レーン#13 前回までのあらすじsleepy_yoshi
 
PRML復々習レーン#12 前回までのあらすじ
PRML復々習レーン#12 前回までのあらすじPRML復々習レーン#12 前回までのあらすじ
PRML復々習レーン#12 前回までのあらすじsleepy_yoshi
 
SEXI2013読み会: Adult Query Classification for Web Search and Recommendation
SEXI2013読み会: Adult Query Classification for Web Search and RecommendationSEXI2013読み会: Adult Query Classification for Web Search and Recommendation
SEXI2013読み会: Adult Query Classification for Web Search and Recommendationsleepy_yoshi
 
計算論的学習理論入門 -PAC学習とかVC次元とか-
計算論的学習理論入門 -PAC学習とかVC次元とか-計算論的学習理論入門 -PAC学習とかVC次元とか-
計算論的学習理論入門 -PAC学習とかVC次元とか-sleepy_yoshi
 
PRML復々習レーン#11 前回までのあらすじ
PRML復々習レーン#11 前回までのあらすじPRML復々習レーン#11 前回までのあらすじ
PRML復々習レーン#11 前回までのあらすじsleepy_yoshi
 
SMO徹底入門 - SVMをちゃんと実装する
SMO徹底入門 - SVMをちゃんと実装するSMO徹底入門 - SVMをちゃんと実装する
SMO徹底入門 - SVMをちゃんと実装するsleepy_yoshi
 
PRML復々習レーン#10 前回までのあらすじ
PRML復々習レーン#10 前回までのあらすじPRML復々習レーン#10 前回までのあらすじ
PRML復々習レーン#10 前回までのあらすじsleepy_yoshi
 
PRML復々習レーン#10 7.1.3-7.1.5
PRML復々習レーン#10 7.1.3-7.1.5PRML復々習レーン#10 7.1.3-7.1.5
PRML復々習レーン#10 7.1.3-7.1.5sleepy_yoshi
 
PRML復々習レーン#9 6.3-6.3.1
PRML復々習レーン#9 6.3-6.3.1PRML復々習レーン#9 6.3-6.3.1
PRML復々習レーン#9 6.3-6.3.1sleepy_yoshi
 
PRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじPRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじsleepy_yoshi
 
PRML復々習レーン#7 前回までのあらすじ
PRML復々習レーン#7 前回までのあらすじPRML復々習レーン#7 前回までのあらすじ
PRML復々習レーン#7 前回までのあらすじsleepy_yoshi
 
SIGIR2012勉強会 23 Learning to Rank
SIGIR2012勉強会 23 Learning to RankSIGIR2012勉強会 23 Learning to Rank
SIGIR2012勉強会 23 Learning to Ranksleepy_yoshi
 
DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5
DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5
DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5sleepy_yoshi
 
Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)
Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)
Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)sleepy_yoshi
 
SIGIR2011読み会 3. Learning to Rank
SIGIR2011読み会 3. Learning to RankSIGIR2011読み会 3. Learning to Rank
SIGIR2011読み会 3. Learning to Ranksleepy_yoshi
 
TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++
TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++
TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++sleepy_yoshi
 

Mais de sleepy_yoshi (20)

KDD2014勉強会: Large-Scale High-Precision Topic Modeling on Twitter
KDD2014勉強会: Large-Scale High-Precision Topic Modeling on TwitterKDD2014勉強会: Large-Scale High-Precision Topic Modeling on Twitter
KDD2014勉強会: Large-Scale High-Precision Topic Modeling on Twitter
 
KDD2013読み会: Direct Optimization of Ranking Measures
KDD2013読み会: Direct Optimization of Ranking MeasuresKDD2013読み会: Direct Optimization of Ranking Measures
KDD2013読み会: Direct Optimization of Ranking Measures
 
PRML復々習レーン#15 前回までのあらすじ
PRML復々習レーン#15 前回までのあらすじPRML復々習レーン#15 前回までのあらすじ
PRML復々習レーン#15 前回までのあらすじ
 
PRML復々習レーン#14 前回までのあらすじ
PRML復々習レーン#14 前回までのあらすじPRML復々習レーン#14 前回までのあらすじ
PRML復々習レーン#14 前回までのあらすじ
 
PRML復々習レーン#13 前回までのあらすじ
PRML復々習レーン#13 前回までのあらすじPRML復々習レーン#13 前回までのあらすじ
PRML復々習レーン#13 前回までのあらすじ
 
PRML復々習レーン#12 前回までのあらすじ
PRML復々習レーン#12 前回までのあらすじPRML復々習レーン#12 前回までのあらすじ
PRML復々習レーン#12 前回までのあらすじ
 
SEXI2013読み会: Adult Query Classification for Web Search and Recommendation
SEXI2013読み会: Adult Query Classification for Web Search and RecommendationSEXI2013読み会: Adult Query Classification for Web Search and Recommendation
SEXI2013読み会: Adult Query Classification for Web Search and Recommendation
 
計算論的学習理論入門 -PAC学習とかVC次元とか-
計算論的学習理論入門 -PAC学習とかVC次元とか-計算論的学習理論入門 -PAC学習とかVC次元とか-
計算論的学習理論入門 -PAC学習とかVC次元とか-
 
PRML復々習レーン#11 前回までのあらすじ
PRML復々習レーン#11 前回までのあらすじPRML復々習レーン#11 前回までのあらすじ
PRML復々習レーン#11 前回までのあらすじ
 
SMO徹底入門 - SVMをちゃんと実装する
SMO徹底入門 - SVMをちゃんと実装するSMO徹底入門 - SVMをちゃんと実装する
SMO徹底入門 - SVMをちゃんと実装する
 
PRML復々習レーン#10 前回までのあらすじ
PRML復々習レーン#10 前回までのあらすじPRML復々習レーン#10 前回までのあらすじ
PRML復々習レーン#10 前回までのあらすじ
 
PRML復々習レーン#10 7.1.3-7.1.5
PRML復々習レーン#10 7.1.3-7.1.5PRML復々習レーン#10 7.1.3-7.1.5
PRML復々習レーン#10 7.1.3-7.1.5
 
PRML復々習レーン#9 6.3-6.3.1
PRML復々習レーン#9 6.3-6.3.1PRML復々習レーン#9 6.3-6.3.1
PRML復々習レーン#9 6.3-6.3.1
 
PRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじPRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじ
 
PRML復々習レーン#7 前回までのあらすじ
PRML復々習レーン#7 前回までのあらすじPRML復々習レーン#7 前回までのあらすじ
PRML復々習レーン#7 前回までのあらすじ
 
SIGIR2012勉強会 23 Learning to Rank
SIGIR2012勉強会 23 Learning to RankSIGIR2012勉強会 23 Learning to Rank
SIGIR2012勉強会 23 Learning to Rank
 
DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5
DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5
DSIRNLP#3 LT: 辞書挟み込み型転置インデクスFIg4.5
 
Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)
Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)
Collaborative Ranking: A Case Study on Entity Ranking (EMNLP2011読み会)
 
SIGIR2011読み会 3. Learning to Rank
SIGIR2011読み会 3. Learning to RankSIGIR2011読み会 3. Learning to Rank
SIGIR2011読み会 3. Learning to Rank
 
TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++
TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++
TokyoNLP#7 きれいなジャイアンのカカカカ☆カーネル法入門-C++
 

Último

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Último (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

ICML2012読み会 Scaling Up Coordinate Descent Algorithms for Large L1 regularization Problems

  • 1. ICML2012読み会: Scaling Up Coordinate Descent Algorithms for Large L1 regularization Problems 2012-07-28 Yoshihiko Suhara @sleepy_yoshi
  • 2. 読む論文 • Scaling Up Coordinate Descent Algorithms for Large L1 regularization Problems – by C. Scherrer, M. Halappanavar, A. Tewari, D. Haglin • Coordinate Descent の並列計算 – [Bradley+ 11] Parallel Coordinate Descent for L1- Regularized Loss Minimization (ICML2011) とか 2
  • 3. 概要 • 共有メモリマルチコア環境におけるParallel Coordinate Descentの一般化フレームワークを紹 介 • 以下の2つの手法を提案 – Thread-Greedy Coordinate Descent – Coloring-Based Coordinate Descent • Parallel CDの4手法を実験で比較 – Thread-Greedy が思いのほかよかった 3
  • 4. L1正則化損失関数の最適化 • L1正則化損失関数 𝑛 1 min ℓ 𝒚 𝑖 , 𝑿𝒘 𝑖 + 𝜆 𝒘 1 𝒘 𝑛 𝑖=1 • ここで – 𝑿 ∈ ℝ 𝑛×𝑘 : 計画行列 – 𝒘 ∈ ℝ 𝑘 : 重みベクトル – ℓ(𝑦,⋅): 微分可能な凸関数 • たとえば – Lasso (L1 + 二乗誤差) – L1正則化ロジスティック回帰 4
  • 5. 記法 𝑿 = 𝑿1 , 𝑿2 , … , 𝑿 𝑗 , … 𝑿 𝑘 𝒆 𝑗 = 0, 0, … , 1, … , 0 𝑇 𝒙1𝑇 𝒙2𝑇 𝑿= ⋮ 𝒙 𝑖𝑇 ⋮ 𝒙𝑇 𝑛 5
  • 6. 補足: Coordinate Descent • 座標降下法とも呼ばれる (?) • 選択された次元に対して直線探索 • いろんな次元の選び方 – 例) Cyclic Coordinate Descent • 並列計算する場合には全次元の部分集合を選択して更新 6
  • 7. GenCD: A Generic Framework for Parallel Coordinate Descent なぜかここから英語 7
  • 9. Step 1: Select • Selecting 𝐽 coordinates • The selection criteria differs for variations of CD techniques – cyclic CD (CCD) – stochastic CD (SCD) • selection of a singlton – fully greedy CD • 𝐽 = {1, … , 𝑘} – Shotgun [Bradley+ 11] • selects a random subset of a given size 9
  • 10. Step 2: Propose • Propose step computes a proposed increment 𝛿 𝑗 for each 𝑗 ∈ 𝐽. – this step does not actually change the weights • In Step 2, we maintain a vector 𝝋 ∈ ℝ 𝑘 , where 𝝋 𝑗 is a proxy for the objective function evaluated at 𝒘 + 𝜹 𝑗 𝒆 𝑗 – update 𝝋 𝑗 whenever a new proposal is calculated for j – 𝝋 is not necessary if the algorithm will accepts all proposals 10
  • 11. Step 3: Accept • In Accept step, the algorithm accepts 𝐽′ ⊆ 𝐽 –  [Bradley+ 11] show correlations among features can lead to divergence if too many coordinates are updated at once (see below figure) • In CCD, SCD, Shotgun, the algorithm allows all proposals to be accepted – No need to calculate 𝝋 11
  • 12. Step 4: Update • In Update step, the algorithm updates according to the set 𝐽′ 𝑿𝒘 を保持 12
  • 13. Approximate Minimization (1/2) • Propose step calculates a proposed increment 𝜹 𝑗 for each 𝑗 ∈ 𝐽 𝛿 = argmin 𝛿 𝐹 𝒘 + 𝛿𝒆 𝑗 + 𝜆|𝒘 𝑗 + 𝛿| 1 𝑛 where, 𝐹 𝒘 = 𝑖=1 ℓ 𝒚 𝑖 , 𝑿𝒘 𝑖 𝑛 •  For a general loss function, there is no closed-form solution along a given coordinate. – Thus, consider approximate minimization 13
  • 14. Approximate Minimization (2/2) • Well known minimizer (e.g., [Yuan and Lin 10]) 𝛻𝑗 𝐹 𝒘 − 𝜆 𝛻𝑗 𝐹 𝒘 + 𝜆 𝛿 = −𝜓 𝒘𝑗; , 𝛽 𝛽 𝑎 if 𝑥 < 𝑎 where, 𝜓 𝑥; 𝑎, 𝑏 = 𝑏 if 𝑥 > 𝑏 𝑥 otherwise for squared loss 𝛽 = 1, logistic loss 𝛽 = 1/4. 14
  • 15. Step 2: Propose (Approximated) ′ 𝑖ℓ 𝒚 𝑖 ,𝒛 𝑖 ,𝑿 𝑗 ? 𝑛 Decrease in the approximated objective 15
  • 17. Algorithms (conventional) • SHOTGUN [Bradley+ 11] – Select step: random subset of the columns – Accept step: accepts every proposal • No need to compute a proxy for the objective – convergence is guaranteed only if the # of coordinates selected is at most 𝑃 ∗ = 𝑘 (*1) 2𝜌 • GREEDY – Select step: all coordinates – Propose step: each thread generating proposals for some subset of the coordinates using approximation – Accept step: Only accepts the single best among the all threads. (*1) 𝜌 is the matrix eigenvalue of 𝑿 𝑇 𝑿 17
  • 18. Comparisons of the Algorithms 18
  • 19. Algorithms (proposed) • THREAD-GREEDY – Select step: random set of coordinates (?) – Propose step: each thread generating proposals for some subset of the coordinates using approximation – Accept step: Each thread accepts the best of the proposals – no proof for convergence (however, empirical results are encouraging) • COLORING – Preprocessing: structurally independent features are identified via partial distance-2 coloring – Select step: a random color is selected – Accept step: accepts every proposal • since the features are disjoint. 19
  • 20. Implementation and Platform • Implementation – gcc with OpenMP • -O3 -fopenmp flags • parallel for pragma • static scheduling – Given n iterations and p threads, each thread gets n/p iterations • Platform – AMD Opteron (Magny-Cours) • with 48 cores (12 cores x 4 sockets) – 256GB Memory 20
  • 24. Summary • Presented GenCD, a generic framework for expressing parallel coordinate descent – Select, Propose, Accept, Upadte • Performs convergence and scalability tests for the four algorithms – but the authors do not favor any of these algorithms over the others • The condition for convergence of the THREAD- GREEDY algorithm is an open question 24
  • 25. References • [Yuan and Lin 10] G. Yuan, C. Lin, “A Comparison of Opitmization Methods and Software for Large-scale L1-regularized Linear Classification”, Journal of Machine Learning Research, vol.11, pp.3183-3234, 2010. • [Bradley+ 11] J. K. Bradley, A. Kyrola, D. Bickson, C. Guestrin, “Parallel Coordinate Descent for L1-Regularized Loss Minimization”, In Proc. ICML ‘11, 2011. 25
  • 26. おわり 26