6. Optimization Algorithms in Machine Learning
• Collaborative filtering
• K-means
• Maximum likelihood estimation
• Graphical models
• Neural Networks
• Deep Learning
7. Formulate Training as an Optimization Problem
• Training model: finding parameters that minimize some objective
function
Define Parameters
Define an Objective
Function
Find values for the parameters that
minimize the objective function
Cost term Regularization term
Optimization
Algorithm
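As a sketch (the loss ℓ, hypothesis h_θ, and weight λ depend on the model), such an objective typically takes the form:

$$J(\theta) = \underbrace{\frac{1}{m}\sum_{i=1}^{m} \ell\big(h_\theta(x^{(i)}),\, y^{(i)}\big)}_{\text{cost term}} + \underbrace{\lambda\, R(\theta)}_{\text{regularization term}}$$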
10. Gradient Descent
• Step length: a constant value
• Search direction: the negative gradient
[Figure: gradient descent iterations with a small step length]
11. Gradient Descent
• Step length: a constant value
• Search direction: the negative gradient
[Figure: gradient descent iterations with a large step length]
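A minimal sketch of constant-step gradient descent (the quadratic example and all names are illustrative):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, n_iters=100):
    """Constant-step gradient descent: repeatedly step along the
    negative gradient, x <- x - alpha * grad_f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - alpha * grad_f(x)
    return x

# Toy check: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), x0=[0.0]))  # -> approx [3.]
```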
12. Newton Methods
• Step length: use a line search
• Search direction: use curvature information (the inverse of the Hessian matrix)
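Concretely, the Newton search direction is the standard one built from the inverse Hessian:

$$p_k = -\big[\nabla^2 f(x_k)\big]^{-1}\, \nabla f(x_k)$$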
13. Quasi Newton Methods
• Problem with large n in Newton methods: calculating the inverse of the Hessian matrix is too expensive
• Instead, continuously update an approximation of the inverse Hessian matrix in each iteration
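The updates are anchored by the standard secant condition: with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, the new inverse-Hessian approximation $H_{k+1}$ must satisfy:

$$H_{k+1}\, y_k = s_k$$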
14. BFGS
• Broyden, Fletcher, Goldfarb, and Shanno
• The most popular quasi-Newton method
• Uses a Wolfe line search to find the step length
• Needs to keep an n×n matrix in memory
15. L-BFGS
• Limited memory: stores only a few vectors of length n (m×n values instead of n×n)
• m << n
• Useful for solving large problems (large n)
• More stable learning
• Uses curvature information to take a more direct route, giving faster convergence (see the two-loop recursion sketch below)
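A sketch of the classic two-loop recursion behind this (the standard algorithm; NumPy is used for illustration, while the deck's own implementation is in ECL):

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: build the L-BFGS search direction -H_k * grad
    from the last m curvature pairs s_i = x_{i+1} - x_i and
    y_i = grad_{i+1} - grad_i, storing only O(m*n) numbers."""
    q = np.array(grad, dtype=float)
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * s.dot(q)          # first loop: newest pair first
        alphas.append(a)
        q -= a * y
    if s_list:                      # standard initial scaling gamma = s.y / y.y
        gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * y.dot(r)       # second loop: oldest pair first
        r += (a - beta) * s
    return -r                       # descent direction p_k
```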
16. How to use
• Define a function that calculates the objective value and the gradient:
• ObjectiveFunc(x, ObjectiveFunc_params, TrainData, TrainLabel)
(a SciPy-based sketch of this interface follows)
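The deck's implementation runs in ECL on HPCC Systems; purely as an illustration of the same "value plus gradient" interface, here is a sketch that drives SciPy's L-BFGS with a stand-in least-squares objective on synthetic data (all identifiers mirror the slide, the lambda parameter is an assumption):

```python
import numpy as np
from scipy.optimize import minimize

def ObjectiveFunc(x, ObjectiveFunc_params, TrainData, TrainLabel):
    """Return (objective value, gradient) at x.
    Stand-in regularized least-squares cost; a real model plugs in its own loss."""
    lam = ObjectiveFunc_params['lambda']
    residual = TrainData @ x - TrainLabel
    value = 0.5 * residual.dot(residual) + 0.5 * lam * x.dot(x)
    grad = TrainData.T @ residual + lam * x
    return value, grad

rng = np.random.default_rng(0)
TrainData = rng.standard_normal((100, 5))
TrainLabel = TrainData @ np.ones(5)
res = minimize(ObjectiveFunc, x0=np.zeros(5),
               args=({'lambda': 1e-6}, TrainData, TrainLabel),
               method='L-BFGS-B', jac=True)  # jac=True: fun returns (f, grad)
print(res.x)  # close to the all-ones vector
```

Any model can be trained this way by swapping in its own (value, gradient) function, which is exactly the interface the deck builds on.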
17. Why L-BFGS?
• A step toward deep learning
• Optimization is at the heart of deep learning and many other ML algorithms
• Popular
• Advantages over SGD
18. HPCC Systems
• Open source, massive parallel-processing computing platform for big data processing and analytics
• Developed by LexisNexis Risk Solutions
• Uses commodity hardware clusters running on top of the Linux operating system
• Based on the DataFlow programming model
• Two cluster types: THOR (the batch data refinery) and ROXIE (the rapid query/delivery engine)
• Programmed in ECL
20. DataFlow Analysis
• The main focus is on how the data is being transformed
• A graph represents a transformation on the data
• Each node is an operation
• Edges show the flow of data
21. A DataFlow example
Input (Id, value): (1,2), (1,3), (2,5), (1,10), (3,4), (2,9)
→ SORT by Id: (1,2), (1,3), (1,10), (2,5), (2,9), (3,4)
→ MAX(value) per Id: (1,10), (2,9), (3,4)
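For illustration only, the same dataflow in pandas (the platform itself expresses this graph in ECL, not Python):

```python
import pandas as pd

# The slide's dataflow: each node is an operation, edges carry the data.
df = pd.DataFrame({'Id': [1, 1, 2, 1, 3, 2],
                   'value': [2, 3, 5, 10, 4, 9]})        # READ
df = df.sort_values('Id', kind='stable')                  # SORT
out = df.groupby('Id', as_index=False)['value'].max()     # MAX per group
print(out)  # Id 1 -> 10, Id 2 -> 9, Id 3 -> 4
```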
22. ECL
• Enterprise Control Language
• Compiled into optimized C++ code
• Declarative Language provides parallel and distributed DataFlow
oriented processing
24. Declarative
• What to accomplish, rather than how to accomplish it
• You’re describing what you’re trying to achieve, without instructing how to do it
29. The same DataFlow, run per node (built up over slides 29–31)
• READ: Node 1 holds (1,2), (1,3), (3,4), (1,10); Node 2 holds (2,5), (2,9)
• LOCAL SORT: Node 1: (1,2), (1,3), (1,10), (3,4); Node 2: (2,5), (2,9)
• LOCAL GROUP: Node 1: [(1,2), (1,3), (1,10)], [(3,4)]; Node 2: [(2,5), (2,9)]
• LOCAL AGG/MAX: Node 1: (1,10), (3,4); Node 2: (2,9)
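A toy illustration of why LOCAL operations suffice in this example: each Id happens to reside on a single node, so per-node work already produces the final rows (pandas stands in for the ECL operations; this is an observation about the example data, not HPCC code):

```python
import pandas as pd

# Each node's partition from the slide.
partitions = {
    'Node 1': pd.DataFrame({'Id': [1, 1, 3, 1], 'value': [2, 3, 4, 10]}),
    'Node 2': pd.DataFrame({'Id': [2, 2], 'value': [5, 9]}),
}
for node, part in partitions.items():
    local = part.sort_values('Id')                            # LOCAL SORT
    agg = local.groupby('Id', as_index=False)['value'].max()  # LOCAL GROUP + AGG/MAX
    print(node, agg.to_dict('records'))
```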
32. Back to L-BFGS
• Minimize f(x)
• Start with an initialized x: x_0
• Repeatedly update: x_{k+1} = x_k + α_k p_k
• The step length α_k comes from the Wolfe line search; the search direction p_k comes from L-BFGS
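For reference, the standard Wolfe conditions that the line search imposes on $\alpha_k$, with constants $0 < c_1 < c_2 < 1$:

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^{\top} p_k$$
$$\nabla f(x_k + \alpha_k p_k)^{\top} p_k \ge c_2 \nabla f(x_k)^{\top} p_k$$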
33. • If x is too large, it does not fit in the memory of one machine
• L-BFGS needs m × n memory
• Distribute x across different machines
• Do computations locally where possible
• Do global computations only as necessary (see the sketch below)
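As a sketch of "local where possible, global when necessary" (an illustrative assumption about the implementation style, not the deck's ECL code): the vector operations inside L-BFGS largely reduce to dot products, and a partitioned dot product only needs one scalar per machine to cross the network:

```python
import numpy as np

# x (and any other length-n vector) split into per-machine chunks.
x_chunks = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]   # machine 0, machine 1
g_chunks = [np.array([5.0, 6.0]), np.array([7.0, 8.0])]

local_partials = [x.dot(g) for x, g in zip(x_chunks, g_chunks)]  # local work
global_dot = sum(local_partials)    # only one scalar per machine moves globally
print(global_dot)                   # 70.0 == [1,2,3,4] . [5,6,7,8]
```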
56. Formulate as an optimization problem
• Parameters: K × f variables
• Objective function: generalize the logistic regression objective function
• Define a function to calculate the objective value and gradient at a given point (see the sketch below)
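A sketch of such a function for softmax regression (NumPy for illustration; the weight-decay strength lam and all identifiers are assumptions, not the deck's code):

```python
import numpy as np

def softmax_objective(theta, X, y, K, lam=1e-4):
    """Softmax regression cost and gradient.
    theta: flat vector of K*f parameters, X: m x f data, y: labels in 0..K-1."""
    m, f = X.shape
    W = theta.reshape(K, f)
    scores = X @ W.T                                   # m x K class scores
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)                  # class probabilities
    cost = -np.log(P[np.arange(m), y]).mean() + 0.5 * lam * np.sum(W ** 2)
    grad_W = (P - np.eye(K)[y]).T @ X / m + lam * W    # K x f gradient
    return cost, grad_W.ravel()
```

This returns exactly the (value, gradient) pair that L-BFGS consumes, so it plugs into the interface from slide 16.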
72. If both matrices are big
• Split the shared dimension f into blocks f1, f2, f3
• Each block product gives a K×m partial result
• Combine the K×m partials with a ROLLUP into the final K×m result
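A NumPy illustration of the partitioning idea, assuming (as the diagram suggests) that the ROLLUP combines the K×m partials by summation:

```python
import numpy as np

# (K x f) @ (f x m): split the shared f dimension into blocks, form one
# K x m partial product per block, then roll the partials up by summing.
K, f, m = 4, 9, 5
W = np.random.randn(K, f)       # K x f parameter matrix
X = np.random.randn(f, m)       # f x m data matrix
blocks = np.array_split(np.arange(f), 3)            # f1, f2, f3
partials = [W[:, b] @ X[b, :] for b in blocks]      # three K x m partials
result = sum(partials)                               # the ROLLUP step
assert np.allclose(result, W @ X)
```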
73. Sparse Autoencoder
• Autoencoder: the output is trained to be the same as the input
• Sparsity: constrain the hidden neurons to be inactive most of the time
• Stacking them up makes a deep network
74. Formulate as an optimization problem
• Parameters: the weight and bias values
• Objective function: the difference between the output and the expected output, plus a penalty term to impose sparsity
• Define a function to calculate the objective value and gradient at a given point (sketched below)
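A sketch of that objective/gradient pair for a single sparse autoencoder, following the common sigmoid-plus-KL-penalty formulation (hyperparameters rho, beta, lam and the parameter packing are illustrative assumptions):

```python
import numpy as np

def sae_objective(theta, X, h, rho=0.05, beta=3.0, lam=1e-4):
    """Sparse autoencoder cost and gradient (sigmoid units).
    X: f x m inputs as columns; the target output is X itself.
    theta packs W1 (h x f), b1 (h), W2 (f x h), b2 (f) as one flat vector."""
    f, m = X.shape
    W1 = theta[:h * f].reshape(h, f)
    b1 = theta[h * f:h * f + h].reshape(h, 1)
    W2 = theta[h * f + h:h * f + h + f * h].reshape(f, h)
    b2 = theta[-f:].reshape(f, 1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    A1 = sig(W1 @ X + b1)                      # hidden activations, h x m
    A2 = sig(W2 @ A1 + b2)                     # reconstruction, f x m
    rho_hat = A1.mean(axis=1, keepdims=True)   # average activation per unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = (0.5 / m) * np.sum((A2 - X) ** 2) + beta * kl \
         + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Backpropagation for the gradient.
    D2 = (A2 - X) * A2 * (1 - A2)
    sparse_term = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    D1 = (W2.T @ D2 + sparse_term) * A1 * (1 - A1)
    grad = np.concatenate([
        (D1 @ X.T / m + lam * W1).ravel(),     # dW1
        D1.mean(axis=1),                       # db1
        (D2 @ A1.T / m + lam * W2).ravel(),    # dW2
        D2.mean(axis=1),                       # db2
    ])
    return cost, grad
```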
77. Toward Deep Learning
• Provide the learned features from one sparse autoencoder as input to another
• … stack up to build a deep network
• Fine tuning
• Use forward propagation to calculate the cost value and backpropagation to calculate the gradients
• Use L-BFGS to fine-tune
78. SUMMARY
• HPCC Systems allows implementation of large-scale ML algorithms
• Optimization algorithms are an important aspect of advanced machine learning problems
• L-BFGS implemented on HPCC Systems
• SoftMax
• Sparse Autoencoder
• Other algorithms can be implemented by supplying their objective value and gradient
• Toward deep learning
79. • HPCC Systems
• https://hpccsystems.com/
• ECL-ML Library
• https://github.com/hpcc-systems/ecl-ml
• My GitHub
• https://github.com/maryamregister
• My Email
• mmousaarabna2013@fau.edu