On May 24, GlobalLogic held the Machine Learning Webinar "Statistical learning of linear regression model" with speaker Vitalii Miroshnychenko.
During the webinar we covered the following topics:
- The linear regression model;
- Fitting the model parameters (custom, sklearn, scipy);
- Key theorems and the asymptotics of the parameters;
- Descriptive statistics (visualizing the results);
- Tests and their interpretation;
- Examples from Machine Learning.
Video and event details - https://www.globallogic.com/ua/about/events/statistical-learning-of-linear-regression-model/?utm_source=youtube-organic&utm_medium=social&utm_campaign=statistical-learning-of-linear-regression-model
Pre-registration for GL BaseCamp - https://bit.ly/BaseCampwaitinglist
Formulas…
Data model
Y = b^T X + epsilon
- X - features (regressors) | Matrix (d, n)
- b - unknown parameters | Vector (d, 1)
- Y - target (answer) | Vector (1, n)
- epsilon - error | Vector (1, n)
Heuristics
Problem
Two naive estimators:
- b - the line between the first and last points, with the data ordered by Y
- b - the line between the first and last points, with the data ordered by X
Questions:
- What estimator is the "best"?
- What does the word "best" mean?
- What about outliers?
- What about extrapolation?
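The naive endpoint estimator above can be sketched in a few lines; `endpoint_line` is a hypothetical helper, not code from the talk. It illustrates why the "outliers" question matters: a single bad point at either end moves the whole line.

```python
import numpy as np

# Hypothetical sketch of the slide's heuristic: a line through the
# first and last points after ordering the data by X (or by Y).
def endpoint_line(x, y, order_by="x"):
    """Slope and intercept of the line through the first and last
    points of the ordered data (a naive estimator, not OLS)."""
    idx = np.argsort(x if order_by == "x" else y)
    x0, x1 = x[idx[0]], x[idx[-1]]
    y0, y1 = y[idx[0]], y[idx[-1]]
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)
slope, intercept = endpoint_line(x, y, order_by="x")
# An outlier at either endpoint would move this estimator arbitrarily far,
# which is exactly the "what about outliers?" concern on the slide.
```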
What does the word "best" mean
Problem
B is the set of all possible values of the parameter b.
The "best" estimator should outperform every other estimator over B.
What is the measure (loss)?
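Once MSE is chosen as the loss, the "best" b over B has a closed form. A minimal numpy sketch (the deck also fits with sklearn and scipy; here only numpy is assumed):

```python
import numpy as np

# With MSE as the loss, the minimizer over all of B is the ordinary
# least squares solution given by the normal equation.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-3, 3, n)
X = np.vstack([np.ones(n), x])          # (d, n) with an intercept row
y = 1.5 * x - 0.7 + rng.normal(0, 0.3, n)

# Normal equation: b_hat = (X X^T)^{-1} X y
b_hat = np.linalg.solve(X @ X.T, X @ y)

# The same answer from a library least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X.T, y, rcond=None)
```

Both routes recover the intercept and slope used to generate the data, up to noise.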
Useless features
Normal equation under a linear transformation (MSE case)
With new features X' = H X, det(H) != 0, the fitted predictions do not change: b_hat' = H^{-T} b_hat, so b_hat'^T X' = b_hat^T X.
Here H could mean:
1. multiplication by a scalar
2. a column swap
3. adding one column to another
4. any combination of (1, 2, 3)
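The invariance is easy to verify numerically. A sketch, assuming the (d, n) feature layout from the data-model slide, with an H that combines a scalar multiplication and a swap:

```python
import numpy as np

# An invertible linear transform H of the features changes the
# coefficients (b' = H^{-T} b) but not the fitted predictions.
rng = np.random.default_rng(2)
d, n = 3, 100
X = rng.normal(size=(d, n))
y = rng.normal(size=n)

H = np.array([[2.0, 0.0, 0.0],   # scale the first feature
              [0.0, 0.0, 1.0],   # swap the other two
              [0.0, 1.0, 0.0]])

b = np.linalg.solve(X @ X.T, X @ y)
X2 = H @ X                        # transformed features
b2 = np.linalg.solve(X2 @ X2.T, X2 @ y)

pred1 = b @ X                     # identical predictions
pred2 = b2 @ X2
```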
Useless features
Adding new features … (MSE case)
Features that are linear combinations of existing ones are useless. Easy to show: det(X X^T) = 0 in this case, so the normal equation has no unique solution.
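The singularity is immediate to check: stacking a linear combination of existing rows onto X drops the rank of X X^T below full.

```python
import numpy as np

# Adding a feature that is a linear combination of existing ones
# makes X X^T singular: rank stays at 2 even though the matrix is 3x3.
rng = np.random.default_rng(3)
X = rng.normal(size=(2, 50))
X_bad = np.vstack([X, 3 * X[0] - X[1]])   # redundant third feature

gram = X_bad @ X_bad.T
rank = np.linalg.matrix_rank(gram)        # 2, not 3 -> det(gram) = 0
```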
Useful features
Adding new features … (MSE case)
X is a feature matrix.
Features that can help to upgrade the model:
- polynomials of X
- nonlinear functions of X: log, exp, sin, cos, tanh, sigmoid, relu, selu ...
- indicators (e.g. x_1 > 0.6)
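All three kinds of features keep the model linear in b; only the feature map is nonlinear. A sketch combining a polynomial, a nonlinear function, and the slide's indicator:

```python
import numpy as np

# The model stays linear in b; the features are nonlinear functions of x.
rng = np.random.default_rng(4)
n = 300
x = rng.uniform(-2, 2, n)
y = np.sin(x) + 0.5 * x**2 + rng.normal(0, 0.1, n)

X = np.vstack([
    np.ones(n),                # intercept
    x,                         # linear term
    x**2,                      # polynomial feature
    np.sin(x),                 # nonlinear function
    (x > 0.6).astype(float),   # indicator, as on the slide
])

b = np.linalg.solve(X @ X.T, X @ y)   # ordinary normal equation
residual = y - b @ X                  # residual std near the noise level
```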
Tests
Linear F-test (hard way)
Two models: a restricted one nested inside a full one.
Steps:
- compute the statistic F_hat
- specify the alpha (type I error) level
- find the quantile of the Fisher distribution in tables
- compare F_hat with the critical value
link
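The four steps above can be sketched directly, with `scipy.stats.f.ppf` standing in for the printed quantile tables. The nested pair here (intercept-only vs intercept + slope) is an assumed example, not one from the talk:

```python
import numpy as np
from scipy import stats

# Linear F-test for two nested models:
# restricted = intercept only, full = intercept + slope.
rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(0, 0.5, n)

X_full = np.vstack([np.ones(n), x])
b_full = np.linalg.solve(X_full @ X_full.T, X_full @ y)
rss_full = np.sum((y - b_full @ X_full) ** 2)
rss_restricted = np.sum((y - y.mean()) ** 2)    # intercept-only fit

p_full, p_restricted = 2, 1
# Step 1: compute F_hat
F_hat = ((rss_restricted - rss_full) / (p_full - p_restricted)) / (
    rss_full / (n - p_full))
# Step 2: specify alpha (type I error level)
alpha = 0.05
# Step 3: the Fisher distribution quantile (scipy replaces the tables)
F_crit = stats.f.ppf(1 - alpha, p_full - p_restricted, n - p_full)
# Step 4: compare
reject = F_hat > F_crit   # the slope term is significant here
```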
Regularization
Ridge (L2 regularization, Tikhonov)
Ridge regression can work with correlated features:
- you can add as many features as you like
- select lambda somehow (e.g. by cross-validation)
- do not forget to normalize the features, so the penalty treats all coefficients comparably
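The closed form behind this robustness is b_hat = (X X^T + lambda I)^{-1} X y: the lambda * I term keeps the matrix invertible even when features are perfectly correlated. A numpy sketch (the deck also points to sklearn's Ridge):

```python
import numpy as np

# Ridge handles correlated features: X X^T + lambda*I is invertible
# even when two features are identical copies of each other.
rng = np.random.default_rng(6)
n = 100
x = rng.normal(size=n)
X = np.vstack([x, x])             # two perfectly correlated features
y = 3.0 * x + rng.normal(0, 0.1, n)

lam = 1.0                         # "select lambda somehow", e.g. by CV
b_ridge = np.linalg.solve(X @ X.T + lam * np.eye(2), X @ y)
# The total weight (~3) is shared equally between the duplicate features.
```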
Regularization
Lasso (L1 regularization)
link
To solve this minimization problem you can use:
1. convex optimization methods
2. quadratic programming methods
3. from sklearn.linear_model import Lasso :)
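Option 1 can be sketched without sklearn: proximal gradient descent (ISTA, a standard convex optimization method, swapped in here for illustration) solves the lasso problem and shows the hallmark of L1: useless coefficients become exactly zero.

```python
import numpy as np

# Lasso via proximal gradient (ISTA):
# minimize 0.5 * ||y - b^T X||^2 + lam * ||b||_1
rng = np.random.default_rng(7)
n, d = 200, 5
X = rng.normal(size=(d, n))
y = 2.0 * X[0] - 1.0 * X[1] + rng.normal(0, 0.1, n)  # features 2..4 useless

lam = 10.0                                  # L1 penalty strength
step = 1.0 / np.linalg.norm(X @ X.T, 2)     # 1 / Lipschitz constant
b = np.zeros(d)
for _ in range(2000):
    grad = X @ (b @ X - y)                  # gradient of the smooth part
    z = b - step * grad
    # soft-thresholding = proximal operator of the L1 norm
    b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
# b[0] ~ 2, b[1] ~ -1 (slightly shrunk), b[2:] are exactly zero
```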
Multiple output
What if Y is a matrix (the output is a vector)?
Y = b^T X + epsilon
- X - features (regressors) | Matrix (d, n)
- b - unknown parameters | Matrix (d, m)
- Y - target (answer) | Matrix (m, n)
- epsilon - error | Matrix (m, n)
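The normal equation carries over unchanged: b_hat = (X X^T)^{-1} X Y^T now solves all m outputs in one call, since each column of b_hat is the single-output solution for the corresponding row of Y.

```python
import numpy as np

# Multi-output regression: one solve fits all m outputs at once;
# b becomes a (d, m) matrix instead of a (d, 1) vector.
rng = np.random.default_rng(8)
d, m, n = 3, 2, 150
X = rng.normal(size=(d, n))
B_true = rng.normal(size=(d, m))
Y = B_true.T @ X + rng.normal(0, 0.05, size=(m, n))   # (m, n) target

# b_hat = (X X^T)^{-1} X Y^T, shape (d, m)
B_hat = np.linalg.solve(X @ X.T, X @ Y.T)
```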
Bibliography
1. Linear regression link
2. Gauss-Markov theorem link
3. scikit-learn Linear regression link
4. Correlation matrix link
5. Regularizations link
6. Fisher test link