On May 24, GlobalLogic held the Machine Learning Webinar "Statistical learning of linear regression model" with speaker Vitalii Miroshnychenko.
During the webinar we covered the following topics:
- The linear regression model;
- Fitting the model parameters (custom, sklearn, scipy);
- Key theorems and the asymptotics of the parameters;
- Descriptive statistics (visualizing the results);
- Tests and their interpretation;
- Examples from Machine Learning.
Video and event details - https://www.globallogic.com/ua/about/events/statistical-learning-of-linear-regression-model/?utm_source=youtube-organic&utm_medium=social&utm_campaign=statistical-learning-of-linear-regression-model
Pre-registration for GL BaseCamp - https://bit.ly/BaseCampwaitinglist
Formulas…
Data model
Y = b^T X + epsilon
- X - features (regressors) | Matrix (d, n)
- b - unknown parameters | Vector (d, 1)
- Y - target (answer) | Vector (1, n)
- epsilon - error | Vector (1, n)
Heuristics
Problem
Two naive estimators:
- b - the line between the first and last points, with the data ordered by Y
- b - the line between the first and last points, with the data ordered by X
Questions:
- What estimator is the "best"?
- What does the word "best" mean?
- What about outliers?
- What about extrapolation?
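The naive endpoint estimator above can be sketched in a few lines; `endpoint_line` is a hypothetical helper, not code from the talk. It illustrates why the "outliers" question matters: a single bad point at either end moves the whole line.

```python
import numpy as np

# Hypothetical sketch of the slide's heuristic: a line through the
# first and last points after ordering the data by X (or by Y).
def endpoint_line(x, y, order_by="x"):
    """Slope and intercept of the line through the first and last
    points of the ordered data (a naive estimator, not OLS)."""
    idx = np.argsort(x if order_by == "x" else y)
    x0, x1 = x[idx[0]], x[idx[-1]]
    y0, y1 = y[idx[0]], y[idx[-1]]
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)
slope, intercept = endpoint_line(x, y, order_by="x")
# An outlier at either endpoint would move this estimator arbitrarily far,
# which is exactly the "what about outliers?" concern on the slide.
```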
What does the word "best" mean
Problem
B is the set of all possible values of the parameter b.
The "best" estimator should outperform every other estimator over B.
What is the measure (loss)?
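Once MSE is chosen as the loss, the "best" b over B has a closed form. A minimal numpy sketch (the deck also fits with sklearn and scipy; here only numpy is assumed):

```python
import numpy as np

# With MSE as the loss, the minimizer over all of B is the ordinary
# least squares solution given by the normal equation.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-3, 3, n)
X = np.vstack([np.ones(n), x])          # (d, n) with an intercept row
y = 1.5 * x - 0.7 + rng.normal(0, 0.3, n)

# Normal equation: b_hat = (X X^T)^{-1} X y
b_hat = np.linalg.solve(X @ X.T, X @ y)

# The same answer from a library least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X.T, y, rcond=None)
```

Both routes recover the intercept and slope used to generate the data, up to noise.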
Useless features
Normal equation under a linear transformation (MSE case)
With new features X' = H X, det(H) != 0, the fitted predictions do not change: b_hat' = H^{-T} b_hat, so b_hat'^T X' = b_hat^T X.
Here H could mean:
1. multiplication by a scalar
2. a column swap
3. adding one column to another
4. any combination of (1, 2, 3)
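The invariance is easy to verify numerically. A sketch, assuming the (d, n) feature layout from the data-model slide, with an H that combines a scalar multiplication and a swap:

```python
import numpy as np

# An invertible linear transform H of the features changes the
# coefficients (b' = H^{-T} b) but not the fitted predictions.
rng = np.random.default_rng(2)
d, n = 3, 100
X = rng.normal(size=(d, n))
y = rng.normal(size=n)

H = np.array([[2.0, 0.0, 0.0],   # scale the first feature
              [0.0, 0.0, 1.0],   # swap the other two
              [0.0, 1.0, 0.0]])

b = np.linalg.solve(X @ X.T, X @ y)
X2 = H @ X                        # transformed features
b2 = np.linalg.solve(X2 @ X2.T, X2 @ y)

pred1 = b @ X                     # identical predictions
pred2 = b2 @ X2
```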
Useless features
Adding new features … (MSE case)
Features that are linear combinations of existing ones are useless. Easy to show: det(X X^T) = 0 in this case, so the normal equation has no unique solution.
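The singularity is immediate to check: stacking a linear combination of existing rows onto X drops the rank of X X^T below full.

```python
import numpy as np

# Adding a feature that is a linear combination of existing ones
# makes X X^T singular: rank stays at 2 even though the matrix is 3x3.
rng = np.random.default_rng(3)
X = rng.normal(size=(2, 50))
X_bad = np.vstack([X, 3 * X[0] - X[1]])   # redundant third feature

gram = X_bad @ X_bad.T
rank = np.linalg.matrix_rank(gram)        # 2, not 3 -> det(gram) = 0
```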
Useful features
Adding new features … (MSE case)
X is a feature matrix.
Features that can help to upgrade the model:
- polynomials of X
- nonlinear functions of X: log, exp, sin, cos, tanh, sigmoid, relu, selu ...
- indicators (e.g. x_1 > 0.6)
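All three kinds of features keep the model linear in b; only the feature map is nonlinear. A sketch combining a polynomial, a nonlinear function, and the slide's indicator:

```python
import numpy as np

# The model stays linear in b; the features are nonlinear functions of x.
rng = np.random.default_rng(4)
n = 300
x = rng.uniform(-2, 2, n)
y = np.sin(x) + 0.5 * x**2 + rng.normal(0, 0.1, n)

X = np.vstack([
    np.ones(n),                # intercept
    x,                         # linear term
    x**2,                      # polynomial feature
    np.sin(x),                 # nonlinear function
    (x > 0.6).astype(float),   # indicator, as on the slide
])

b = np.linalg.solve(X @ X.T, X @ y)   # ordinary normal equation
residual = y - b @ X                  # residual std near the noise level
```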
Tests
Linear F-test (hard way)
Two models: a restricted one nested inside a full one.
Steps:
- compute the statistic F_hat
- specify the alpha (type I error) level
- find the quantile of the Fisher distribution in tables
- compare F_hat with the critical value
link
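The four steps above can be sketched directly, with `scipy.stats.f.ppf` standing in for the printed quantile tables. The nested pair here (intercept-only vs intercept + slope) is an assumed example, not one from the talk:

```python
import numpy as np
from scipy import stats

# Linear F-test for two nested models:
# restricted = intercept only, full = intercept + slope.
rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(0, 0.5, n)

X_full = np.vstack([np.ones(n), x])
b_full = np.linalg.solve(X_full @ X_full.T, X_full @ y)
rss_full = np.sum((y - b_full @ X_full) ** 2)
rss_restricted = np.sum((y - y.mean()) ** 2)    # intercept-only fit

p_full, p_restricted = 2, 1
# Step 1: compute F_hat
F_hat = ((rss_restricted - rss_full) / (p_full - p_restricted)) / (
    rss_full / (n - p_full))
# Step 2: specify alpha (type I error level)
alpha = 0.05
# Step 3: the Fisher distribution quantile (scipy replaces the tables)
F_crit = stats.f.ppf(1 - alpha, p_full - p_restricted, n - p_full)
# Step 4: compare
reject = F_hat > F_crit   # the slope term is significant here
```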
Regularization
Ridge (L2 regularization, Tikhonov)
Ridge regression can work with correlated features:
- you can add as many features as you like
- select lambda somehow (e.g. by cross-validation)
- do not forget to normalize the features, so the penalty treats all coefficients comparably
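The closed form behind this robustness is b_hat = (X X^T + lambda I)^{-1} X y: the lambda * I term keeps the matrix invertible even when features are perfectly correlated. A numpy sketch (the deck also points to sklearn's Ridge):

```python
import numpy as np

# Ridge handles correlated features: X X^T + lambda*I is invertible
# even when two features are identical copies of each other.
rng = np.random.default_rng(6)
n = 100
x = rng.normal(size=n)
X = np.vstack([x, x])             # two perfectly correlated features
y = 3.0 * x + rng.normal(0, 0.1, n)

lam = 1.0                         # "select lambda somehow", e.g. by CV
b_ridge = np.linalg.solve(X @ X.T + lam * np.eye(2), X @ y)
# The total weight (~3) is shared equally between the duplicate features.
```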
Regularization
Lasso (L1 regularization)
link
To solve this minimization problem you can use:
1. convex optimization methods
2. quadratic programming methods
3. from sklearn.linear_model import Lasso :)
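Option 1 can be sketched without sklearn: proximal gradient descent (ISTA, a standard convex optimization method, swapped in here for illustration) solves the lasso problem and shows the hallmark of L1: useless coefficients become exactly zero.

```python
import numpy as np

# Lasso via proximal gradient (ISTA):
# minimize 0.5 * ||y - b^T X||^2 + lam * ||b||_1
rng = np.random.default_rng(7)
n, d = 200, 5
X = rng.normal(size=(d, n))
y = 2.0 * X[0] - 1.0 * X[1] + rng.normal(0, 0.1, n)  # features 2..4 useless

lam = 10.0                                  # L1 penalty strength
step = 1.0 / np.linalg.norm(X @ X.T, 2)     # 1 / Lipschitz constant
b = np.zeros(d)
for _ in range(2000):
    grad = X @ (b @ X - y)                  # gradient of the smooth part
    z = b - step * grad
    # soft-thresholding = proximal operator of the L1 norm
    b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
# b[0] ~ 2, b[1] ~ -1 (slightly shrunk), b[2:] are exactly zero
```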
Multiple output
What if Y is a matrix (the output is a vector)?
Y = b^T X + epsilon
- X - features (regressors) | Matrix (d, n)
- b - unknown parameters | Matrix (d, m)
- Y - target (answer) | Matrix (m, n)
- epsilon - error | Matrix (m, n)
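The normal equation carries over unchanged: b_hat = (X X^T)^{-1} X Y^T now solves all m outputs in one call, since each column of b_hat is the single-output solution for the corresponding row of Y.

```python
import numpy as np

# Multi-output regression: one solve fits all m outputs at once;
# b becomes a (d, m) matrix instead of a (d, 1) vector.
rng = np.random.default_rng(8)
d, m, n = 3, 2, 150
X = rng.normal(size=(d, n))
B_true = rng.normal(size=(d, m))
Y = B_true.T @ X + rng.normal(0, 0.05, size=(m, n))   # (m, n) target

# b_hat = (X X^T)^{-1} X Y^T, shape (d, m)
B_hat = np.linalg.solve(X @ X.T, X @ Y.T)
```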
Bibliography
1. Linear regression link
2. Gauss-Markov theorem link
3. scikit-learn Linear regression link
4. Correlation matrix link
5. Regularizations link
6. Fisher test link