1. Machine learning:
Keywords + Applications
2. What you will see
in these slides
1) Applications of machine learning
- wind power forecasting (important e.g. for PengHu island!)
- rainfall estimation
2) Some key words (you must know what they mean):
- black box / white box
- shrinking horizon
- objective function
- “what you get is what you have”
- model complexity
- cross-validation
- generative model
- quantile, value-at-risk
4. I want to produce
electricity
I have:
- water for hydroelectricity
- a nuclear power plant
- wind farms
- gas turbines
5. I want to produce
electricity
I must ensure, for each time step:
Production of electricity
=
Demand of electricity
Demand(t0), Demand(t1), Demand(t2), Demand(t3) known.
6. I want to produce
electricity
We get four equations:
Production(t0) = Demand(t0)
Production(t1) = Demand(t1)
Production(t2) = Demand(t2)
Production(t3) = Demand(t3)
Other equation:
Production = hydro-production + nuclear-production
+ wind-farm production + gas production
7. I want to produce
electricity
We get four equations:
H(t0)+W(t0)+N(t0)+G(t0) = Demand(t0)
H(t1)+W(t1)+N(t1)+G(t1) = Demand(t1)
H(t2)+W(t2)+N(t2)+G(t2) = Demand(t2)
H(t3)+W(t3)+N(t3)+G(t3) = Demand(t3)
Stock level for Hydro depends on production:
x(1) = x(0)-H(0)
x(2) = x(1)-H(1)
x(3) = x(2)-H(2)
x(4) = x(3)-H(3)
8. Also depends on inflows
We get four equations:
H(t0)+W(t0)+N(t0)+G(t0) = Demand(t0)
H(t1)+W(t1)+N(t1)+G(t1) = Demand(t1)
H(t2)+W(t2)+N(t2)+G(t2) = Demand(t2)
H(t3)+W(t3)+N(t3)+G(t3) = Demand(t3)
Stock level for Hydro: x(0); constraint: x(i) >= 0
x(1) = x(0)+I(0)-H(0)
x(2) = x(1)+I(1)-H(1)
x(3) = x(2)+I(2)-H(2)
x(4) = x(3)+I(3)-H(3)
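The stock equations above can be simulated directly. A minimal sketch, assuming made-up numbers for x(0), I(i) and H(i); the function names are illustrative, not from the slides:

```python
def simulate_stock(x0, inflows, hydro):
    """Return the stock levels x(0), x(1), ... under x(i+1) = x(i) + I(i) - H(i)."""
    levels = [x0]
    for inflow, h in zip(inflows, hydro):
        levels.append(levels[-1] + inflow - h)
    return levels


def is_feasible(levels):
    """Check the constraint x(i) >= 0 at every time step."""
    return all(x >= 0 for x in levels)


# Hypothetical numbers, for illustration only.
x0 = 10.0
inflows = [2.0, 0.0, 1.0, 3.0]  # I(0), I(1), I(2), I(3)
hydro = [4.0, 5.0, 3.0, 2.0]    # H(0), H(1), H(2), H(3)

levels = simulate_stock(x0, inflows, hydro)
print(levels, is_feasible(levels))  # [10.0, 8.0, 3.0, 1.0, 2.0] True
```

A hydro plan that drives any x(i) below zero breaks the constraint, so it is not a feasible solution.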
10. 8 equations
(yes, it increases...)
Nuclear has constraints as well:
- N(1) in f(N(0))
- N(2) in f(N(1))
- N(3) in f(N(2))
(very simplified; in fact there are stocks, refills...)
11. Ok! Summary ?
W(0), W(1), W(2), W(3): wind farm production
= cannot be chosen, and
W(1), W(2), W(3) are unknown!
To be chosen:
G(0), G(1), G(2), G(3): gas turbine production
H(0), H(1), H(2), H(3): hydroelectric production
(can, in a sense, be negative)
N(0), N(1), N(2), N(3): nuclear power
14. Ok! Summary ?
To be chosen:
G(0), G(1), G(2), G(3): gas production
H(0), H(1), H(2), H(3): hydroelectric production
(can, in a sense, be negative)
N(0), N(1), N(2), N(3): nuclear power
Constraints: production plans must satisfy constraints.
E.g.: if unlimited gas production, we might decide
G(0)=demand(0)-W(0), G(1)=demand(1)-W(1),
G(2)=demand(2)-W(2), G(3)=demand(3)-W(3)
==> it is a feasible solution
==> it is a bad feasible solution
Objective function: not all solutions are equivalent!
15. Ok! Summary ?
Production cost:
Hcost * (H0+H1+H2+H3)
+ Ncost * (N0+N1+N2+N3)
+ Gcost * (G0+G1+G2+G3)
+ Wcost * (W0+W1+W2+W3)
NB: cost does not only mean $.
Cost means ecological & environmental costs as well.
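To see why a feasible solution can still be a bad one, here is a small sketch that evaluates the production cost of two plans. All demands, wind values and unit costs are invented for the example, and the second plan assumes the hydro stock is large enough.

```python
def production_cost(plan, unit_cost):
    """Sum over sources of (unit cost) x (total production), as in the cost formula."""
    return sum(unit_cost[s] * sum(plan[s]) for s in plan)


def balances(plan, demand):
    """True if H + W + N + G equals the demand at every time step."""
    return all(
        abs(sum(plan[s][t] for s in plan) - d) < 1e-9
        for t, d in enumerate(demand)
    )


# Hypothetical data, for illustration only.
demand = [10.0, 12.0, 11.0, 9.0]
wind = [2.0, 3.0, 1.0, 4.0]
unit_cost = {"H": 1.0, "N": 2.0, "G": 5.0, "W": 0.5}

# Plan A: gas covers everything the wind does not, G(t) = demand(t) - W(t).
gas_only = [d - w for d, w in zip(demand, wind)]
plan_a = {"H": [0.0] * 4, "N": [0.0] * 4, "G": gas_only, "W": wind}

# Plan B: 4 units of hydro at each time step, gas covers the rest.
plan_b = {"H": [4.0] * 4, "N": [0.0] * 4, "G": [g - 4.0 for g in gas_only], "W": wind}

cost_a = production_cost(plan_a, unit_cost)
cost_b = production_cost(plan_b, unit_cost)
print(balances(plan_a, demand), cost_a)  # True 165.0
print(balances(plan_b, demand), cost_b)  # True 101.0
```

Both plans satisfy the four demand equations, but with these made-up costs the gas-only plan is clearly more expensive: the objective function is what separates them.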
16. Quiz !
So we have:
x0,x1,x2,x3: states at time t0, t1, t2, t3.
x0 is given, x1, x2, x3 depend on our decisions.
Some decisions are chosen at time t0.
Some decisions are chosen at time t1.
Some decisions are chosen at time t2.
Some decisions are chosen at time t3.
The cost depends on all decisions.
Is this a supervised learning problem ?
Is this a reinforcement learning problem ?
Is this a boring problem ?
17. Ok! Summary ?
So we have equations.
If we know W(1),W(2),W(3),
we can evaluate the production cost.
We want to:
- solve equations
- minimize production cost
Problem: we don't know W(1), W(2), W(3).
How can we know them ?
18. Ok! Summary ?
We want to know W(1), W(2), W(3).
Steps:
(1) Weather simulation: we predict the wind
at time steps t1 t2 t3 (as in classical
weather forecast)
(2) From the wind forecast,
predict the power (e.g. “black box” model):
Based on data
E.g. mean-square error
Predicting W(1), W(2), W(3):
Boring problem ?
Supervised learning problem ?
Reinforcement learning problem ?
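Step (2) can be sketched as a tiny supervised learning problem: past pairs (wind forecast, measured power) are the data, and we pick the model with the smallest mean-square error. The data below are invented, and a straight line is only the simplest possible "black box" (a real wind-to-power relation is not linear).

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ a * x + b, i.e. the line minimizing the mean-square error."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def mean_square_error(a, b, xs, ys):
    """Average of (prediction - observation)^2 over the data set."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical archive: forecast wind speed and the power actually produced.
wind_speed = [3.0, 5.0, 7.0, 9.0, 11.0]
power = [1.0, 2.2, 2.9, 4.1, 5.0]

a, b = fit_line(wind_speed, power)
print(a, b, mean_square_error(a, b, wind_speed, power))

# Prediction of W(1) from a new wind forecast (made-up value).
predicted_w1 = a * 8.0 + b
```

The model is fitted from data only, with no physical description of the turbines inside it.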
19. Ok! Summary ?
We want to know W(1), W(2), W(3).
Steps:
(1) Weather simulation: we predict
the wind
at time steps t1 t2 t3 (as in classical
weather forecast)
(2) From the wind forecast,
predict the power (e.g. “black box” model):
Based on data
E.g. mean-square error
What does “black box” mean ?
Predicting W(1), W(2), W(3):
Boring problem ?
Supervised learning problem ?
Reinforcement learning problem ?
20. Difficulties ?
In many cases, you will see in your life as an engineer that:
- collecting data and models is a big
part of the work
- solving the problem exactly is impossible
- what really matters in an application is to
find where the current codes are
not satisfactory, and not to spend time on
other aspects
21. Typical questions for
this application
- Many constraints / effects are missing !
(for the real application, we must have far more constraints...)
- Mean square error in the supervised learning for W1,W2,W3 ?
But ...
- How many time steps in the future should we consider ?
- We should penalize cases with W4 small !
- In case of long term: should we consider a “climate change” bias ?
Some of these points are important, some are negligible,
depending on the system under analysis.
29. Another beautiful application
This is Paris.
Beautiful town.
With plenty of people
(10 million in IDF, i.e. Île-de-France).
Producing plenty of fecal
matter ==> dirty water.
30. Our river in Paris
is the “Seine”.
A French
politician said
he would soon
swim across it.
In the end, he never
did it.
For your health,
don't do it.
Nevertheless,
we try
to keep it
as clean
as possible.
31. Dirty water should be separated from the Seine.
And usually it is.
Something like this:
[Diagram: Seine | Dirty water]
32. Problem: if heavy rainfall reaches the dirty water,
then dirty water might pollute the Seine
[Diagram: Seine | Dirty water]
33. No typhoon in France.
But we can have heavy rains/winds in Paris:
- 0.96 dm in 24 hours happened in 1987.
- gusts at 169 km/h in 1999 (very unusual in France)
Problem: if heavy rainfall reaches the dirty water,
then dirty water might pollute the Seine
[Diagram: Seine | Dirty water]
(yes, in Taiwan it is more impressive,
sometimes it is 16.7 dm in 24 hours and gusts
can reach 250 km/h...)
35. No typhoon in France.
But we can have heavy rains/winds in Paris:
- 0.96 dm in 24 hours happened in 1987.
- gusts at 169 km/h in 1999 (very unusual in France)
Problem: if heavy rainfall reaches the dirty water,
then dirty water might pollute the Seine
[Diagram: Seine | Dirty water → Seine!]
(yes, in Taiwan it is more impressive,
sometimes it is 16.7 dm in 24 hours and gusts
can reach 250 km/h...)
36. Another beautiful application
Three water networks:
- dirty water: should go to cleaning stations
- clean water: can go to the Seine, but can't be drunk
- drinkable water (France: tap water = drinkable)
37. Big water network
[Diagram: network with four “Dirty water” nodes and four “Clean water” nodes]
38. Water vs dirty water
Challenge:
Summer storms.
Not comparable to a Taiwanese typhoon.
But a lot of water.
Can make the volume of dirty water very large.
Dirty water can then invade clean water.
Your mission:
- Get rid of dirty water
- Protect clean water
39. Water vs dirty water
State: level in each stock,
valves' status
(open or closed)
At each time step,
rainfalls(i) liters of water reach stock i,
and you can open or close valves
==> get a new state.
Your mission:
- Get rid of dirty water
- Protect clean water
40. Water vs dirty water
Typically:
(0, 1, 0, 0, 0, 1, 0, 1, 0.42, 0.2, 0.0, 0.8, 0.3)
(valves) (stock levels)
Plenty of rules:
- if valve 4 is open, then water from stock 1
goes to stock 2 at rate 0.02 m3/s
- if stock[2] > 0.3, then dirty water ==> Seine
at rate 0.1 m3/s
==> Minimize the quantity of dirty water in clean
stocks at the end of the storm
41. Water vs dirty water
Equations:
Stocks(t+1) = complicatedFunction(Stocks(t),
rainfalls(t), valves(t))
- Stocks(t+1), Stocks(t), rainfalls(t): D-dimensional vectors
(D=number of stocks)
- valves(t): V-dimensional vector
(V=number of valves)
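A toy version of the transition function, using only the two example rules quoted earlier (valve 4, stock 1, stock 2). The real complicatedFunction has plenty of other rules; the indexing, the time step dt and the numbers in the second example are assumptions made for this sketch.

```python
def step(stocks, rainfalls, valves, dt=1.0):
    """Toy Stocks(t+1) = f(Stocks(t), rainfalls(t), valves(t)).

    Returns the new stock levels and the volume of dirty water sent to the Seine.
    """
    # Rain first reaches each stock.
    new = [s + r for s, r in zip(stocks, rainfalls)]
    to_seine = 0.0

    # Rule 1: if valve 4 is open, water goes from stock 1 to stock 2 at rate 0.02.
    if valves[4]:
        moved = min(new[1], 0.02 * dt)
        new[1] -= moved
        new[2] += moved

    # Rule 2: if stock 2 exceeds 0.3, dirty water goes to the Seine at rate 0.1.
    if new[2] > 0.3:
        spilled = min(new[2], 0.1 * dt)
        new[2] -= spilled
        to_seine += spilled

    return new, to_seine


# State from the slides: valve 4 is closed and there is no rain, so nothing moves.
quiet, quiet_spill = step([0.42, 0.2, 0.0, 0.8, 0.3], [0.0] * 5, (0, 1, 0, 0, 0, 1, 0, 1))

# Hypothetical state: valve 4 open and rain on stock 2, which then overflows.
new_stocks, to_seine = step(
    [0.42, 0.2, 0.28, 0.8, 0.3],
    [0.0, 0.0, 0.05, 0.0, 0.0],
    (0, 1, 0, 0, 1, 1, 0, 1),
)
print(new_stocks, to_seine)
```

Calling step once per time step, with the chosen valves(t), gives the whole trajectory of the network during the storm.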
42. Water vs dirty water
To be decided:
valves(t) for each t
(V-dimensional vector, V=number of valves)
If there are 240 time steps,
we get 240 x V decision variables
Criterion = objective function = quantity of dirty
water reaching the clean network + quantity of
dirty water in the river
43. Shrinking horizon
Too many time steps!
At each time step, make a decision
using only 30 time steps.
Move this window of 30 time steps.
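A minimal sketch of the moving window on a toy problem with one stock and one valve. The capacity, the drain rate, the valve cost and the rain values are all invented. Every candidate valve sequence inside the window is tried, only the first decision is applied, and then the window moves by one step (a fixed-length moving window is often called a receding or rolling horizon in the control literature; near the end of the storm the window simply gets shorter).

```python
from itertools import product

CAPACITY = 5.0  # hypothetical: above this level, dirty water spills
DRAIN = 2.0     # hypothetical: volume removed per step when the valve is open


def step(level, rain, valve_open):
    """One time step of the toy stock; returns (new level, spilled volume)."""
    level += rain
    if valve_open:
        level -= min(level, DRAIN)
    spill = max(0.0, level - CAPACITY)
    return level - spill, spill


def plan_cost(level, rains, valves):
    """Spilled volume plus a small cost (0.1) for each step with the valve open."""
    total = 0.0
    for rain, v in zip(rains, valves):
        level, spill = step(level, rain, v)
        total += spill + 0.1 * v
    return total


def first_decision(level, window):
    """Try all valve sequences on the window, return the first move of the best one."""
    best = min(product((0, 1), repeat=len(window)),
               key=lambda vs: plan_cost(level, window, vs))
    return best[0]


def control(level, rain_forecast, horizon):
    """Apply one decision per time step, each time re-planning on the next window."""
    decisions, total_spill = [], 0.0
    for t in range(len(rain_forecast)):
        v = first_decision(level, rain_forecast[t:t + horizon])
        level, spill = step(level, rain_forecast[t], v)
        decisions.append(v)
        total_spill += spill
    return decisions, total_spill


rain = [0.0, 0.0, 4.0, 4.0, 0.0, 0.0]
print(control(3.0, rain, horizon=1))  # looks one step ahead: some water spills
print(control(3.0, rain, horizon=3))  # looks three steps ahead: opens the valve early
```

With 240 time steps and V valves, trying all sequences over the whole storm is hopeless; the window keeps the search small, at the price of ignoring what happens after it.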
49. Summary ?
Is this:
- an optimization problem ?
- a reinforcement learning problem ?
- a supervised machine learning problem ?
Problem: rainfalls are unknown.
50. How to predict rainfall ?
In fact, there are two distinct rainfall quantities:
- R1: a spatial distribution of rainfall
(one number per time step
per point of the map)
- R2: an underground list of rainfall arrivals (inflows),
per stock (D-dimensional)
Input data:
- weather forecast ( R1(t) for each t )
- archives of weather forecasts R1(t)
- archives of inflows R2(t)
51. If your life depended on it, what
would you do ?
52. If your life depended on it, what
would you do ?
We are at time t.
We need a forecaster:
- which takes available data as input
- and outputs R2(t') for t'>=t (why not for t' < t ?)
53. If your life depended on it, what
would you do ?
We are at time t.
We need a forecaster:
- which takes available data as input
- and outputs R2(t') for t'>=t (why not for t' < t ?)
(R2(t+1),R2(t+2),R2(t+3), .... , R2(t+30))
=?
54. If your life depended on it, what
would you do ?
We are at time t.
We need a forecaster:
- which takes available data as input
- and outputs R2(t') for t'>=t (why not for t' < t ?)
(R2(t+1),R2(t+2),R2(t+3), .... , R2(t+30))
= f( R1(t) ) ?
55. If your life depended on it, what
would you do ?
We are at time t.
We need a forecaster:
- which takes available data as input
- and outputs R2(t') for t'>=t (why not for t' < t ?)
(R2(t+1),R2(t+2),R2(t+3), .... , R2(t+30))
= f( R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50) )
(because there are delays)
56. If your life depended on it, what
would you do ?
We are at time t.
We need a forecaster:
- which takes available data as input
- and outputs R2(t') for t'>=t (why not for t' < t ?)
(R2(t+1),R2(t+2),R2(t+3), .... , R2(t+30))
= f( R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t) )
(because “what you get is what you have”)
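To learn f from the archives, each past time t gives one training example: the inputs are the lagged R1 values plus R2(t), the target is the next values of R2. A minimal sketch, assuming R1 and R2 have been reduced to one number per time step (in the slides R1 is a map and R2 is D-dimensional); the function names are illustrative.

```python
def make_example(r1, r2, t, n_lags=50, horizon=30):
    """Inputs (R1(t), R1(t-1), ..., R1(t-n_lags), R2(t)); target (R2(t+1), ..., R2(t+horizon))."""
    inputs = [r1[t - k] for k in range(n_lags + 1)] + [r2[t]]
    target = [r2[t + k] for k in range(1, horizon + 1)]
    return inputs, target


def make_dataset(r1, r2, n_lags=50, horizon=30):
    """One example for every t that has enough past (lags) and enough future (targets)."""
    first = n_lags
    last = len(r2) - horizon - 1
    return [make_example(r1, r2, t, n_lags, horizon) for t in range(first, last + 1)]


# Hypothetical archives, short lags and horizon so that the result is easy to read.
r1_archive = list(range(100))
r2_archive = [10 * x for x in range(100)]

inputs, target = make_example(r1_archive, r2_archive, t=5, n_lags=3, horizon=2)
print(inputs, target)  # [5, 4, 3, 2, 50] [60, 70]

dataset = make_dataset(r1_archive, r2_archive, n_lags=3, horizon=2)
print(len(dataset))  # 95
```

Any supervised learning method can then be trained on these (inputs, target) pairs.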
57. If your life depended on it, what
would you do ?
= f( R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t) )
and then aggregation:
= f( R1(t),
R1(t-1)+R1(t-2),
R1(t-3)+R1(t-4)+R1(t-5)+R1(t-6),
...,
R2(t) )
Why ?
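The aggregation can be sketched as follows. The slides show blocks of 1, 2 and 4 lags; continuing with blocks of 8, 16, ... is an assumption made for this example.

```python
def aggregate_lags(lagged):
    """lagged[k] = R1(t-k). Sum consecutive blocks of length 1, 2, 4, 8, ..."""
    features = []
    start, width = 0, 1
    while start < len(lagged):
        features.append(sum(lagged[start:start + width]))
        start += width
        width *= 2  # assumed: each block is twice as long as the previous one
    return features


# R1(t), R1(t-1)+R1(t-2), R1(t-3)+...+R1(t-6)
print(aggregate_lags([0, 1, 2, 3, 4, 5, 6]))  # [0, 3, 18]

# With the 51 values R1(t), ..., R1(t-50), all equal to 1, we see the block sizes.
print(aggregate_lags([1] * 51))  # [1, 2, 4, 8, 16, 20]
```

Recent values are kept in detail, old values are only kept as sums.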
61. If your life depended on it, what
would you do ?
= f( R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t) )
and then aggregation:
= f( R1(t),
R1(t-1)+R1(t-2),
R1(t-3)+R1(t-4)+R1(t-5)+R1(t-6),
...,
R2(t) )
Because fewer parameters.
Rule of thumb: number of parameters
less than number of data points / 20 <=== why ?
How to choose between all these models ? Cross-validation.
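A minimal sketch of cross-validation for choosing between models. The three candidate models and the data are invented for the example (they are not the rainfall models of the slides): one is too simple, one memorizes the data, one has the right complexity. Each model is fitted on part of the data and scored on the part it has not seen.

```python
def fit_mean(xs, ys):
    """Very simple model: always predict the average of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: a * (x - mx) + my


def fit_nearest(xs, ys):
    """Very complex model: memorize the data, answer with the nearest known point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cross_validation_mse(fit, xs, ys, k=4):
    """k-fold cross-validation: each point is predicted by a model that never saw it."""
    total = 0.0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((model(xs[i]) - ys[i]) ** 2 for i in test)
    return total / len(xs)


# Hypothetical data: a linear trend plus a fixed "noise" pattern.
noise = [0.3, -0.5, 0.1, 0.4, -0.2] * 4
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

models = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
scores = {name: cross_validation_mse(fit, xs, ys) for name, fit in models.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The folds here are interleaved for brevity. For time series such as R1 and R2, folds made of contiguous periods are usually safer, because neighbouring time steps are strongly correlated.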
62. Main weakness of this analysis ?
The same as in the previous application.
We predicted R2(t), R2(t+1), ....
Then we maximize cleanliness based on these forecasts.
But there are huge uncertainties.
63. Main weakness of this analysis ?
This is often done in the real world.
Often, we do not spend time
on checking that the consequences
are minor.
“Cheap” solutions (do not take too much time):
- predicting a quantile (do you know how ?)
instead of a conditional expectation
and check on simulations
(no change to the optimization algorithm:
we are just pessimistic in the forecasts)
- predicting a conditional expectation +
moments (do you know how ?)
Then, optimize on average
(slight change in the objective function)
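One common way to predict a quantile is to replace the mean-square error by the "pinball" loss. A minimal sketch with a constant prediction and invented inflow values: minimizing the mean-square error would give the mean, minimizing the pinball loss at level tau gives the tau-quantile. With a real forecaster, the same loss would be used to train f.

```python
def pinball_loss(q, ys, tau):
    """Average pinball (quantile) loss of the constant prediction q at level tau."""
    return sum(
        tau * (y - q) if y >= q else (1 - tau) * (q - y)
        for y in ys
    ) / len(ys)


def best_constant(ys, tau):
    """Constant prediction, chosen among the observed values, with the smallest pinball loss."""
    return min(sorted(ys), key=lambda q: pinball_loss(q, ys, tau))


# Hypothetical past inflows observed in similar weather conditions.
inflows = [0.0, 0.1, 0.2, 0.4, 3.0]

mean = sum(inflows) / len(inflows)
median = best_constant(inflows, tau=0.5)
pessimistic = best_constant(inflows, tau=0.9)
print(mean, median, pessimistic)
```

The 0.9-quantile is the pessimistic forecast: it is driven by the rare heavy inflow, which the median ignores.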
64. What about an exact solution ?
The exact solution is much harder to implement.
We can use forecasts with moments.
Then, we get an MDP (Markov decision process).
Then, this is reinforcement learning.
- simple: forecasting + optimizing
- a bit more complex: pessimistic forecasting + optimizing
- more complex: forecasting with moments + optimizing on average
or optimizing a quantile (“value at risk”)
- advanced: full reinforcement learning model
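"Optimizing on average" and "optimizing a quantile" can give different decisions. A minimal sketch on an invented one-decision problem: we choose how much storage to free before the storm (cost 1 per unit), each unit of overflow costs 4, and the rain is one of ten equally likely scenarios. All numbers and names are assumptions made for the example.

```python
def cost(freed, rain, overflow_penalty=4):
    """Cost of freeing `freed` units in advance, then receiving `rain` units."""
    return freed + overflow_penalty * max(0, rain - freed)


def mean_cost(freed, scenarios):
    """Average cost over the scenarios (optimizing on average)."""
    return sum(cost(freed, r) for r in scenarios) / len(scenarios)


def value_at_risk(freed, scenarios, percent=90):
    """Smallest cost c such that at least `percent` % of the scenarios cost <= c."""
    costs = sorted(cost(freed, r) for r in scenarios)
    rank = -(-percent * len(costs) // 100)  # ceil(percent * n / 100), with integers
    return costs[rank - 1]


# Hypothetical rain scenarios, all equally likely.
scenarios = [0, 0, 0, 0, 1, 1, 2, 2, 3, 8]
candidates = range(0, 9)

best_on_average = min(candidates, key=lambda d: mean_cost(d, scenarios))
best_for_var = min(candidates, key=lambda d: value_at_risk(d, scenarios))
print(best_on_average, mean_cost(best_on_average, scenarios))  # 2 4.8
print(best_for_var, value_at_risk(best_for_var, scenarios))    # 3 3
```

The value-at-risk criterion frees more storage: it pays a bit more on average in order to be safe in 90% of the scenarios.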
65. What about an exact solution ?
The best choice depends on the precision of your model
and on the budget you have.
Some problems involve billions of US $ and have precise models.
Then, each percent of improvement represents more money than
you will earn in all your professional life. Then, you can (must)
implement something very advanced.
Sometimes, models are very imprecise.
Then, optimizing to within 0.001% is meaningless. Improving the model
is more important.
- simple: forecasting + optimizing
- a bit more complex: pessimistic forecasting + optimizing
- more complex: forecasting with moments + optimizing on average
or optimizing a quantile (“value at risk”)
- advanced: full reinforcement learning model
66. What do you think ?
Did you understand ?
1) Applications of machine learning
- wind power forecasting (important e.g. for PengHu island!)
- rainfall estimation
2) Some key words (you must know what they mean):
- black box / white box
- shrinking horizon
- objective function
- “what you get is what you have”
- model complexity
- cross-validation
- generative model
- quantile, value-at-risk
===> olivier.teytaud@inria.fr