Learning spline-based curve models (Laure Amate)
1. Learning spline-based curve models
Laure Amate
MISTIS (INRIA-LJK Grenoble) & LIG
BigMC Seminar, 27 May 2010
2. Overview
1 Some definitions
  Goal: learning a curve model
  Goal: "simple" representation
2 Collective spline modeling
  Problem statement
  Criterion
3 EM approaches
  Some definitions
  Monte-Carlo online EM
4 Results
5 Conclusion
3. Concept of class for curves
Learning a model from available objects
4. Concept of class for curves
Learning a model from available objects
Characterizing a group:
$C = \{c_j(t)\}_{j=1}^{M}$, a set of contours
Probabilistic approach: $c_j \sim p(c)$, with $p$ unknown
Determination of an estimate $\hat{p}(c)$
5. "Simple" representation
Sampling
Segments + arcs
Ellipsoids
6. "Simple" representation
Spline curves:
Adaptivity to the data
Sparse representations (a few parameters)
7. "Simple" representation
Spline curves:
Adaptivity to the data
Sparse representations (a few parameters)
Piecewise continuous polynomials of order m, $s(t) : [0, 1] \to \mathbb{R}^2$
Knots $\xi$ (limits of the pieces)
For every $\xi$ there exists a B-spline basis $\{b_i^m(t; \xi)\}_{i=1}^{k}$ such that
$s(t) = \sum_{i=1}^{k} \beta_i \, b_i^m(t; \xi)$
[Figure: example plot, horizontal axis 0 to 1, vertical axis 0 to 4]
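The expansion $s(t) = \sum_i \beta_i b_i^m(t; \xi)$ can be illustrated with a minimal sketch that evaluates the basis by the Cox-de Boor recursion. The clamped knot vector, the control points and the names `bspline_basis`, `spline_point`, `KNOTS` and `BETAS` are assumptions made for this illustration; the talk does not specify an implementation, and closed contours would probably call for a periodic basis instead.

```python
def bspline_basis(i, m, t, knots):
    """Value at t of the i-th B-spline of order m (degree m - 1), by Cox-de Boor recursion."""
    if m == 1:
        # Order 1: indicator of the half-open knot interval [knots[i], knots[i + 1]).
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_width = knots[i + m - 1] - knots[i]
    if left_width > 0:  # terms with zero-width support (repeated knots) are skipped
        value += (t - knots[i]) / left_width * bspline_basis(i, m - 1, t, knots)
    right_width = knots[i + m] - knots[i + 1]
    if right_width > 0:
        value += (knots[i + m] - t) / right_width * bspline_basis(i + 1, m - 1, t, knots)
    return value


def spline_point(t, betas, m, knots):
    """Evaluate s(t) = sum_i beta_i * b_i^m(t; knots) for 2-D control points beta_i."""
    assert len(knots) == len(betas) + m, "k basis functions of order m need k + m knots"
    weights = [bspline_basis(i, m, t, knots) for i in range(len(betas))]
    x = sum(w * b[0] for w, b in zip(weights, betas))
    y = sum(w * b[1] for w, b in zip(weights, betas))
    return x, y


# Hypothetical example: order m = 4 (cubic), k = 6 control points, clamped knots on [0, 1].
M_ORDER = 4
KNOTS = [0.0] * 4 + [1 / 3, 2 / 3] + [1.0] * 4
BETAS = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 2.0), (5.0, 0.0)]
# Because of the half-open intervals, the sketch is valid for t in [0, 1).
curve = [spline_point(j / 20, BETAS, M_ORDER, KNOTS) for j in range(20)]
```

With clamped knots the curve starts at the first control point, and the basis functions sum to one at every t in [0, 1), which is a convenient sanity check on the recursion.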
8. "Simple" representation
Spline curves:
Adaptivity to the data
Sparse representations (a few parameters)
$\xi \leftrightarrow M_k$, the probability simplex
$\beta_i \in \mathbb{R}^2 \leftrightarrow \mathbb{C} \Rightarrow \beta_{1:k} \in \mathbb{C}^k$
$\theta = (k, \beta_{1:k}, \xi_{1:k}) \in K \times \underbrace{\mathbb{C}^k \times M_k}_{\Theta_k}$
$\{s(t_i)\}_{i=1}^{N} \to \theta$: from $2N$ real values to $3k + 1$ parameters
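The reduction from $2N$ to $3k + 1$ values can be made concrete with a short sketch. It assumes that a point of the simplex is read as a vector of positive knot spacings summing to one, which is one plausible reading of $\xi \leftrightarrow M_k$ and is not spelled out on the slide; the function names are hypothetical.

```python
from itertools import accumulate


def knots_from_simplex(increments):
    """Map a point of the probability simplex to increasing knot positions in [0, 1]."""
    assert all(d > 0 for d in increments), "increments must be positive"
    assert abs(sum(increments) - 1.0) < 1e-9, "increments must sum to 1"
    return [0.0] + list(accumulate(increments))


def parameter_count(k):
    """Real parameters in theta = (k, beta_1:k, xi_1:k): 1 + 2k + k."""
    return 1 + 2 * k + k


def sample_count(n_points):
    """Real values in a contour sampled at N points of R^2."""
    return 2 * n_points


# Hypothetical sizes: a contour sampled at N = 300 points, modeled with k = 25 coefficients.
compression = sample_count(300) / parameter_count(25)
```

For these illustrative sizes the spline description uses 76 numbers instead of 600.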
9. "Simple" representation
Choice of $\xi$
$c(t)$ is not a spline → approximate representation
1) Quality with k
10. "Simple" representation
Choice of $\xi$
$c(t)$ is not a spline → approximate representation
1) Quality with k
[Figure: spline subspace of dimension 10; axes 50 to 300 (horizontal), 0 to 200 (vertical)]
11. "Simple" representation
Choice of $\xi$
$c(t)$ is not a spline → approximate representation
1) Quality with k
[Figure: spline subspace of dimension 25, uniform knots; axes 50 to 300 (horizontal), 0 to 200 (vertical)]
12. "Simple" representation
Choice of $\xi$
$c(t)$ is not a spline → approximate representation
1) Quality with k
[Figure: spline subspace of dimension 25, uniform knots; axes 50 to 300 (horizontal), 0 to 200 (vertical)]
⇒ we need to adapt k to the complexity of $c(t)$ in order to capture its relevant morphological features
13. "Simple" representation
Choice of $\xi$
$c(t)$ is not a spline → approximate representation
1) Quality with k
2) Quality with well-chosen $\xi$
[Figure: spline subspace of dimension 25, uniform knots; axes 50 to 300 (horizontal), 0 to 200 (vertical)]
14. "Simple" representation
Choice of $\xi$
$c(t)$ is not a spline → approximate representation
1) Quality with k
2) Quality with well-chosen $\xi$
[Figure: spline subspace of dimension 25, free knots; axes 50 to 300 (horizontal), 0 to 200 (vertical)]
15. "Simple" representation
Choice of $\xi$
$c(t)$ is not a spline → approximate representation
1) Quality with k
2) Quality with well-chosen $\xi$
[Figure: spline subspace of dimension 25, free knots; axes 50 to 300 (horizontal), 0 to 200 (vertical)]
⇒ we need to adapt $\xi$ to $c(t)$ (for the same k)
16. "Simple" representation
Representation space = free-knot spline space of varying complexity
$s(t) = \sum_{i=1}^{k} \beta_i \, b_i^m(t; \xi)$
$\Theta = \bigcup_{k \in K} \Theta_k$
→ $\Theta$ is not a vector space
→ Nested family of models: $\cdots \subset S_{k_1} \subset S_{k_1+1} \subset S_{k_1+2} \subset \cdots$
$S_k$: family of free-knot spline models with fixed k
17. Overview
1 Some definitions
  Goal: learning a curve model
  Goal: "simple" representation
2 Collective spline modeling
  Problem statement
  Criterion
3 EM approaches
  Some definitions
  Monte-Carlo online EM
4 Results
5 Conclusion
18. Collective spline modeling
Characterizing a group:
$C = \{c_j(t)\}_{j=1}^{M}$, a set of contours
Probabilistic approach: $c_j \sim p(c)$, with $p$ unknown
Determination of an estimate $\hat{p}(c)$
19. Collective spline modeling
Characterizing a group:
$C = \{c_j(t)\}_{j=1}^{M}$, a set of contours
Probabilistic approach: $c_j \sim p(c)$, with $p$ unknown
Determination of an estimate $\hat{p}(c)$
$c(t) = s(t) + \varepsilon \implies c \mid \theta \sim N(s, \sigma^2 I)$
$p(c) = \int_{\Theta} p(c \mid \theta) \, p(\theta) \, d\theta$
20. Collective spline modeling
Characterizing a group:
$C = \{c_j(t)\}_{j=1}^{M}$, a set of contours
Probabilistic approach: $c_j \sim p(c)$, with $p$ unknown
Determination of an estimate $\hat{p}(c)$
$c(t) = s(t) + \varepsilon \implies c \mid \theta \sim N(s, \sigma^2 I)$
$p(c) = \int_{\Theta} p(c \mid \theta) \, p(\theta) \, d\theta$
k fixed
Parametric model: $p(\theta) = p(\theta \mid \gamma)$
$\beta_j \mid \xi_j, \sigma^2 \sim N(\mu_0, \Sigma(\xi_j, \sigma^2))$
$\xi_j \sim \mathrm{Dir}(\alpha)$
⇒ $\gamma = (\mu_0, \alpha, \sigma^2)$
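The parametric prior above can be sketched as a sampler for $\theta_j = (\beta_j, \xi_j)$ given $\gamma = (\mu_0, \alpha, \sigma^2)$. The slide does not give the form of $\Sigma(\xi_j, \sigma^2)$, so this illustration substitutes the isotropic covariance $\sigma^2 I$, which is an assumption of the sketch and not the model of the talk; the function names are also hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_theta(mu0, alpha, sigma2, rng):
    """Draw (beta, xi): xi ~ Dir(alpha), beta ~ N(mu0, sigma2 * I).

    ASSUMPTION: Sigma(xi, sigma2) is replaced by sigma2 * I for illustration only.
    """
    xi = sample_dirichlet(alpha, rng)
    sd = sigma2 ** 0.5
    beta = [(rng.gauss(mx, sd), rng.gauss(my, sd)) for mx, my in mu0]
    return beta, xi


# Hypothetical hyperparameters for k = 5 coefficients.
rng = random.Random(0)
mu0 = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (1.0, -1.0), (0.0, 0.0)]
beta, xi = sample_theta(mu0, alpha=[2.0] * 5, sigma2=0.01, rng=rng)
```

A noisy contour would then be obtained by evaluating the spline defined by (beta, xi) and adding Gaussian noise of variance $\sigma^2$, following $c = s + \varepsilon$.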
21. Collective spline modeling
Model structure
[Diagram: graphical model over the nodes $\alpha$, $\xi_j$, $\mu_0$, $\beta_j$, $s_j$, $c_j$ and $\sigma^2$]
22. Collective spline modeling
Model structure
[Diagram: graphical model over the nodes $\alpha$, $\xi_j$, $\mu_0$, $\beta_j$, $s_j$, $c_j$ and $\sigma^2$]
Problem
From $\{c_j\}_{j=1}^{M}$, estimate $\hat{\gamma}$
23. Collective spline modeling
Problem
From $\{c_j\}_{j=1}^{M}$, estimate $\hat{\gamma}$
1) "Decoupled" approach: $\{c_j\}_{j=1}^{M} \to \{\hat{\theta}_j\}_{j=1}^{M} \to \hat{\gamma}$
24. Collective spline modeling
Problem
From $\{c_j\}_{j=1}^{M}$, estimate $\hat{\gamma}$
1) "Decoupled" approach: $\{c_j\}_{j=1}^{M} \to \{\hat{\theta}_j\}_{j=1}^{M} \to \hat{\gamma}$
$c_j \to (\hat{\beta}_j, \hat{\xi}_j)$: nonlinear estimation problem ⟹ MCMC methods (Metropolis-Hastings)
$\hat{\gamma} = \arg\max_{\gamma \in G} p(\{\hat{\theta}_j\}_{j=1}^{M} \mid \gamma)$
In general, $\hat{\theta}$ is not a sufficient statistic → information loss
25. Collective spline modeling
Problem
From $\{c_j\}_{j=1}^{M}$, estimate $\hat{\gamma}$
1) "Decoupled" approach: $\{c_j\}_{j=1}^{M} \to \{\hat{\theta}_j\}_{j=1}^{M} \to \hat{\gamma}$
$c_j \to (\hat{\beta}_j, \hat{\xi}_j)$: nonlinear estimation problem ⟹ MCMC methods (Metropolis-Hastings)
$\hat{\gamma} = \arg\max_{\gamma \in G} p(\{\hat{\theta}_j\}_{j=1}^{M} \mid \gamma)$
In general, $\hat{\theta}$ is not a sufficient statistic → information loss
2) $\{\theta_j\}_{j=1}^{M}$: unobserved variables
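Since the slide names Metropolis-Hastings without detailing it, here is a generic random-walk sketch on a one-dimensional target. It is not the sampler of the talk, which operates on $(\beta_j, \xi_j)$; the standard normal target, the step size and the function name are assumptions chosen only to show the accept/reject mechanism.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        logp_proposal = log_target(proposal)
        # Symmetric proposal: the acceptance probability is min(1, target ratio).
        if rng.random() < math.exp(min(0.0, logp_proposal - logp)):
            x, logp = proposal, logp_proposal
        samples.append(x)  # a rejected move repeats the current state
    return samples


# Toy target (assumption): standard normal, known up to an additive constant in log scale.
rng = random.Random(1)
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, rng)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

Only ratios of the target enter the acceptance test, which is why the method applies to posteriors whose normalizing constant is unknown.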
26. Criterion
Marginal maximum likelihood criterion:
$\hat{\gamma} = \arg\max_{\gamma \in G} p(\{c_j\}_{j=1}^{M} \mid \gamma)$
$\phantom{\hat{\gamma}} = \arg\max_{\gamma \in G} \int \cdots \int p(\{c_j\}_{j=1}^{M}, \{\theta_j\}_{j=1}^{M} \mid \gamma) \, d\theta_1 \cdots d\theta_M$
27. Criterion
Marginal maximum likelihood criterion:
$\hat{\gamma} = \arg\max_{\gamma \in G} p(\{c_j\}_{j=1}^{M} \mid \gamma)$
$\phantom{\hat{\gamma}} = \arg\max_{\gamma \in G} \int \cdots \int p(\{c_j\}_{j=1}^{M}, \{\theta_j\}_{j=1}^{M} \mid \gamma) \, d\theta_1 \cdots d\theta_M$
No analytical solution → numerical method
⇒ Expectation-Maximization algorithm
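One way to see why no closed form is available is to assume that the curves are independent draws given $\gamma$ (consistent with $c_j \sim p(c)$, though not stated explicitly on the slide). Under that assumption the criterion becomes a sum of per-curve log-integrals:

```latex
\hat{\gamma}
  = \arg\max_{\gamma \in G} \sum_{j=1}^{M}
    \log \int_{\Theta} p(c_j \mid \theta_j, \gamma)\, p(\theta_j \mid \gamma)\, d\theta_j
```

Each integral runs over the knots $\xi_j$, which enter the B-spline basis nonlinearly, so the integrals have to be handled numerically.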
28. Overview
1 Some definitions
  Goal: learning a curve model
  Goal: "simple" representation
2 Collective spline modeling
  Problem statement
  Criterion
3 EM approaches
  Some definitions
  Monte-Carlo online EM
4 Results
5 Conclusion
29. EM algorithm
Two-step iterative method:
Expectation of the complete-data log-likelihood:
$Q(\gamma \mid \gamma^{(t)}) = E_{\theta}\left[\log p(c, \theta \mid \gamma) \mid c, \gamma^{(t)}\right]$
Maximization of this expectation:
$\gamma^{(t+1)} = \arg\max_{\gamma \in G} Q(\gamma \mid \gamma^{(t)})$
Local convergence
"Hill climbing" algorithm
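The two steps can be illustrated on a toy latent-variable model that is much simpler than the spline model of the talk. The sketch assumes scalar observations $c_i = \theta_i + \varepsilon_i$ with $\theta_i \sim N(\gamma, 1)$ and $\varepsilon_i \sim N(0, 1)$, so that $E[\theta_i \mid c_i, \gamma] = (c_i + \gamma)/2$ and maximizing $Q$ reduces to averaging these posterior means. The model and the name `em_toy` are assumptions made for illustration.

```python
def em_toy(data, gamma0, n_iter=60):
    """EM for the toy model c_i = theta_i + eps_i, theta_i ~ N(gamma, 1), eps_i ~ N(0, 1)."""
    gamma = gamma0
    for _ in range(n_iter):
        # E-step: posterior mean of each latent theta_i under the current gamma.
        posterior_means = [(c + gamma) / 2.0 for c in data]
        # M-step: Q(. | gamma) is maximized by the average of the posterior means.
        gamma = sum(posterior_means) / len(posterior_means)
    return gamma


# Hypothetical data; for this toy model the maximum likelihood estimate is the sample mean.
data = [1.0, 2.0, 3.0, 6.0]
gamma_hat = em_toy(data, gamma0=0.0)
```

In this toy case each iteration halves the distance to the sample mean, so the hill-climbing behaviour is easy to follow by hand.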
30. Exponential family
Case of the exponential family:
$p(c, \theta \mid \gamma) = h(c, \theta) \exp\big(\ell(S(c, \theta), \gamma)\big)$
$\ell(s, \gamma) = -\Psi(\gamma) + \langle s, \Phi(\gamma) \rangle$
(E)-step: $\bar{s}(c, \gamma^{(t-1)}) = E_{\theta}\left[S(c, \theta) \mid c, \gamma^{(t-1)}\right]$
(M)-step: $\gamma^{(t)} = \arg\max_{\gamma \in G} \ell(\bar{s}(c, \gamma^{(t-1)}), \gamma)$
31. Monte-Carlo EM algorithm
No analytical expression for $Q(\gamma \mid \gamma^{(t)})$
Stochastic approximation:
$\{\theta^{(j)}\}_{j=1}^{M} \sim p(\theta \mid c, \gamma^{(t)})$,
$Q(\gamma \mid \gamma^{(t)}) \approx \frac{1}{M} \sum_{j=1}^{M} \log p(c, \theta^{(j)} \mid \gamma)$
The number of samples M grows with the iteration index i: $M(i) = i^p$, $p > 1$
Convergence: established for curved exponential families [Fort & Moulines, 2003]
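A minimal sketch of the Monte-Carlo E-step with a growing sample size, on an assumed toy model rather than the spline model: $c_i = \theta_i + \varepsilon_i$, $\theta_i \sim N(\gamma, 1)$, $\varepsilon_i \sim N(0, 1)$, for which $\theta_i \mid c_i, \gamma \sim N((c_i + \gamma)/2, 1/2)$. The posterior is sampled directly here, whereas the talk relies on MCMC; the function name and defaults are hypothetical.

```python
import math
import random


def mcem_toy(data, gamma0, n_iter=30, p=1.5, seed=0):
    """Monte-Carlo EM on the toy model, with M(i) = ceil(i ** p) samples at iteration i."""
    rng = random.Random(seed)
    gamma = gamma0
    posterior_sd = math.sqrt(0.5)
    for i in range(1, n_iter + 1):
        n_samples = math.ceil(i ** p)
        total = 0.0
        for c in data:
            posterior_mean = (c + gamma) / 2.0
            # Monte-Carlo estimate of E[theta | c, gamma] from n_samples posterior draws.
            draws = [rng.gauss(posterior_mean, posterior_sd) for _ in range(n_samples)]
            total += sum(draws) / n_samples
        # M-step: the averaged log-likelihood is maximized by the mean of the draws.
        gamma = total / len(data)
    return gamma


data = [1.0, 2.0, 3.0, 6.0]  # hypothetical observations, sample mean 3.0
gamma_hat = mcem_toy(data, gamma0=0.0)
```

Increasing the sample size with the iteration index reduces the Monte-Carlo noise as the iterates approach a stationary point.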
32. Online EM algorithm
Sequential processing of the data [Cappé & Moulines, 2009]
1 iteration ↔ 1 observation (1 curve)
(E)-step: $\bar{s}(c_i, \gamma^{(i-1)}) = E_{\theta_i}\left[S(c_i, \theta_i) \mid c_i, \gamma^{(i-1)}\right]$
(online)-step: $\hat{s}_i = \hat{s}_{i-1} + \eta_i \left(\bar{s}(c_i, \gamma^{(i-1)}) - \hat{s}_{i-1}\right)$
(M)-step: $\gamma^{(i)} = \arg\max_{\gamma \in G} \ell(\hat{s}_i, \gamma)$
Step size decreasing with the iteration: $\eta_i = \eta_0 \, i^{-\kappa}$, $\kappa \in (1/2, 1)$, $\eta_0 \in [0, 1]$
Convergence: established for exponential families [Cappé & Moulines, 2009]
Processing chain (one column per observation):
$c_1, c_2, \ldots$
$\bar{s}_1, \bar{s}_2, \ldots$
$\hat{s}_1, \hat{s}_2, \ldots$
$\hat{\gamma}_1, \hat{\gamma}_2, \ldots \to \hat{\gamma}$
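The three steps can be sketched on an assumed toy model in which every quantity is available in closed form: $c_i = \theta_i + \varepsilon_i$, $\theta_i \sim N(\gamma, 1)$, $\varepsilon_i \sim N(0, 1)$, with sufficient statistic $S = \theta$, so that $\bar{s} = (c_i + \gamma)/2$ and the M-step returns $\hat{s}_i$ itself. The model, the function name and the initialization of $\hat{s}_0$ are assumptions of this illustration.

```python
def online_em_toy(stream, gamma0, eta0=1.0, kappa=0.6):
    """Online EM on the toy model; one observation is consumed per iteration."""
    gamma = gamma0
    s_hat = gamma0  # running estimate of the expected sufficient statistic (assumed start)
    for i, c in enumerate(stream, start=1):
        eta = eta0 * i ** (-kappa)             # step size eta_i = eta0 * i^(-kappa)
        s_bar = (c + gamma) / 2.0              # (E)-step: E[theta_i | c_i, gamma]
        s_hat = s_hat + eta * (s_bar - s_hat)  # (online)-step: stochastic approximation
        gamma = s_hat                          # (M)-step: closed form for this toy model
    return gamma


# Hypothetical stream: a constant observation makes the recursion easy to check by hand.
gamma_hat = online_em_toy([3.0] * 200, gamma0=0.0)
```

Unlike batch EM, no observation is stored: the only state carried between iterations is $\hat{s}_i$ and the current parameter.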
33. Monte-Carlo online EM algorithm
Processing chain (one column per observation; each $\bar{s}_i$ is obtained from $c_i$ by Monte-Carlo):
$c_1, c_2, \ldots$
MC, MC
$\bar{s}_1, \bar{s}_2, \ldots$
$\hat{s}_1, \hat{s}_2, \ldots$
$\hat{\gamma}_1, \hat{\gamma}_2, \ldots \to \hat{\gamma}$
34. Monte-Carlo online EM algorithm
i-th iteration:
1 MC approximation:
$\{\theta_{ij}\}_{j=1}^{M_i} \sim p(\theta_i \mid c_i, \hat{\gamma}^{(i-1)})$,
$\bar{s}(c_i, \hat{\gamma}^{(i-1)}) \approx \frac{1}{M_i} \sum_{j=1}^{M_i} S(c_i, \theta_{ij})$
2 Online step: $\hat{s}_i = \hat{s}_{i-1} + \eta_i \left(\bar{s}(c_i, \hat{\gamma}^{(i-1)}) - \hat{s}_{i-1}\right)$
3 Maximization step: $\hat{\gamma}^{(i)} = \arg\max_{\gamma \in G} \ell(\hat{s}_i, \gamma)$
Numerical method (gradient)
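The three numbered steps can be combined in a short sketch on an assumed toy model: $c_i = \theta_i + \varepsilon_i$, $\theta_i \sim N(\gamma, 1)$, $\varepsilon_i \sim N(0, 1)$, with $S = \theta$. Two simplifications are made for illustration and are not part of the talk: the posterior $N((c_i + \gamma)/2, 1/2)$ is sampled directly instead of by MCMC, and the maximization step is available in closed form instead of requiring a gradient method. The function name and defaults are hypothetical.

```python
import math
import random


def mc_online_em_toy(stream, gamma0, eta0=1.0, kappa=0.6, n_samples=20, seed=0):
    """Monte-Carlo online EM on the toy model, one observation per iteration."""
    rng = random.Random(seed)
    gamma = gamma0
    s_hat = gamma0  # assumed initialization of the running statistic
    posterior_sd = math.sqrt(0.5)
    for i, c in enumerate(stream, start=1):
        # 1. MC approximation of s_bar from draws of theta_i | c_i, gamma.
        posterior_mean = (c + gamma) / 2.0
        draws = [rng.gauss(posterior_mean, posterior_sd) for _ in range(n_samples)]
        s_bar = sum(draws) / n_samples
        # 2. Online step with eta_i = eta0 * i^(-kappa).
        eta = eta0 * i ** (-kappa)
        s_hat = s_hat + eta * (s_bar - s_hat)
        # 3. Maximization step (closed form here; numerical in the spline model).
        gamma = s_hat
    return gamma


# Hypothetical stream of identical observations, so the limit is known to be 3.0.
gamma_hat = mc_online_em_toy([3.0] * 2000, gamma0=0.0)
```

The decreasing step size averages out the Monte-Carlo noise over iterations, so a fixed and modest number of draws per observation is used in this sketch.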
35. Overview
1 Some definitions
  Goal: learning a curve model
  Goal: "simple" representation
2 Collective spline modeling
  Problem statement
  Criterion
3 EM approaches
  Some definitions
  Monte-Carlo online EM
4 Results
5 Conclusion
36. Results: simulated data
[Figure: simulated curves; axes −4 to 5 (horizontal), −6 to 8 (vertical)]
37. Results: simulated data
Different proposals for the MC sampler:
[Figure, two panels: left, axes −4 to 5 (horizontal) and −6 to 8 (vertical); right, axes 0 to 12 (horizontal) and 0 to 60 (vertical)]
Red: simulated data
Blue: Dir(α)
Green: Dir(1)
Magenta: 1 random knot + triangular distribution between neighbours
38. Results: simulated data
With different initializations:
[Figure, two panels: left, axes −4 to 5 (horizontal) and −6 to 8 (vertical); right, axes 0 to 12 (horizontal) and 0 to 30 (vertical)]
Red: simulated data
Blue: good convergence
Green: local convergence → identifiability problem
40. Real data: leaves
Model selection criterion to identify the complexity:
M1: k = 30    M2: k = 15
Learning sets: L1 with 66 leaves, L2 with 33 leaves
Test set: 51 leaves, with 33 from C1 and 18 from C2
Classification (likelihood) for curves from the test set:

k1 = 15 & k2 = 30
Model class. \ Real class    C1    C2
M1                           33     0
M2                            0    18

k1 = k2 = 15
Model class. \ Real class    C1    C2
M1                            2    31
M2                            0    18
41. Overview
1 Some definitions
  Goal: learning a curve model
  Goal: "simple" representation
2 Collective spline modeling
  Problem statement
  Criterion
3 EM approaches
  Some definitions
  Monte-Carlo online EM
4 Results
5 Conclusion
42. Conclusion & future work
Conclusion
Probabilistic model for a set of curves
New variant: MC online EM
Future work
Solve the identifiability issue
Compare results with another method. Which one?
Establish convergence properties of MC online EM
Introduce the complexity of the model in the collective modeling problem
Develop links with "Shape" theory
43. Thank you! Any questions?
44. Details: curved exponential family
$p(c, \theta \mid \gamma) = h(c, \theta) \exp\big(\ell(S(c, \theta), \gamma)\big)$
$\ell(s, \gamma) = -\Psi(\gamma) + \langle s, \Phi(\gamma) \rangle$
$h(c, \theta) = N \, |B^T B|$
$\Psi(\gamma) = (N + k) \log(2\pi\sigma^2) + \log(B(\alpha))$
$S(c, \theta) = \begin{pmatrix} \log(\Delta) \\ (C - B\beta)^H (C - B\beta) + \frac{\beta^H B^T B \beta}{N} \\ \frac{B^T B \beta}{N} \\ \frac{B^T B \beta}{N} \\ \mathrm{Vec}(B^T B / N) \end{pmatrix}$
$\Phi(\gamma) = \begin{pmatrix} \alpha - 1 \\ -\frac{1}{2\sigma^2} \\ \frac{\mu_0}{\sigma^2} \\ \frac{\mu_0}{\sigma^2} \\ -\frac{\mathrm{Vec}(\mu_0^* \mu_0^T)}{2\sigma^2} \end{pmatrix}$
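45. Details: computation of $\bar{s}$
$\bar{s}(c, \gamma) = \begin{pmatrix} \int \log(\Delta) \, q(\Delta \mid c, \gamma) \, d\Delta \\ C^H C + k\sigma^2 - 2 C^H \int B\phi \, q(\Delta \mid c, \gamma) \, d\Delta + \frac{N+1}{N} \int \phi^H B^T B \phi \, q(\Delta \mid c, \gamma) \, d\Delta \\ \frac{1}{N} \int B^T B \phi \, q(\Delta \mid c, \gamma) \, d\Delta \\ \frac{1}{N} \int B^T B \phi \, q(\Delta \mid c, \gamma) \, d\Delta \\ \frac{1}{N} \int \mathrm{Vec}(B^T B) \, q(\Delta \mid c, \gamma) \, d\Delta \end{pmatrix}$
$\phi = \frac{N}{N+1} (B^T B)^{-1} B^T \left(C + \frac{B\mu_0}{N}\right)$
$q(\Delta \mid c, \gamma) = \dfrac{\exp\left(-\frac{T(c, \Delta, \gamma)}{2\sigma^2}\right) p(\Delta \mid \gamma)}{\int \exp\left(-\frac{T(c, \Delta, \gamma)}{2\sigma^2}\right) p(\Delta \mid \gamma) \, d\Delta}$
$T(c, \Delta, \gamma) = C^H C + \frac{1}{N} \mu_0^H B^T B \mu_0 - \frac{N}{N+1} \left(C + \frac{B\mu_0}{N}\right)^H B (B^T B)^{-1} B^T \left(C + \frac{B\mu_0}{N}\right)$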