AdaBoost is an ensemble learning algorithm that combines multiple weak learners into a single strong learner. It works in rounds, assigning higher weights to examples that previous rounds misclassified. Each weak learner is trained on the reweighted examples and need only be slightly better than random guessing. AdaBoost computes each weak learner's error rate, derives a voting weight from it, and combines the weak learners into a final strong learner that predicts by a weighted vote of all weak learners. The algorithm stops when the error rate stops decreasing or the maximum number of rounds is reached. AdaBoost often performs better than any single learner, though it can be sensitive to noisy examples and outliers, whose weights it keeps raising.
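The loop described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `stump_learner` is a hypothetical weak learner (best single-feature threshold stump), and labels are assumed to be in {0, 1} as in the slides below.

```python
import numpy as np

def adaboost(X, y, weak_learner, R):
    """AdaBoost (Freund & Schapire) for labels in {0, 1}.

    weak_learner(X, p, y) returns a function h mapping X -> {0,1} predictions;
    it should do better than random guessing on distribution p.
    Returns the final hypothesis h_f and the per-round errors eps_t.
    """
    N = len(y)
    w = np.full(N, 1.0 / N)                      # initial weights w_i^1 = 1/N
    hyps, betas, eps_hist = [], [], []
    for t in range(R):
        p = w / w.sum()                          # step 1: normalize to a distribution
        h = weak_learner(X, p, y)                # step 2: call WeakLearner
        miss = np.abs(h(X) - y)                  # 1 where h_t misclassifies
        eps = float(np.sum(p * miss))            # weighted error eps_t
        if eps >= 0.5:                           # not better than random: stop
            break
        beta = max(eps, 1e-12) / (1.0 - eps)     # step 3 (clamped in case eps == 0)
        w = w * beta ** (1.0 - miss)             # shrink weights of correct examples
        hyps.append(h); betas.append(beta); eps_hist.append(eps)
    alphas = [-np.log(b) for b in betas]
    def h_f(Z):
        # weighted vote: output 1 iff the vote reaches half the total alpha mass
        votes = sum(a * h(Z) for a, h in zip(alphas, hyps))
        return (votes >= 0.5 * sum(alphas)).astype(int)
    return h_f, eps_hist

def stump_learner(X, p, y):
    """Hypothetical weak learner: best single-feature threshold stump."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = (sign * (X[:, j] - thr) >= 0).astype(int)
                err = np.sum(p * np.abs(pred - y))
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    j, thr, sign = best
    return lambda Z: (sign * (Z[:, j] - thr) >= 0).astype(int)
```

By the bound proved later in this deck, the training error of `h_f` is at most the product of the factors 2·√(ε_t(1 − ε_t)) over the executed rounds.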
3. Inputs
• Class labels are binary: c_i ∈ {1, 0}
• N training examples
• (X_i, c_i) (i = 1, ..., N); for instance, example C is
  X_C = (No, Yes, Yes, Yes), c_C = 1
4. • R: the number of boosting rounds
• Initialize the weights uniformly over the N examples:
  w_i^1 = 1/N (i = 1, ..., N)
• Example: with 10 examples,
  w_i^1 = 1/10
5. • For t = 1, ..., R:
1. Normalize the round-t weights into a distribution p^t:
   p_i^t = w_i^t / Σ_{i=1}^N w_i^t
   • Example: p_i^1 = w_i^1 = 1/10, since at t = 1 the weights already sum to Σ_{i=1}^N w_i^1 = 1
2. Call WeakLearner on distribution p^t; it returns a hypothesis h_t whose weighted error ε_t is less than 1/2:
   ε_t = Σ_{i=1}^N p_i^t |h_t(X_i) − c_i| < 1/2
   where |h_t(X_i) − c_i| = 0 if h_t(X_i) = c_i, and 1 otherwise
   Then go to Step 3
6. • At t = 1, WeakLearner returns a hypothesis h_1 that predicts h_1(X_i) = 1 for IDs A through F and h_1(X_i) = 0 for IDs G, H, I, J
• It misclassifies 2 of the 10 examples (E and F, for which h_1(X_i) ≠ c_i)
• With p_i^1 = 1/10 for every i, the weighted error is
  ε_1 = 1/10 × 2 = 1/5 < 1/2
• Since ε_1 < 1/2, WeakLearner has succeeded, and we proceed to Step 3
7. Step 3: compute the weights for round t + 1
• Update each w_i using the factor β_t:
  β_t = ε_t / (1 − ε_t); since 0 ≤ ε_t < 1/2, we have 0 ≤ β_t < 1
• Using β_t:
  w_i^{t+1} = w_i^t β_t^{1−|h_t(X_i)−c_i|}
• A small ε_t gives a small β_t
• So the smaller ε_t is, the more sharply the weights of correctly classified examples shrink
• As a result, the next call to WeakLearner concentrates on the examples that h_t misclassified
8. • At t = 1, ε_1 = 0.2, so
  β_1 = ε_1 / (1 − ε_1) = 0.2/0.8 = 0.25
• Examples A through D and G through J were classified correctly by h_t, while E and F were misclassified:
  w_A^2 = w_B^2 = w_C^2 = w_D^2 = w_G^2 = w_H^2 = w_I^2 = w_J^2 = 1/10 × β_1 = 0.025
  w_E^2 = w_F^2 = 1/10 × β_1^0 = 0.1
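The arithmetic on this slide can be checked directly:

```python
eps1 = 2 * (1 / 10)             # two of the ten uniformly weighted examples misclassified
beta1 = eps1 / (1 - eps1)       # beta_1 = eps_1 / (1 - eps_1)
w_correct = (1 / 10) * beta1    # new weight of a correctly classified example (A-D, G-J)
w_miss = (1 / 10) * beta1 ** 0  # new weight of a misclassified example (E, F): unchanged
print(beta1, w_correct, w_miss)
```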
16. AdaBoost's training error bound
• The error ε of the final hypothesis h_f is measured under the initial distribution D:
  ε = Σ_{i: h_f(X_i) ≠ y_i} D(i)
• D(i) is the initial weight (distribution) of example i
• After R rounds,
  ε ≤ 2^R Π_{t=1}^R √(ε_t (1 − ε_t))
• Since ε_t < 1/2 for every t, each factor satisfies
  2√(ε_t (1 − ε_t)) < 1
• So as long as WeakLearner keeps returning hypotheses better than random guessing, the bound on ε shrinks exponentially with R
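The exponential decay can be seen numerically; the per-round errors below are made up for illustration:

```python
import math

eps = [0.2, 0.3, 0.25]                              # hypothetical per-round errors, all < 1/2
factors = [2 * math.sqrt(e * (1 - e)) for e in eps]  # each factor 2*sqrt(eps_t(1-eps_t)) < 1
bound = math.prod(factors)                           # = 2^R * prod sqrt(eps_t(1-eps_t))
print(factors, bound)
```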
17. • Lemma 5.1: the total weight after R rounds is bounded below by the final error:
  Σ_{i=1}^N w_i^{R+1} ≥ ε (Π_{t=1}^R β_t)^{1/2}
• Proof: the sum over all N examples is at least the sum over the misclassified ones:
  Σ_{i=1}^N w_i^{R+1} ≥ Σ_{i: h_f(X_i) ≠ y_i} w_i^{R+1}
• Unrolling the update w_i^{t+1} = w_i^t β_t^{1−|h_t(X_i)−y_i|} from w_i^1 = D(i) gives
  Σ_{i: h_f(X_i) ≠ y_i} w_i^{R+1} = Σ_{i: h_f(X_i) ≠ y_i} D(i) Π_{t=1}^R β_t^{1−|h_t(X_i)−y_i|}
18. • For an example that h_f misclassifies (h_f(X_i) ≠ y_i), we lower-bound Π_{t=1}^R β_t^{1−|h_t(X_i)−y_i|}. There are two cases.
• Case 1: h_f(X_i) = 1 and y_i = 0. By the definition of the final hypothesis (5.1),
  Σ_{t=1}^R (−log β_t) h_t(X_i) ≥ (1/2) Σ_{t=1}^R (−log β_t)
• Rearranging (and recalling that log β_t < 0),
  Σ_{t=1}^R (log β_t)(1 − h_t(X_i)) ≥ (1/2) Σ_{t=1}^R log β_t
• Since y_i = 0, 1 − h_t(X_i) = 1 − |h_t(X_i) − y_i|, so exponentiating yields
  Π_{t=1}^R β_t^{1−|h_t(X_i)−y_i|} ≥ (Π_{t=1}^R β_t)^{1/2}
19. • Case 2: h_f(X_i) = 0 and y_i = 1. By the definition of the final hypothesis (5.1),
  Σ_{t=1}^R (−log β_t) h_t(X_i) < (1/2) Σ_{t=1}^R (−log β_t)
• Since y_i = 1, h_t(X_i) = 1 − |h_t(X_i) − y_i|, so exponentiating yields
  Π_{t=1}^R β_t^{1−|h_t(X_i)−y_i|} > (Π_{t=1}^R β_t)^{1/2}
• Hence in both cases (h_f(X_i) = 1, y_i = 0 and h_f(X_i) = 0, y_i = 1),
  Π_{t=1}^R β_t^{1−|h_t(X_i)−y_i|} ≥ (Π_{t=1}^R β_t)^{1/2}
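A quick numeric check of this step, with made-up β_t values and one misclassified example (Case 1: h_f(X_i) = 1, y_i = 0):

```python
import math

betas = [0.25, 0.4, 0.6]          # hypothetical beta_t values, all in [0, 1)
h_vals = [1, 0, 1]                # h_t(X_i) for one example with true label y_i = 0
y_i = 0
alphas = [-math.log(b) for b in betas]
# h_f outputs 1 (a mistake here) iff the weighted vote reaches half the total
vote = sum(a * h for a, h in zip(alphas, h_vals))
assert vote >= 0.5 * sum(alphas)
lhs = math.prod(b ** (1 - abs(h - y_i)) for h, b in zip(h_vals, betas))
rhs = math.prod(betas) ** 0.5
assert lhs >= rhs                 # prod beta_t^{1-|h_t - y_i|} >= (prod beta_t)^{1/2}
```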
21. • Substituting this bound into the sum over misclassified examples completes the proof of Lemma 5.1:
  Σ_{i: h_f(X_i) ≠ y_i} D(i) Π_{t=1}^R β_t^{1−|h_t(X_i)−y_i|} ≥ (Σ_{i: h_f(X_i) ≠ y_i} D(i)) · (Π_{t=1}^R β_t)^{1/2} = ε (Π_{t=1}^R β_t)^{1/2}
• Lemma 5.2: each round shrinks the total weight by a factor of 2ε_t:
  Σ_{i=1}^N w_i^{t+1} ≤ Σ_{i=1}^N w_i^t × 2ε_t
• The proof uses the following inequality:
  for α ≥ 0 and r ∈ {0, 1},
  α^r ≤ 1 − (1 − α)r
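This inequality is easy to spot-check. For r ∈ {0, 1} (the only values the proof needs) it holds with equality or trivially, and by convexity of α^r it also holds for fractional r in [0, 1]:

```python
# alpha**r <= 1 - (1 - alpha)*r for alpha >= 0; r in {0, 1} is what the proof uses,
# and the bound also holds for fractional r in [0, 1] by convexity of alpha**r
for alpha in [0.0, 0.25, 0.5, 1.0, 2.0]:
    for r in (0, 0.5, 1):
        assert alpha ** r <= 1 - (1 - alpha) * r + 1e-12
```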
22. Proof of Lemma 5.2:
  Σ_{i=1}^N w_i^{t+1} = Σ_{i=1}^N w_i^t β_t^{1−|h_t(X_i)−y_i|}
  ≤ Σ_{i=1}^N w_i^t (1 − (1 − β_t)(1 − |h_t(X_i) − y_i|))
  = Σ_{i=1}^N w_i^t − (1 − β_t)(Σ_{i=1}^N w_i^t − Σ_{i=1}^N w_i^t |h_t(X_i) − y_i|)
  = Σ_{i=1}^N w_i^t − (1 − β_t)(Σ_{i=1}^N w_i^t − ε_t Σ_{i=1}^N w_i^t)
  (using ε_t = Σ_i p_i^t |h_t(X_i) − y_i|, so that Σ_i w_i^t |h_t(X_i) − y_i| = ε_t Σ_i w_i^t)
  = Σ_{i=1}^N w_i^t × (1 − (1 − β_t)(1 − ε_t))
Substituting β_t = ε_t / (1 − ε_t):
  = Σ_{i=1}^N w_i^t × 2ε_t
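Because the exponent 1 − |h_t(X_i) − y_i| is always 0 or 1 here, the chain above actually holds with equality, which a made-up round confirms numerically:

```python
import numpy as np

w = np.array([0.3, 0.1, 0.2, 0.15, 0.25])    # arbitrary round-t weights (made up)
miss = np.array([1, 0, 0, 1, 0])             # |h_t(X_i) - y_i| for each example
eps = float(np.sum((w / w.sum()) * miss))    # weighted error eps_t on normalized weights
beta = eps / (1 - eps)                       # beta_t = eps_t / (1 - eps_t)
w_next = w * beta ** (1.0 - miss)            # the round-(t+1) update
assert np.isclose(w_next.sum(), 2 * eps * w.sum())   # Lemma 5.2, with equality
```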
23. • Theorem 5.3: suppose every hypothesis h_t returned by WeakLearner at round t has error ε_t < 1/2. Then the error ε of the final hypothesis h_f satisfies
  ε ≤ 2^R Π_{t=1}^R √(ε_t (1 − ε_t))
• Proof: by Lemma 5.1,
  ε ≤ (Π_{t=1}^R β_t)^{−1/2} Σ_{i=1}^N w_i^{R+1}   (Lemma 5.1)
• Applying Lemma 5.2 repeatedly for t = R, R − 1, ..., 1, and using Σ_{i=1}^N w_i^1 = 1:
  Σ_{i=1}^N w_i^{R+1} ≤ Σ_{i=1}^N w_i^1 Π_{t=1}^R 2ε_t = Π_{t=1}^R 2ε_t   (Lemma 5.2)
• Combining the two and substituting β_t = ε_t / (1 − ε_t):
  ε ≤ Π_{t=1}^R 2ε_t × β_t^{−1/2} = Π_{t=1}^R 2ε_t √((1 − ε_t)/ε_t) = 2^R Π_{t=1}^R √(ε_t (1 − ε_t))