ASSIGNMENT 4:
VC DIMENSION
Institute for Machine Learning
Contact
Heads:
Markus Holzleitner,
Andreas Radler
————
Institute for Machine Learning
Johannes Kepler University
Altenberger Str. 69
A-4040 Linz
————
E-Mail: theoretical@ml.jku.at
Only mails to this list are answered!
Institute Homepage
Copyright statement:
This material, no matter whether in printed or electronic form,
may be used for personal and non-commercial educational use
only. Any reproduction of this material, no matter whether as a
whole or in parts, no matter whether in printed or in electronic
form, requires explicit prior acceptance of the authors.
Setting
• Data points $Z = \{(x_i, y_i)\}_{i=1}^{l}$ are sampled i.i.d. from $p(x, y)$, supported in $X \times \{-1, 1\}$.
• We want to learn $g : X \to \{-1, 1\}$ so that the expected loss (according to a given loss function $L$) is minimal. We will only use the zero-one loss $L_{zo}$ in this chapter.
• Goal: minimize the associated risk (generalization error):
$$R(g) = \int_X \sum_{y \in \{\pm 1\}} L(y, g(x))\, p(x, y)\, dx$$
• Also important: the empirical risk:
$$R_{emp}(g, Z) = R_{emp}(g, l) = \frac{1}{l} \sum_{i=1}^{l} L(y_i, g(x_i))$$
Hoeffding’s inequality
Lemma (Hoeffding)
Let $X_1, \dots, X_l$ be independent random variables drawn according to $p$. Assume further that $X_i \in [m_i, M_i]$. Then for $t \ge 0$:
$$p\left(\sum_{i=1}^{l} (X_i - E(X_i)) \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^{l}(M_i - m_i)^2}\right)$$
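As a rough sanity check of Hoeffding's inequality, the sketch below compares the bound with an exactly computed tail probability. The choice of fair coin flips ($X_i \in \{0, 1\}$, so $m_i = 0$, $M_i = 1$), the values of `l` and `t`, and the helper names are assumptions made for illustration only.

```python
import math

def hoeffding_bound(t, ranges):
    """Right-hand side of Hoeffding's inequality for ranges [(m_i, M_i), ...]."""
    denom = sum((M - m) ** 2 for m, M in ranges)
    return math.exp(-2.0 * t ** 2 / denom)

def binomial_upper_tail(l, q, k_min):
    """Exact P(S >= k_min) for S ~ Binomial(l, q)."""
    return sum(math.comb(l, k) * q ** k * (1 - q) ** (l - k)
               for k in range(k_min, l + 1))

# Illustrative setting (an assumption): l fair coin flips, deviation t.
l, q, t = 100, 0.5, 10
# E(sum X_i) = l * q, so {sum (X_i - E(X_i)) >= t} is the event {S >= l*q + t}.
exact_tail = binomial_upper_tail(l, q, k_min=int(l * q + t))
bound = hoeffding_bound(t, [(0, 1)] * l)
print(f"exact tail = {exact_tail:.4f}, Hoeffding bound = {bound:.4f}")
```

The exact tail has to stay below the bound; how loose the bound is depends on the distribution, since Hoeffding only uses the ranges $[m_i, M_i]$.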
Generalization bound: finite function classes
• First step (one single model): apply Hoeffding to $X_i = L(y_i, g(x_i))$, $E(X_i) = R(g)$ for fixed $g \in G$. Then $m_i = 0$, $M_i = 1$ (for all $i = 1, \dots, l$) and for any $\varepsilon > 0$:
$$p\left(|R_{emp}(g, l) - R(g)| \ge \varepsilon\right) = p\left(\left|\sum_{i=1}^{l} (X_i - E(X_i))\right| \ge l\varepsilon\right) \le 2\exp(-2l\varepsilon^2).$$
Lemma (Generalization bound: finite model classes)
Let $|G| = m$. Choose a failure probability $0 < \delta < 1$. Then with probability at least $1 - \delta$, for all $g \in G$:
$$R(g) \le R_{emp}(g, l) + \sqrt{\frac{\ln(2m) + \ln(1/\delta)}{2l}}$$
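The lemma follows from the single-model bound by a union bound over the $m$ functions: setting $2m\exp(-2l\varepsilon^2) = \delta$ and solving for $\varepsilon$ gives the square-root term. A minimal sketch for evaluating that term numerically is given below; the function names and the example values of `m`, `l` and `delta` are illustrative assumptions.

```python
import math

def finite_class_capacity(m, l, delta):
    """Capacity term sqrt((ln(2m) + ln(1/delta)) / (2l)) from the lemma."""
    return math.sqrt((math.log(2 * m) + math.log(1 / delta)) / (2 * l))

def finite_class_bound(r_emp, m, l, delta):
    """Upper bound on R(g), holding with probability >= 1 - delta for all g in G."""
    return r_emp + finite_class_capacity(m, l, delta)

# Illustrative values (assumptions): 1000 models, failure probability 5%.
for l in (100, 1_000, 10_000, 100_000):
    print(l, round(finite_class_capacity(m=1000, l=l, delta=0.05), 4))
```

The capacity term shrinks like $1/\sqrt{l}$ and grows only logarithmically in $m$ and $1/\delta$.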
What does this result mean?
• The true risk is bounded by the empirical risk plus a capacity term.
• If the function class grows (larger $m$), the bound gets worse.
• If $m$ is small enough compared to $l$ (so that $\frac{\ln m}{l}$ is small), we get a tight bound.
• The whole bound holds with probability at least $1 - \delta$. Decreasing $\delta$ worsens the bound.
• These arguments break down if $|G| = \infty$. For this case we need new ideas.
Shattering coefficient: definition
Definition (Shattering coefficient)
For a given sample $x_1, \dots, x_l \in X$ and function class $G$, define $G_{x_1,\dots,x_l}$ as the set of functions that we get when restricting $G$ to $x_1, \dots, x_l$:
$$G_{x_1,\dots,x_l} = \{ g|_{x_1,\dots,x_l} : g \in G \}$$
The shattering coefficient $N(G, l)$ of $G$ is defined as the maximal number of functions in $G_{x_1,\dots,x_l}$:
$$N(G, l) = \max\{ |G_{x_1,\dots,x_l}| : x_1, \dots, x_l \in X \}$$
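To make the restriction $G_{x_1,\dots,x_l}$ concrete, the following sketch counts the distinct labelings that a small family of functions produces on one fixed sample. The threshold classifiers used here (label $+1$ iff $x \ge a$) and all names are assumptions chosen for illustration; they do not appear in the slides. Note that the code evaluates $|G_{x_1,\dots,x_l}|$ for a single sample, whereas $N(G, l)$ is the maximum over all samples.

```python
def restriction(functions, points):
    """Set of distinct label vectors (g(x_1), ..., g(x_l)) for g in functions."""
    return {tuple(g(x) for x in points) for g in functions}

def threshold(a):
    """Hypothetical example class: g_a(x) = +1 if x >= a, else -1."""
    return lambda x: 1 if x >= a else -1

points = [0.5, 1.5, 2.5]
# One threshold per gap between the sorted points suffices to hit every labeling
# this class can realize on the sample.
functions = [threshold(a) for a in (0.0, 1.0, 2.0, 3.0)]
labelings = restriction(functions, points)
print(len(labelings), "of", 2 ** len(points), "possible labelings")
```

Even though there are infinitely many thresholds $a$, only $l + 1$ different labelings occur on $l$ distinct points, which is why counting restrictions is more informative than counting functions.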
Shattering coefficient: main result
Theorem (Generalization bound: shattering coefficient)
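Let $G$ be an arbitrary function class. Then for $0 < \varepsilon < 1$:
$$p\left(\sup_{g \in G} |R_{emp}(g, l) - R(g)| > \varepsilon\right) \le 2N(G, 2l)\, e^{-\frac{l\varepsilon^2}{4}}.$$
In other words: with probability at least $1 - \delta$ all functions $g \in G$ satisfy:
$$R(g) \le R_{emp}(g, l) + 2\sqrt{\frac{\ln(N(G, 2l)) + \ln(1/\delta)}{l}}$$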
Symmetrization Lemma
Notation:
• $R_{emp}(g, l)$: empirical risk on the given sample of $l$ points
• $R'_{emp}(g, l)$: empirical risk on a second, independent sample of $l$ points (the ghost sample)
Lemma (Symmetrization)
For $\varepsilon > \frac{2}{l}$:
$$p\left(\sup_{g \in G} |R_{emp}(g, l) - R(g)| > \varepsilon\right) \le 2\,p\left(\sup_{g \in G} |R_{emp}(g, l) - R'_{emp}(g, l)| > \frac{\varepsilon}{2}\right).$$
• A proof can be found e.g. here (Lemma 7.63, see also notes).
Why symmetrization?
• If two functions $g, \tilde{g}$ coincide on all points of the original and the ghost sample, then $R_{emp}(g, l) = R_{emp}(\tilde{g}, l)$ and $R'_{emp}(g, l) = R'_{emp}(\tilde{g}, l)$.
• → The supremum over $G$ in fact runs over only finitely many functions: all possible binary functions on two samples of size $l$. The number of such functions is bounded by $N(G, 2l)$.
• The bound is analogous to the one for finite function classes; just replace $m$ by $N(G, 2l)$.
• Intuitively: the shattering coefficient measures how powerful a function class is, i.e. how many labelings of a dataset it can realize.
• For consistency we need $\frac{\ln N(G, 2l)}{l} \xrightarrow{l \to \infty} 0$.
• However: shattering coefficients are difficult to deal with. We need to know how they grow in $l$. We now study a tool that helps in this regard.
Definition: Shattering and VC-dimension
Definition (Shattering)
$G$ shatters a set of points $x_1, \dots, x_l$ if $G$ can realize all possible labelings, i.e. $|G_{x_1,\dots,x_l}| = 2^l$.
Definition (VC-dimension (from Vapnik-Chervonenkis))
The VC-dimension of $G$ is defined as the largest $l$ such that there exists a sample of size $l$ that can be shattered by $G$:
$$VC(G) = \max\left\{ l \in \mathbb{N} \mid \exists x_1, \dots, x_l \text{ s.t. } |G_{x_1,\dots,x_l}| = 2^l \right\}.$$
If the maximum does not exist: $VC(G) = \infty$.
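The definition can be tried out by brute force on a simple class. The sketch below uses indicators of closed intervals $[a, b]$ on the real line (positive label inside the interval) and checks whether a given sample is shattered. The helper names and the way candidate endpoints are enumerated are assumptions for illustration; the enumeration relies on the sample points being distinct.

```python
def interval_labels(a, b, points):
    """Labels assigned by the indicator of [a, b]: +1 inside, -1 outside."""
    return tuple(1 if a <= x <= b else -1 for x in points)

def realizable_labelings(points):
    """All labelings of distinct `points` realizable by intervals [a, b], a < b."""
    xs = sorted(points)
    # Only the gap in which an endpoint lies matters, so one candidate per gap
    # is enough; two candidates below the smallest point allow the empty labeling.
    cuts = ([xs[0] - 2.0, xs[0] - 1.0]
            + [(u + v) / 2 for u, v in zip(xs, xs[1:])]
            + [xs[-1] + 1.0])
    return {interval_labels(a, b, points)
            for i, a in enumerate(cuts) for b in cuts[i + 1:]}

def shatters(points):
    """True if intervals realize all 2^l labelings of the sample."""
    return len(realizable_labelings(points)) == 2 ** len(points)

print(shatters([1.0, 2.0]))       # two points
print(shatters([1.0, 2.0, 3.0]))  # three points
```

For three points on the line, the labeling that marks the two outer points positive and the middle one negative can never come from a single interval, so no three-point sample is shattered, while any two distinct points are. This gives VC-dimension 2 for intervals.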
VC-dimension: examples
• $X = \mathbb{R}$, positive class = interior of a closed interval, i.e. $G = \{ 1_{[a,b]} : a < b \in \mathbb{R} \}$.
• Positive class = interior of right triangles whose sides adjacent to the right angle are parallel to the axes, with the right angle in the lower left corner. $X = \mathbb{R}^2$, $G = \{\text{indicators of right triangles}\}$.
• Positive class = interior of a convex polygon, $X = \mathbb{R}^2$, $G = \{\text{indicators of convex polygons with } d \text{ corners}\}$.
• $X = \mathbb{R}$, $G = \{\operatorname{sgn}(\sin(tx)) : t \in \mathbb{R}\}$. Then $VC(G) = \infty$.
• $X = \mathbb{R}^r$, $G = \{\text{area above linear hyperplane}\}$. Show in exercises: $VC(G) = r + 1$.
• $X = \mathbb{R}^r$, $\gamma > 0$, $G = \{\text{hyperplanes with margin at least } \gamma\}$. One can prove: if the data are restricted to a ball of radius $R$: $VC(G) = \min\left\{r, \frac{2R^2}{\gamma^2}\right\} + 1$.
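The claim $VC(G) = \infty$ for $G = \{\operatorname{sgn}(\sin(tx))\}$ can be illustrated numerically with a standard textbook construction, which is not taken from the slides: on the points $x_i = 10^{-i}$, the frequency $t = \pi\left(1 + \sum_{i : y_i = -1} 10^{i}\right)$ realizes any desired labeling $y$. The sketch below checks this for a small `l`; the value of `l` and the helper names are assumptions, and floating-point precision limits how large `l` can be chosen.

```python
import math
from itertools import product

def sign_sin(t, x):
    """sgn(sin(t x)) with values in {-1, +1} (sin is never exactly 0 here)."""
    return 1 if math.sin(t * x) > 0 else -1

def frequency_for(labels):
    """Frequency t that realizes `labels` on the points x_i = 10**-i, i = 1..l."""
    return math.pi * (1 + sum(10 ** i
                              for i, y in enumerate(labels, start=1) if y == -1))

l = 4  # illustrative sample size (assumption)
points = [10.0 ** -i for i in range(1, l + 1)]
realized = set()
for labels in product((-1, 1), repeat=l):
    t = frequency_for(labels)
    if tuple(sign_sin(t, x) for x in points) == labels:
        realized.add(labels)
print(len(realized) == 2 ** l)
```

Since the same construction works for every $l$, a one-parameter family can have infinite VC-dimension; the number of parameters is therefore not a reliable measure of capacity.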
Why VC-dimension? Sauer’s Lemma
Lemma (Vapnik, Chervonenkis, Sauer, Shelah)
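Let $G$ be a function class with $VC(G) = d$. Then:
• $N(G, l) \le \sum_{i=0}^{d} \binom{l}{i}$ for all $l \in \mathbb{N}$.
• In particular, for all $l \ge d$: $N(G, l) \le \left(\frac{el}{d}\right)^{d}$.
• If a function class has finite VC-dimension, its shattering coefficient grows only polynomially in $l$.
• Infinite VC-dimension → exponential growth: $N(G, l) = 2^l$ for all $l$.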
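To get a feeling for the gap between polynomial and exponential growth, the sketch below tabulates both bounds from the lemma next to $2^l$. The function names and the values of `d` and `l` are illustrative assumptions.

```python
import math

def sauer_bound(l, d):
    """Sum_{i=0}^{d} binom(l, i), the first bound in the lemma."""
    return sum(math.comb(l, i) for i in range(d + 1))

def polynomial_bound(l, d):
    """(e * l / d) ** d, valid for l >= d."""
    return (math.e * l / d) ** d

d = 3  # illustrative VC-dimension (assumption)
for l in (5, 10, 20, 40):
    print(l, sauer_bound(l, d), round(polynomial_bound(l, d), 1), 2 ** l)
```

For $l \le d$ the binomial sum equals $2^l$, so the bound only starts to bite once the sample size exceeds the VC-dimension.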
VC-dimension: main result
Theorem (Generalization bound: VC-dimension)
Let $G$ be a function class with $VC(G) = d$. Then with probability at least $1 - \delta$ all functions $g \in G$ satisfy
$$R(g) \le R_{emp}(g, l) + 2\sqrt{\frac{d \ln\left(\frac{2el}{d}\right) + \ln(1/\delta)}{l}}$$
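The theorem results from inserting Sauer's bound $N(G, 2l) \le \left(\frac{2el}{d}\right)^{d}$ into the shattering-coefficient bound. A small sketch for evaluating the resulting capacity term is given below; the function names and the example values of `d`, `l` and `delta` are assumptions for illustration.

```python
import math

def vc_capacity(d, l, delta):
    """Capacity term 2 * sqrt((d * ln(2el/d) + ln(1/delta)) / l) from the theorem."""
    return 2.0 * math.sqrt(
        (d * math.log(2 * math.e * l / d) + math.log(1 / delta)) / l
    )

def vc_bound(r_emp, d, l, delta):
    """Upper bound on R(g), holding with probability >= 1 - delta for all g in G."""
    return r_emp + vc_capacity(d, l, delta)

# Illustrative values (assumptions): VC-dimension 10, failure probability 5%.
for l in (1_000, 10_000, 100_000, 1_000_000):
    print(l, round(vc_capacity(d=10, l=l, delta=0.05), 4))
```

The term only becomes small once $l$ is much larger than $d$; for small samples the bound can exceed 1 and is then vacuous for the zero-one loss.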
