SlideShare uma empresa Scribd logo
1 de 34
Lecture 2:
Decision Trees – Nearest Neighbors
Last Updated: 09 Sept 2013
Uppsala University
Department of Linguistics and Philology, September 2013
Machine Learning for Language Technology
(Schedule)
1 Lec 2: Decision Trees - Nearest Neighbors
Most slides from previous courses (i.e.
from E. Alpaydin and J. Nivre) with
adaptations
2 Lec 2: Decision Trees - Nearest Neighbors
Supervised Classification
3
 Divide instances into (two or more) classes
 Instance (feature vector):
 Features may be categorical or numerical
 Class (label):
 Training data:
 Classification in Language Technology
 Spam filtering (spam vs. non-spam)
 Spelling error detection (error vs. no error)
 Text categorization (news, economy, culture, sport, ...)
 Named entity classification (person, location, organization,
...)
X {xt
,yt
}t 1
N
x x1, , xm
y
Lec 2: Decision Trees - Nearest Neighbors
Concept=the thing to be learned (here, how to decide the genre of a new
webpage);
Instances=examples (here a webpage)
Features=attributes (here morphological features, normalized counts)
Classes=labels (here webgenres)
Lec 2: Decision Trees - Nearest Neighbors4
Format for the software you chose (here
arff)
Lec 2: Decision Trees - Nearest Neighbors5
Models for Classification
6
 Generative probabilistic models:
 Model of P(x, y)
 Naive Bayes
 Conditional probabilistic models:
 Model of P(y | x)
 Logistic regression (next time)
 Discriminative model:
 No explicit probability model
 Decision trees, nearest neighbor classification (today)
 Perceptron, support vector machines, MIRA (next time)
Lec 2: Decision Trees - Nearest Neighbors
Repeating…
Lec 2: Decision Trees - Nearest Neighbors7
 Noise
 Data cleaning is expensive
and time consuming
 Margin
 Inductive bias
Types of inductive biases
•Minimum cross-validation error
•Maximum margin
•Minimum description length
•[...]
Decision Trees
Lec 2: Decision Trees - Nearest Neighbors8
Decision Trees
9
 Hierarchical tree structure for classification
 Each internal node specifies a test of some
feature
 Each branch corresponds to a value for the
tested feature
 Each leaf node provides a classification for the
instance
 Represents a disjunction of conjunctions of
constraints
 Each path from root to leaf specifies a
conjunction of tests
 The tree itself represents the disjunction of all
paths
Lec 2: Decision Trees - Nearest Neighbors
Decision Tree
Lec 2: Decision Trees - Nearest Neighbors10
Divide and Conquer
Lec 2: Decision Trees - Nearest Neighbors11
 Internal decision nodes
 Univariate: Uses a single attribute, xi
 Numeric xi : Binary split : xi > wm
 Discrete xi : n-way split for n possible values
 Multivariate: Uses all attributes, x
 Leaves
 Classification: class labels (or proportions)
 Regression: r average (or local fit)
 Learning:
 Greedy recursive algorithm
 Find best split X = (X1, ..., Xp), then induce tree for
each Xi
Classification Trees
(ID3, CART, C4.5)
Lec 2: Decision Trees - Nearest Neighbors12
ˆP Ci | x,m pm
i Nm
i
Nm
 For node m, Nm instances reach m, Ni
m belong to Ci
 Node m is pure if pi
m is 0 or 1
 Measure of impurity is entropy
Im pm
i
log2 pm
i
i 1
K
Example: Entropy
13
 Assume two classes (C1, C2) and four instances (x1, x2,
x3, x4)
 Case 1:
 C1 = {x1, x2, x3, x4}, C2 = { }
 Im = – (1 log 1 + 0 log 0) = 0
 Case 2:
 C1 = {x1, x2, x3}, C2 = {x4}
 Im = – (0.75 log 0.75 + 0.25 log 0.25) = 0.81
 Case 3:
 C1 = {x1, x2}, C2 = {x3, x4}
 Im = – (0.5 log 0.5 + 0.5 log 0.5) = 1
Lec 2: Decision Trees - Nearest Neighbors
Best Split
Lec 2: Decision Trees - Nearest Neighbors14
ˆP Ci |x,m, j pmj
i Nmj
i
Nmj
 If node m is pure, generate a leaf and stop,
otherwise split with test t and continue
recursively
 Find the test that minimizes impurity
 Impurity after split with test t:
 Nmj of Nm take branch j
 Ni
mj belong to Ci
Im
t Nmj
Nmj 1
n
pmj
i
log2 pmj
i
i 1
K
Im pm
i
log2 pm
i
i 1
K
ˆP Ci | x,m pm
i Nm
i
Nm
Lec 2: Decision Trees - Nearest Neighbors15
Information Gain
Lec 2: Decision Trees - Nearest Neighbors16
 We want to determine which attribute in a given set
of training feature vectors is most usefulfor
discriminating between the classes to be learned.
 •Information gain tells us how important a given
attribute of the feature vectors is.
 •We will use it to decide the ordering of attributes in
the nodes of a decision tree.
Information Gain and Gain Ratio
17
 Choosing the test that
minimizes impurity maximizes
the information gain (IG):
 Information gain prefers
features with many values
 The normalized version is called
gain ratio (GR):
Im
t Nmj
Nmj 1
n
pmj
i
log2 pmj
i
i 1
K
IGm
t
Im Im
t
Vm
t Nmj
Nm
log2
Nmj
Nmj 1
n
GRm
t IGm
t
Vm
t
Lec 2: Decision Trees - Nearest Neighbors
Im pm
i
log2 pm
i
i 1
K
Pruning Trees
Lec 2: Decision Trees - Nearest Neighbors18
 Decision trees are susceptible to overfitting
 Remove subtrees for better generalization:
 Prepruning: Early stopping (e.g., with entropy
threshold)
 Postpruning: Grow whole tree, then prune
subtrees
 Prepruning is faster, postpruning is more
accurate (requires a separate validation set)
Rule Extraction from Trees
Lec 2: Decision Trees - Nearest Neighbors
19
C4.5Rules
(Quinlan, 1993)
Learning Rules
Lec 2: Decision Trees - Nearest Neighbors20
 Rule induction is similar to tree induction but
 tree induction is breadth-first
 rule induction is depth-first (one rule at a time)
 Rule learning:
 A rule is a conjunction of terms (cf. tree path)
 A rule covers an example if all terms of the rule
evaluate to true for the example (cf. sequence of
tests)
 Sequential covering: Generate rules one at a time
until all positive examples are covered
 IREP (Fürnkrantz and Widmer, 1994), Ripper
(Cohen, 1995)
Properties of Decision Trees
21
 Decision trees are appropriate for classification
when:
 Features can be both categorical and numeric
 Disjunctive descriptions may be required
 Training data may be noisy (missing values, incorrect
labels)
 Interpretation of learned model is important (rules)
 Inductive bias of (most) decision tree
learners:
1. Prefers trees with informative attributes close to the
root
2. Prefers smaller trees over bigger ones (with
pruning)
3. Preference bias (incomplete search of complete
Lec 2: Decision Trees - Nearest Neighbors
k-Nearest Neighbors
Lec 2: Decision Trees - Nearest Neighbors22
Nearest Neighbor Classification
23
 An old idea
 Key components:
 Storage of old instances
 Similarity-based reasoning to new instances
This “rule of nearest neighbor” has considerable
elementary intuitive appeal and probably corresponds to
practice in many situations. For example, it is possible
that much medical diagnosis is influenced by the doctor's
recollection of the subsequent history of an earlier patient
whose symptoms resemble in some way those of the
current patient. (Fix and Hodges, 1952)
Lec 2: Decision Trees - Nearest Neighbors
k-Nearest Neighbour
24
 Learning:
 Store training instances in memory
 Classification:
 Given new test instance x,
 Compare it to all stored instances
 Compute a distance between x and each stored instance xt
 Keep track of the k closest instances (nearest neighbors)
 Assign to x the majority class of the k nearest neighbours
 A geometric view of learning
 Proximity in (feature) space  same class
 The smoothness assumption
Lec 2: Decision Trees - Nearest Neighbors
Eager and Lazy Learning
25
 Eager learning (e.g., decision trees)
 Learning – induce an abstract model from data
 Classification – apply model to new data
 Lazy learning (a.k.a. memory-based learning)
 Learning – store data in memory
 Classification – compare new data to data in memory
 Properties:
 Retains all the information in the training set – no abstraction
 Complex hypothesis space – suitable for natural language?
 Main drawback – classification can be very inefficient
Lec 2: Decision Trees - Nearest Neighbors
Dimensions of a k-NN Classifier
26
 Distance metric
 How do we measure distance between instances?
 Determines the layout of the instance space
 The k parameter
 How large neighborhood should we consider?
 Determines the complexity of the hypothesis space
Lec 2: Decision Trees - Nearest Neighbors
Distance Metric 1
27
 Overlap = count of mismatching features
(x,z) (xi,zi )
i 1
m
ii
ii
ii
ii
ii
zxif
zxif
elsenumericif
zx
zx
1
0
,
minmax
),(
Lec 2: Decision Trees - Nearest Neighbors
Distance Metric 2
28
 MVDM = Modified Value Difference Metric
(x,z) (xi,zi )
i 1
m
(xi,zi) P(Cj | xi) P(Cj | zi)
j 1
K
Lec 2: Decision Trees - Nearest Neighbors
The k parameter
29
 Tunes the complexity of the hypothesis space
 If k = 1, every instance has its own neighborhood
 If k = N, all the feature space is one neighborhood
k = 1 k = 15
Lec 2: Decision Trees - Nearest Neighbors
ˆE E(h | V ) 1 h xt
rt
t 1
M
A Simple Example
30
Training set:
1. (a, b, a, c)  A
2. (a, b, c, a)  B
3. (b, a, c, c)  C
4. (c, a, b, c)  A
New instance:
5. (a, b, b, a)
Distances (overlap):
Δ(1, 5) = 2
Δ(2, 5) = 1
Δ(3, 5) = 4
Δ(4, 5) = 3
k-NN classification:
1-NN(5) = B
2-NN(5) = A/B
3-NN(5) = A
4-NN(5) = A
Lec 2: Decision Trees - Nearest Neighbors
(x,z) (xi,zi )
i 1
m
ii
ii
ii
ii
ii
zxif
zxif
elsenumericif
zx
zx
1
0
,
minmax
),(
Further Variations on k-NN
31
 Feature weights:
 The overlap metric gives all features equal weight
 Features can be weighted by IG or GR
 Weighted voting:
 The normal decision rule gives all neighbors equal weight
 Instances can be weighted by (inverse) distance
Lec 2: Decision Trees - Nearest Neighbors
Properties of k-NN
32
 Nearest neighbor classification is appropriate when:
 Features can be both categorical and numeric
 Disjunctive descriptions may be required
 Training data may be noisy (missing values, incorrect
labels)
 Fast classification is not crucial
 Inductive bias of k-NN:
1. Nearby instances should have the same label
(smoothness)
2. All features are equally important (without feature
weights)
3. Complexity tuned by the k parameter
Lec 2: Decision Trees - Nearest Neighbors
Reading
Lec 2: Decision Trees - Nearest Neighbors33
 Alpaydin (2010): Ch. 3, 9
 Daumé (2012): Ch. 1-2
 Witten and Frank (2005): Ch. 2
 Tom Mitchell (1997) Ch.3: Decision Tree Learning
End of Lecture 2
Thanks for your attention
Lec 2: Decision Trees - Nearest Neighbors34

Mais conteúdo relacionado

Mais procurados

Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine LearningPranav Challa
 
CC282 Decision trees Lecture 2 slides for CC282 Machine ...
CC282 Decision trees Lecture 2 slides for CC282 Machine ...CC282 Decision trees Lecture 2 slides for CC282 Machine ...
CC282 Decision trees Lecture 2 slides for CC282 Machine ...butest
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learningbutest
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree LearningMilind Gokhale
 
Machine Learning and Data Mining: 13 Nearest Neighbor and Bayesian Classifiers
Machine Learning and Data Mining: 13 Nearest Neighbor and Bayesian ClassifiersMachine Learning and Data Mining: 13 Nearest Neighbor and Bayesian Classifiers
Machine Learning and Data Mining: 13 Nearest Neighbor and Bayesian ClassifiersPier Luca Lanzi
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos butest
 
Machine learning in science and industry — day 2
Machine learning in science and industry — day 2Machine learning in science and industry — day 2
Machine learning in science and industry — day 2arogozhnikov
 
Neural network for machine learning
Neural network for machine learningNeural network for machine learning
Neural network for machine learningUjjawal
 
Decision Trees
Decision TreesDecision Trees
Decision TreesCloudxLab
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Methodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesMethodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesijsc
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1arogozhnikov
 
Machine learning in science and industry — day 4
Machine learning in science and industry — day 4Machine learning in science and industry — day 4
Machine learning in science and industry — day 4arogozhnikov
 
WEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic MethodsWEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic MethodsDataminingTools Inc
 

Mais procurados (20)

Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
CC282 Decision trees Lecture 2 slides for CC282 Machine ...
CC282 Decision trees Lecture 2 slides for CC282 Machine ...CC282 Decision trees Lecture 2 slides for CC282 Machine ...
CC282 Decision trees Lecture 2 slides for CC282 Machine ...
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Decision trees
Decision treesDecision trees
Decision trees
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Machine Learning and Data Mining: 13 Nearest Neighbor and Bayesian Classifiers
Machine Learning and Data Mining: 13 Nearest Neighbor and Bayesian ClassifiersMachine Learning and Data Mining: 13 Nearest Neighbor and Bayesian Classifiers
Machine Learning and Data Mining: 13 Nearest Neighbor and Bayesian Classifiers
 
Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos Introduction to Machine Learning Aristotelis Tsirigos
Introduction to Machine Learning Aristotelis Tsirigos
 
Machine learning in science and industry — day 2
Machine learning in science and industry — day 2Machine learning in science and industry — day 2
Machine learning in science and industry — day 2
 
Neural network for machine learning
Neural network for machine learningNeural network for machine learning
Neural network for machine learning
 
002.decision trees
002.decision trees002.decision trees
002.decision trees
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Methodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesMethodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniques
 
Introduction to data mining and machine learning
Introduction to data mining and machine learningIntroduction to data mining and machine learning
Introduction to data mining and machine learning
 
Text categorization
Text categorizationText categorization
Text categorization
 
Machine learning in science and industry — day 1
Machine learning in science and industry — day 1Machine learning in science and industry — day 1
Machine learning in science and industry — day 1
 
Machine learning in science and industry — day 4
Machine learning in science and industry — day 4Machine learning in science and industry — day 4
Machine learning in science and industry — day 4
 
WEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic MethodsWEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic Methods
 

Destaque

Lecture 01: Machine Learning for Language Technology - Introduction
 Lecture 01: Machine Learning for Language Technology - Introduction Lecture 01: Machine Learning for Language Technology - Introduction
Lecture 01: Machine Learning for Language Technology - IntroductionMarina Santini
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyMarina Santini
 
Decision trees for machine learning
Decision trees for machine learningDecision trees for machine learning
Decision trees for machine learningAmr BARAKAT
 
Webinar Presentation: Stories of Accidental Data Loss
Webinar Presentation: Stories of Accidental Data LossWebinar Presentation: Stories of Accidental Data Loss
Webinar Presentation: Stories of Accidental Data LossImanis Data
 

Destaque (7)

Lecture 01: Machine Learning for Language Technology - Introduction
 Lecture 01: Machine Learning for Language Technology - Introduction Lecture 01: Machine Learning for Language Technology - Introduction
Lecture 01: Machine Learning for Language Technology - Introduction
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
Lecture6 - C4.5
Lecture6 - C4.5Lecture6 - C4.5
Lecture6 - C4.5
 
Decision trees for machine learning
Decision trees for machine learningDecision trees for machine learning
Decision trees for machine learning
 
SELF PRESENTATION
SELF PRESENTATIONSELF PRESENTATION
SELF PRESENTATION
 
Webinar Presentation: Stories of Accidental Data Loss
Webinar Presentation: Stories of Accidental Data LossWebinar Presentation: Stories of Accidental Data Loss
Webinar Presentation: Stories of Accidental Data Loss
 
Lecture5 - C4.5
Lecture5 - C4.5Lecture5 - C4.5
Lecture5 - C4.5
 

Semelhante a Lecture 02: Machine Learning for Language Technology - Decision Trees and Nearest Neighbors

An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...Sebastian Raschka
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptxdfgd7
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzerbutest
 
Poggi analytics - distance - 1a
Poggi   analytics - distance - 1aPoggi   analytics - distance - 1a
Poggi analytics - distance - 1aGaston Liberman
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesXavier Rafael Palou
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorizationmidi
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackarogozhnikov
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.ShwetaPatil174
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine LearningPavithra Thippanaik
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 

Semelhante a Lecture 02: Machine Learning for Language Technology - Decision Trees and Nearest Neighbors (20)

Clustering
ClusteringClustering
Clustering
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptx
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptx
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
Poggi analytics - distance - 1a
Poggi   analytics - distance - 1aPoggi   analytics - distance - 1a
Poggi analytics - distance - 1a
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
 
tutorial.ppt
tutorial.ppttutorial.ppt
tutorial.ppt
 
Lect4
Lect4Lect4
Lect4
 
Bayes ML.ppt
Bayes ML.pptBayes ML.ppt
Bayes ML.ppt
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
Pres metabief2020jmm
Pres metabief2020jmmPres metabief2020jmm
Pres metabief2020jmm
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
MLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic trackMLHEP Lectures - day 1, basic track
MLHEP Lectures - day 1, basic track
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
 
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine Learning
 
Benelearn2016
Benelearn2016Benelearn2016
Benelearn2016
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 

Mais de Marina Santini

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Marina Santini
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsMarina Santini
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-Marina Santini
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesMarina Santini
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word CloudsMarina Santini
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebMarina Santini
 
Lecture: Summarization
Lecture: SummarizationLecture: Summarization
Lecture: SummarizationMarina Santini
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question AnsweringMarina Santini
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)Marina Santini
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Marina Santini
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational SemanticsMarina Santini
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Marina Santini
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Marina Santini
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Marina Santini
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part) Marina Santini
 

Mais de Marina Santini (20)

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology Applications
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability Features
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
 
Lecture: Summarization
Lecture: SummarizationLecture: Summarization
Lecture: Summarization
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1)
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)
 

Último

Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 

Último (20)

Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 

Lecture 02: Machine Learning for Language Technology - Decision Trees and Nearest Neighbors

  • 1. Lecture 2: Decision Trees – Nearest Neighbors Last Updated: 09 Sept 2013 Uppsala University Department of Linguistics and Philology, September 2013 Machine Learning for Language Technology (Schedule) 1 Lec 2: Decision Trees - Nearest Neighbors
  • 2. Most slides from previous courses (i.e. from E. Alpaydin and J. Nivre) with adaptations 2 Lec 2: Decision Trees - Nearest Neighbors
  • 3. Supervised Classification 3  Divide instances into (two or more) classes  Instance (feature vector):  Features may be categorical or numerical  Class (label):  Training data:  Classification in Language Technology  Spam filtering (spam vs. non-spam)  Spelling error detection (error vs. no error)  Text categorization (news, economy, culture, sport, ...)  Named entity classification (person, location, organization, ...) X {xt ,yt }t 1 N x x1, , xm y Lec 2: Decision Trees - Nearest Neighbors
  • 4. Concept=the thing to be learned (here, how to decide the genre of a new webpage); Instances=examples (here a webpage) Features=attributes (here morphological features, normalized counts) Classes=labels (here webgenres) Lec 2: Decision Trees - Nearest Neighbors4
  • 5. Format for the software you chose (here arff) Lec 2: Decision Trees - Nearest Neighbors5
  • 6. Models for Classification 6  Generative probabilistic models:  Model of P(x, y)  Naive Bayes  Conditional probabilistic models:  Model of P(y | x)  Logistic regression (next time)  Discriminative model:  No explicit probability model  Decision trees, nearest neighbor classification (today)  Perceptron, support vector machines, MIRA (next time) Lec 2: Decision Trees - Nearest Neighbors
  • 7. Repeating… Lec 2: Decision Trees - Nearest Neighbors7  Noise  Data cleaning is expensive and time consuming  Margin  Inductive bias Types of inductive biases •Minimum cross-validation error •Maximum margin •Minimum description length •[...]
  • 8. Decision Trees Lec 2: Decision Trees - Nearest Neighbors8
  • 9. Decision Trees 9  Hierarchical tree structure for classification  Each internal node specifies a test of some feature  Each branch corresponds to a value for the tested feature  Each leaf node provides a classification for the instance  Represents a disjunction of conjunctions of constraints  Each path from root to leaf specifies a conjunction of tests  The tree itself represents the disjunction of all paths Lec 2: Decision Trees - Nearest Neighbors
  • 10. Decision Tree Lec 2: Decision Trees - Nearest Neighbors10
  • 11. Divide and Conquer Lec 2: Decision Trees - Nearest Neighbors11  Internal decision nodes  Univariate: Uses a single attribute, xi  Numeric xi : Binary split : xi > wm  Discrete xi : n-way split for n possible values  Multivariate: Uses all attributes, x  Leaves  Classification: class labels (or proportions)  Regression: r average (or local fit)  Learning:  Greedy recursive algorithm  Find best split X = (X1, ..., Xp), then induce tree for each Xi
  • 12. Classification Trees (ID3, CART, C4.5) Lec 2: Decision Trees - Nearest Neighbors12 ˆP Ci | x,m pm i Nm i Nm  For node m, Nm instances reach m, Ni m belong to Ci  Node m is pure if pi m is 0 or 1  Measure of impurity is entropy Im pm i log2 pm i i 1 K
  • 13. Example: Entropy 13  Assume two classes (C1, C2) and four instances (x1, x2, x3, x4)  Case 1:  C1 = {x1, x2, x3, x4}, C2 = { }  Im = – (1 log 1 + 0 log 0) = 0  Case 2:  C1 = {x1, x2, x3}, C2 = {x4}  Im = – (0.75 log 0.75 + 0.25 log 0.25) = 0.81  Case 3:  C1 = {x1, x2}, C2 = {x3, x4}  Im = – (0.5 log 0.5 + 0.5 log 0.5) = 1 Lec 2: Decision Trees - Nearest Neighbors
  • 14. Best Split Lec 2: Decision Trees - Nearest Neighbors14 ˆP Ci |x,m, j pmj i Nmj i Nmj  If node m is pure, generate a leaf and stop, otherwise split with test t and continue recursively  Find the test that minimizes impurity  Impurity after split with test t:  Nmj of Nm take branch j  Ni mj belong to Ci Im t Nmj Nmj 1 n pmj i log2 pmj i i 1 K Im pm i log2 pm i i 1 K ˆP Ci | x,m pm i Nm i Nm
  • 15. Lec 2: Decision Trees - Nearest Neighbors15
  • 16. Information Gain Lec 2: Decision Trees - Nearest Neighbors16  We want to determine which attribute in a given set of training feature vectors is most usefulfor discriminating between the classes to be learned.  •Information gain tells us how important a given attribute of the feature vectors is.  •We will use it to decide the ordering of attributes in the nodes of a decision tree.
  • 17. Information Gain and Gain Ratio 17  Choosing the test that minimizes impurity maximizes the information gain (IG):  Information gain prefers features with many values  The normalized version is called gain ratio (GR): Im t Nmj Nmj 1 n pmj i log2 pmj i i 1 K IGm t Im Im t Vm t Nmj Nm log2 Nmj Nmj 1 n GRm t IGm t Vm t Lec 2: Decision Trees - Nearest Neighbors Im pm i log2 pm i i 1 K
  • 18. Pruning Trees Lec 2: Decision Trees - Nearest Neighbors18  Decision trees are susceptible to overfitting  Remove subtrees for better generalization:  Prepruning: Early stopping (e.g., with entropy threshold)  Postpruning: Grow whole tree, then prune subtrees  Prepruning is faster, postpruning is more accurate (requires a separate validation set)
  • 19. Rule Extraction from Trees Lec 2: Decision Trees - Nearest Neighbors 19 C4.5Rules (Quinlan, 1993)
  • 20. Learning Rules Lec 2: Decision Trees - Nearest Neighbors20  Rule induction is similar to tree induction but  tree induction is breadth-first  rule induction is depth-first (one rule at a time)  Rule learning:  A rule is a conjunction of terms (cf. tree path)  A rule covers an example if all terms of the rule evaluate to true for the example (cf. sequence of tests)  Sequential covering: Generate rules one at a time until all positive examples are covered  IREP (Fürnkrantz and Widmer, 1994), Ripper (Cohen, 1995)
  • 21. Properties of Decision Trees 21  Decision trees are appropriate for classification when:  Features can be both categorical and numeric  Disjunctive descriptions may be required  Training data may be noisy (missing values, incorrect labels)  Interpretation of learned model is important (rules)  Inductive bias of (most) decision tree learners: 1. Prefers trees with informative attributes close to the root 2. Prefers smaller trees over bigger ones (with pruning) 3. Preference bias (incomplete search of complete Lec 2: Decision Trees - Nearest Neighbors
  • 22. k-Nearest Neighbors Lec 2: Decision Trees - Nearest Neighbors22
  • 23. Nearest Neighbor Classification 23  An old idea  Key components:  Storage of old instances  Similarity-based reasoning to new instances This “rule of nearest neighbor” has considerable elementary intuitive appeal and probably corresponds to practice in many situations. For example, it is possible that much medical diagnosis is influenced by the doctor's recollection of the subsequent history of an earlier patient whose symptoms resemble in some way those of the current patient. (Fix and Hodges, 1952) Lec 2: Decision Trees - Nearest Neighbors
  • 24. k-Nearest Neighbour 24  Learning:  Store training instances in memory  Classification:  Given new test instance x,  Compare it to all stored instances  Compute a distance between x and each stored instance xt  Keep track of the k closest instances (nearest neighbors)  Assign to x the majority class of the k nearest neighbours  A geometric view of learning  Proximity in (feature) space  same class  The smoothness assumption Lec 2: Decision Trees - Nearest Neighbors
  • 25. Eager and Lazy Learning 25  Eager learning (e.g., decision trees)  Learning – induce an abstract model from data  Classification – apply model to new data  Lazy learning (a.k.a. memory-based learning)  Learning – store data in memory  Classification – compare new data to data in memory  Properties:  Retains all the information in the training set – no abstraction  Complex hypothesis space – suitable for natural language?  Main drawback – classification can be very inefficient Lec 2: Decision Trees - Nearest Neighbors
  • 26. Dimensions of a k-NN Classifier 26  Distance metric  How do we measure distance between instances?  Determines the layout of the instance space  The k parameter  How large neighborhood should we consider?  Determines the complexity of the hypothesis space Lec 2: Decision Trees - Nearest Neighbors
  • 27. Distance Metric 1 27  Overlap = count of mismatching features (x,z) (xi,zi ) i 1 m ii ii ii ii ii zxif zxif elsenumericif zx zx 1 0 , minmax ),( Lec 2: Decision Trees - Nearest Neighbors
  • 28. Distance Metric 2 28  MVDM = Modified Value Difference Metric (x,z) (xi,zi ) i 1 m (xi,zi) P(Cj | xi) P(Cj | zi) j 1 K Lec 2: Decision Trees - Nearest Neighbors
  • 29. The k parameter 29  Tunes the complexity of the hypothesis space  If k = 1, every instance has its own neighborhood  If k = N, all the feature space is one neighborhood k = 1 k = 15 Lec 2: Decision Trees - Nearest Neighbors ˆE E(h | V ) 1 h xt rt t 1 M
  • 30. A Simple Example 30 Training set: 1. (a, b, a, c)  A 2. (a, b, c, a)  B 3. (b, a, c, c)  C 4. (c, a, b, c)  A New instance: 5. (a, b, b, a) Distances (overlap): Δ(1, 5) = 2 Δ(2, 5) = 1 Δ(3, 5) = 4 Δ(4, 5) = 3 k-NN classification: 1-NN(5) = B 2-NN(5) = A/B 3-NN(5) = A 4-NN(5) = A Lec 2: Decision Trees - Nearest Neighbors (x,z) (xi,zi ) i 1 m ii ii ii ii ii zxif zxif elsenumericif zx zx 1 0 , minmax ),(
  • 31. Further Variations on k-NN 31  Feature weights:  The overlap metric gives all features equal weight  Features can be weighted by IG or GR  Weighted voting:  The normal decision rule gives all neighbors equal weight  Instances can be weighted by (inverse) distance Lec 2: Decision Trees - Nearest Neighbors
  • 32. Properties of k-NN 32  Nearest neighbor classification is appropriate when:  Features can be both categorical and numeric  Disjunctive descriptions may be required  Training data may be noisy (missing values, incorrect labels)  Fast classification is not crucial  Inductive bias of k-NN: 1. Nearby instances should have the same label (smoothness) 2. All features are equally important (without feature weights) 3. Complexity tuned by the k parameter Lec 2: Decision Trees - Nearest Neighbors
  • 33. Reading Lec 2: Decision Trees - Nearest Neighbors33  Alpaydin (2010): Ch. 3, 9  Daumé (2012): Ch. 1-2  Witten and Frank (2005): Ch. 2  Tom Mitchell (1997) Ch.3: Decision Tree Learning
  • 34. End of Lecture 2 Thanks for your attention Lec 2: Decision Trees - Nearest Neighbors34