SlideShare uma empresa Scribd logo
1 de 45
Lecture 1: Introduction
Last updated: 2 Sept 2013
Uppsala University
Department of Linguistics and Philology, September 2013
1 Lecture 1: Introduction
Machine Learning for Language Technology
(Schedule)
Practical Information
Reading list, assignments, exams, etc.
Lecture 1: Introduction2
Most slides from previous courses (i.e. from E. Alpaydin and J. Nivre) with adaptations
Course web pages:
Lecture 1: Introduction3
 http://stp.lingfil.uu.se/~santinim/ml/ml_fall2013.pdf
 http://stp.lingfil.uu.se/~santinim/ml/MachineLearning_fall20
13.htm
 Contact details:
 santinim@stp.lingfil.uu.se
 (marinasantini.ms@gmail.com)
About the Course
4
 Introduction to machine learning
 Focus on methods used in Language Technology
and NLP
 Decision trees and nearest neighbor methods (Lecture 2)
 Linear models –The Weka ML package (Lectures 3 and
4)
 Ensemble methods – Structured Predictions (Lectures 5
and 6)
 Text Mining and Big Data - R and Rapid Miner(Lecture 7)
 Unsupervised learning (clustering) (Lecture 8, Magnus
Rosell)
 Builds on Statistical Methods in NLPLecture 1: Introduction
Digression: Generative vs. Discriminative Methods
Lecture 1: Introduction5
 A generative method only applies to probabilistic
models. A model is generative if it gives us the
model of the joint distribution of x and y together (P
(Y , X ). It is called generative because you can
generate with the correct probability distribution data
points.
 Conditional methods model the conditional
distribution of the output given the input: P (Y | X ).
 Discriminative methods do not model probabilities at
all, but they map the input to the output directly.
Compulsory Reading List
6
 Main textbooks:
1. Ethem Alpaydin. 2010. Introduction to Machine Learning.
Second Edition. MIT Press (free online version)
2. Hal Daumé III. 2012. A Course in Machine Learning (free
online version)
3. Ian H. Witten, Eibe Frank Data Mining: Practical Machine
Learning Tools and Techniques, Second Edition (free online
version)
 Additional material :
1. Dietterich, T. G. (2000). Ensemble Methods in Machine
Learning. In J. Kittler and F. Roli (Ed.) First International
Workshop on Multiple Classifier Systems
2. Michael Collins. 2002. Discriminative Training Methods for
Hidden Markov Models: Theory and Experiments with
Perceptron Algorithms.In Proceedings of the 2002
Conference on Empirical Methods in Natural Language
Processing
3. Hanna M. Wallach. 2004. Conditional Random Fields: An
Introduction.Technical Report MS-CIS-04-21. Department of
Computer and Information Science, University of
Pennsylvania.
Lecture 1: Introduction
Optional Reading
Lecture 1: Introduction7
 Hal Daumé III, John Langford, Daniel Marcu (2005)
Search-Based Structured Prediction as
Classification. NIPS Workshop on Advances in
Structured Learning for Text and Speech Processing
(ASLTSP).
Assignments and Examination
8
 Three Assignments:
 Decision trees and nearest neighbors
 Perceptron learning
 Clustering
 General Info:
 No lab sessions, supervision by email
 Reports and Assignments 1 & 2 must be submitted to
santinim@stp.lingfil.uu.se
 Report and Assignment 3 must be submitted to rosell@kth.se
 Examination:
 Written report submitted for each assignment
 All three assignments necessary to pass the course
 Grade determined by majority grade on assignments
Lecture 1: Introduction
Practical Organization
9
 Marina Santini (1-7); Magnus Rosell (8)
 45min + 15 min break
 Lectures on Course webpage and SlideShare
 Email all your questions to me:
santinim@stp.lingfil.uu.se
 Video Recordings of the previous ML course:
http://stp.lingfil.uu.se/~nivre/master/ml.html
 Send me an email, so I make sure that I have all the
correct email addresses to santinim@stp.lingfil.uu.se
Lecture 1: Introduction
Schedule: http://stp.lingfil.uu.se/~santinim/ml/ml_fall2013.pdf
Lecture 1: Introduction10
What is Machine Learning?
Introduction to:
•Classification
•Regression
•Supervised Learning
•Unsupervised
Learning
•Reinforcement
Learning Lecture 1: Introduction11
What is Machine Learning
Lecture 1: Introduction12
 Machine learning is programming computers to
optimize a performance criterion for some task using
example data or past experience
 Why learning?
 No known exact method – vision, speech recognition,
robotics, spam filters, etc.
 Exact method too expensive – statistical physics
 Task evolves over time – network routing
 Compare:
 No need to use machine learning for computing payroll…
we just need an algorithm
Machine Learning – Data Mining – Artificial
Intelligence – Statistics
Lecture 1: Introduction13
 Machine Learning: creation of a model that uses training
data or past experience
 Data Mining: application of learning methods to large
datasets (ex. physics, astronomy, biology, etc.)
 Text mining = machine learning applied to unstructured textual
data (ex. sentiment analyisis, social media monitoring, etc. Text
Mining, Wikipedia)
 Artificial intelligence: a model that can adapt to a
changing environment.
 Statistics: Machine learning uses the theory of
statistics in building mathematical models, because
the core task is making inference from a sample.
The bio-cognitive analogy
Lecture 1: Introduction14
 Imagine that a learning algorithm as a single neuron.
 This neuron receives input from other neurons, one
for each input feature.
 The strength of these inputs are the feature values.
 Each input has a weight and the neuron simply sums
up all the weighted inputs.
 Based on this sum, the neuron decides whether to
“fire” or not. Firing is interpreted as being a positive
example and not firing is interpreted as being a
negative example.
Elements of Machine Learning
15
1. Generalization:
 Generalize from specific examples
 Based on statistical inference
2. Data:
 Training data: specific examples to learn from
 Test data: (new) specific examples to assess
performance
3. Models:
 Theoretical assumptions about the task/domain
 Parameters that can be inferred from data
4. Algorithms:
 Learning algorithm: infer model (parameters) from data
 Inference algorithm: infer predictions from model
Lecture 1: Introduction
Types of Machine Learning
Lecture 1: Introduction16
 Association
 Supervised Learning
 Classification
 Regression
 Unsupervised Learning
 Reinforcement Learning
Learning Associations
Lecture 1: Introduction17
 Basket analysis:
P (Y | X ) probability that somebody who buys X also
buys Y where X and Y are products/services
Example: P ( chips | beer ) = 0.7
Classification
Lecture 1: Introduction18
 Example: Credit
scoring
 Differentiating
between low-risk and
high-risk customers
from their income and
savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
Classification in NLP
19
 Binary classification:
 Spam filtering (spam vs. non-spam)
 Spelling error detection (error vs. non error)
 Multiclass classification:
 Text categorization (news, economy, culture, sport, ...)
 Named entity classification (person, location, organization,
...)
 Structured prediction:
 Part-of-speech tagging (classes = tag sequences)
 Syntactic parsing (classes = parse trees)
Lecture 1: Introduction
Regression
 Example:
Price of used car
 x : car attributes
y : price
y = g (x | q )
g ( ) model,
q parameters
Lecture 1: Introduction20
y = wx+w0
Uses of Supervised Learning
 Prediction of future cases:
 Use the rule to predict the output for future inputs
 Knowledge extraction:
 The rule is easy to understand
 Compression:
 The rule is simpler than the data it explains
 Outlier detection:
 Exceptions that are not covered by the rule, e.g., fraud
21 Lecture 1: Introduction
Unsupervised Learning
 Finding regularities in data
 No mapping to outputs
 Clustering:
 Grouping similar instances
 Example applications:
 Customer segmentation in CRM
 Image compression: Color quantization
 NLP: Unsupervised text categorization
22 Lecture 1: Introduction
Reinforcement Learning
 Learning a policy = sequence of outputs/actions
 No supervised output but delayed reward
 Example applications:
 Game playing
 Robot in a maze
 NLP: Dialogue systems, for example:
 NJFun: A Reinforcement Learning Spoken Dialogue System
(http://acl.ldc.upenn.edu/W/W00/W00-0304.pdf)
 Reinforcement Learning for Spoken Dialogue Systems:
Comparing Strengths and Weaknesses for Practical
Deployment
(http://research.microsoft.com/apps/pubs/default.aspx?id=7029
5)

23 Lecture 1: Introduction
Supervised Learning
Introduction to:
•Margin
•Noise
•Bias
Lecture 1: Introduction24
Supervised Classification
Lecture 1: Introduction25
 Learning the class C of a “family car” from
examples
 Prediction: Is car x a family car?
 Knowledge extraction: What do people expect
from a family car?
 Output (labels):
Positive (+) and negative (–) examples
 Input representation (features):
x1: price, x2 : engine power
Training set X

X  {xt
,rt
}t1
N

r 
1 if x is positive
0 if x is negative



Lecture 1: Introduction
26

x 
x1
x2






Hypothesis class H
Lecture 1: Introduction27

p1  price  p2  AND e1  engine power  e2 
Empirical (training) error
Lecture 1: Introduction28

h(x) 
1 if h says x is positive
0 if h says x is negative




E(h | X)  1 h xt
  rt
 t1
N

Empirical error of h on X:
S, G, and the Version Space
Lecture 1: Introduction29
most specific hypothesis, S
most general hypothesis, G
h  H, between S and G
is consistent [E( h | X) =
0] and make up the
version space
Margin
Lecture 1: Introduction30
 Choose h with largest margin
Noise
Lecture 1: Introduction31
Unwanted anomaly in data
 Imprecision in input attributes
 Errors in labeling data points
 Hidden attributes (relative to H)
Consequence:
 No h in H may be consistent!
Noise and Model Complexity
Lecture 1: Introduction32
Arguments for simpler model (Occam’s razor principle)
1. Easier to make predictions
2. Easier to train (fewer parameters)
3. Easier to understand
4. Generalizes better (if data is noisy)
Inductive Bias
Lecture 1: Introduction33
 Learning is an ill-posed problem
 Training data is never sufficient to find a unique solution
 There are always infinitely many consistent hypotheses
 We need an inductive bias:
 Assumptions that entail a unique h for a training set X
1. Hypothesis class H – axis-aligned rectangles
2. Learning algorithm – find consistent hypothesis with max-
margin
3. Hyperparameters – trade-off between training error and
margin
Model Selection and Generalization
Lecture 1: Introduction34
 Generalization – how well a model performs on new
data
 Overfitting: H more complex than C
 Underfitting: H less complex than C
Triple Trade-Off
Lecture 1: Introduction35
 Trade-off between three factors:
1. Complexity of H, c(H)
2. Training set size N
3. Generalization error E on new data
 Dependencies:
 As N E
 As c(H) first E and then E
Model Selection  Generalization Error
Lecture 1: Introduction36
 To estimate generalization error, we need data
unseen during training:
 Given models (hypotheses) h1, ..., hk induced from
the training set X, we can use E(hi | V ) to select the
model hi with the smallest generalization error

ˆE  E(h |V)  1 h xt
  rt
 t1
M


V  {xt
,rt
}t1
M
 X
Model Assessment
Lecture 1: Introduction37
 To estimate the generalization error of the best
model hi, we need data unseen during training and
model selection
 Standard setup:
1. Training set X (50–80%)
2. Validation (development) set V (10–25%)
3. Test (publication) set T (10–25%)
 Note:
 Validation data can be added to training set before testing
 Resampling methods can be used if data is limited
Cross-Validation
Lecture 1: Introduction38
121
31
2
2
2
32
1
1
1



K
K
K
K
K
K
XXXTXV
XXXTXV
XXXTXV




 K-fold cross-validation: Divide X into X1, ..., XK
 Note:
 Generalization error estimated by means across K folds
 Training sets for different folds share K–2 parts
 Separate test set must be maintained for model
assessment
Bootstrapping
Lecture 1: Introduction39
3680
1
1 1
.





 
e
N
N
 Generate new training sets of size N from X by
random sampling with replacement
 Use original training set as validation set (V = X )
 Probability that we do not pick an instance after N
draws
that is, only 36.8% of instances are new!
Measuring Error
Lecture 1: Introduction40
 Error rate = # of errors / # of instances = (FP+FN) / N
 Accuracy = # of correct / # of instances = (TP+TN) / N
 Recall = # of found positives / # of positives = TP / (TP+FN)
 Precision = # of found positives / # of found = TP / (TP+FP)
Statistical Inference
41
 Interval estimation to quantify the precision of our
measurements
 Hypothesis testing to assess whether differences
between models are statistically significant

m  1.96

N
e01  e10 1 
2
e01  e10
~ X1
2
Lecture 1: Introduction
Supervised Learning – Summary
42
 Training data + learner  hypothesis
 Learner incorporates inductive bias
 Test data + hypothesis  estimated generalization
 Test data must be unseen
 Next lectures:
 Different learners in LT with different inductive biases
Lecture 1: Introduction
Anatomy of a Supervised Learner
(Dimensions of a supervised machine learning
algorithm)
 Model:
 Loss function:
 Optimization
procedure:

g x |q 

E q | X  L rt
,g xt
|q  t

Lecture 1: Introduction43

q*  arg min
q
E q | X 
Reading
Lecture 1: Introduction44
 Alpaydin (2010): Ch. 1-2; 19 (mathematical
underpinnings)
 Witten and Frank (2005): Ch. 1 (examples and
domains of application)
End of Lecture 1
Thanks for your attention
Lecture 1: Introduction45

Mais conteúdo relacionado

Mais procurados

Neural network
Neural network Neural network
Neural network Faireen
 
Boosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning MethodBoosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning MethodKirkwood Donavin
 
Reinforcement Learning- AI Track
Reinforcement Learning- AI TrackReinforcement Learning- AI Track
Reinforcement Learning- AI TrackNetscribes
 
(Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning (Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning Omkar Rane
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep LearningJulien SIMON
 
Computational learning theory
Computational learning theoryComputational learning theory
Computational learning theoryswapnac12
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyMarina Santini
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural networkNagarajan
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine LearningVARUN KUMAR
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2Srinivasan R
 
Federated learning
Federated learningFederated learning
Federated learningMindos Cheng
 
The Deep Learning Glossary
The Deep Learning GlossaryThe Deep Learning Glossary
The Deep Learning GlossaryNVIDIA
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted treesNihar Ranjan
 
Introduction to Neural networks (under graduate course) Lecture 2 of 9
Introduction to Neural networks (under graduate course) Lecture 2 of 9Introduction to Neural networks (under graduate course) Lecture 2 of 9
Introduction to Neural networks (under graduate course) Lecture 2 of 9Randa Elanwar
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learningAnil Yadav
 

Mais procurados (20)

Neural network
Neural network Neural network
Neural network
 
Boosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning MethodBoosting - An Ensemble Machine Learning Method
Boosting - An Ensemble Machine Learning Method
 
Reinforcement Learning- AI Track
Reinforcement Learning- AI TrackReinforcement Learning- AI Track
Reinforcement Learning- AI Track
 
(Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning (Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
Computational learning theory
Computational learning theoryComputational learning theory
Computational learning theory
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Adaptive Resonance Theory (ART)
Adaptive Resonance Theory (ART)Adaptive Resonance Theory (ART)
Adaptive Resonance Theory (ART)
 
Small world
Small worldSmall world
Small world
 
ppt
pptppt
ppt
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
Introduction Of Artificial neural network
Introduction Of Artificial neural networkIntroduction Of Artificial neural network
Introduction Of Artificial neural network
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine Learning
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2
 
Federated learning
Federated learningFederated learning
Federated learning
 
The Deep Learning Glossary
The Deep Learning GlossaryThe Deep Learning Glossary
The Deep Learning Glossary
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
 
Introduction to Neural networks (under graduate course) Lecture 2 of 9
Introduction to Neural networks (under graduate course) Lecture 2 of 9Introduction to Neural networks (under graduate course) Lecture 2 of 9
Introduction to Neural networks (under graduate course) Lecture 2 of 9
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning
 

Semelhante a Lecture 01: Machine Learning for Language Technology - Introduction

课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
A Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine LearningA Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine Learningnep_test_account
 
Machine Learning in Finance
Machine Learning in FinanceMachine Learning in Finance
Machine Learning in FinanceHamed Vaheb
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Introduction to Machine Learning.
Introduction to Machine Learning.Introduction to Machine Learning.
Introduction to Machine Learning.butest
 
Artificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and OverviewArtificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and Overviewbutest
 
Artificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and OverviewArtificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and Overviewbutest
 
A Machine Learning Primer,
A Machine Learning Primer,A Machine Learning Primer,
A Machine Learning Primer,Eirini Ntoutsi
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfHassanElalfy4
 
27332020002_PC-CS601_Robotics_Debjit Doira.pdf
27332020002_PC-CS601_Robotics_Debjit Doira.pdf27332020002_PC-CS601_Robotics_Debjit Doira.pdf
27332020002_PC-CS601_Robotics_Debjit Doira.pdfAdharchandsaha
 
01_introduction.pdfbnmelllleitrthnjjjkkk
01_introduction.pdfbnmelllleitrthnjjjkkk01_introduction.pdfbnmelllleitrthnjjjkkk
01_introduction.pdfbnmelllleitrthnjjjkkkJesusTekonbo
 
Statistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowStatistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowJulián Urbano
 
Machine Learning for NLP
Machine Learning for NLPMachine Learning for NLP
Machine Learning for NLPbutest
 
Artificial intelligence for Social Good
Artificial intelligence for Social GoodArtificial intelligence for Social Good
Artificial intelligence for Social GoodOana Tifrea-Marciuska
 
Introduction.doc
Introduction.docIntroduction.doc
Introduction.docbutest
 
Chapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics courseChapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics coursegideymichael
 

Semelhante a Lecture 01: Machine Learning for Language Technology - Introduction (20)

Eric Smidth
Eric SmidthEric Smidth
Eric Smidth
 
课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
A Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine LearningA Few Useful Things to Know about Machine Learning
A Few Useful Things to Know about Machine Learning
 
Machine Learning in Finance
Machine Learning in FinanceMachine Learning in Finance
Machine Learning in Finance
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Introduction to Machine Learning.
Introduction to Machine Learning.Introduction to Machine Learning.
Introduction to Machine Learning.
 
Artificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and OverviewArtificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and Overview
 
Artificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and OverviewArtificial Intelligence AI Topics History and Overview
Artificial Intelligence AI Topics History and Overview
 
A Machine Learning Primer,
A Machine Learning Primer,A Machine Learning Primer,
A Machine Learning Primer,
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdf
 
27332020002_PC-CS601_Robotics_Debjit Doira.pdf
27332020002_PC-CS601_Robotics_Debjit Doira.pdf27332020002_PC-CS601_Robotics_Debjit Doira.pdf
27332020002_PC-CS601_Robotics_Debjit Doira.pdf
 
01_introduction.pdfbnmelllleitrthnjjjkkk
01_introduction.pdfbnmelllleitrthnjjjkkk01_introduction.pdfbnmelllleitrthnjjjkkk
01_introduction.pdfbnmelllleitrthnjjjkkk
 
Statistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowStatistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and How
 
Machine Learning for NLP
Machine Learning for NLPMachine Learning for NLP
Machine Learning for NLP
 
Artificial intelligence for Social Good
Artificial intelligence for Social GoodArtificial intelligence for Social Good
Artificial intelligence for Social Good
 
Introduction.doc
Introduction.docIntroduction.doc
Introduction.doc
 
Chapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics courseChapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics course
 

Mais de Marina Santini

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Marina Santini
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsMarina Santini
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-Marina Santini
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesMarina Santini
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word CloudsMarina Santini
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebMarina Santini
 
Lecture: Summarization
Lecture: SummarizationLecture: Summarization
Lecture: SummarizationMarina Santini
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question AnsweringMarina Santini
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)Marina Santini
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Marina Santini
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational SemanticsMarina Santini
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Marina Santini
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Marina Santini
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Marina Santini
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 

Mais de Marina Santini (20)

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology Applications
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability Features
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
 
Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
 
Lecture: Summarization
Lecture: SummarizationLecture: Summarization
Lecture: Summarization
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1)
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 

Último

9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 

Último (20)

9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 

Lecture 01: Machine Learning for Language Technology - Introduction

  • 1. Lecture 1: Introduction Last updated: 2 Sept 2013 Uppsala University Department of Linguistics and Philology, September 2013 1 Lecture 1: Introduction Machine Learning for Language Technology (Schedule)
  • 2. Practical Information Reading list, assignments, exams, etc. Lecture 1: Introduction2 Most slides from previous courses (i.e. from E. Alpaydin and J. Nivre) with adaptations
  • 3. Course web pages: Lecture 1: Introduction3  http://stp.lingfil.uu.se/~santinim/ml/ml_fall2013.pdf  http://stp.lingfil.uu.se/~santinim/ml/MachineLearning_fall20 13.htm  Contact details:  santinim@stp.lingfil.uu.se  (marinasantini.ms@gmail.com)
  • 4. About the Course 4  Introduction to machine learning  Focus on methods used in Language Technology and NLP  Decision trees and nearest neighbor methods (Lecture 2)  Linear models –The Weka ML package (Lectures 3 and 4)  Ensemble methods – Structured Predictions (Lectures 5 and 6)  Text Mining and Big Data - R and Rapid Miner(Lecture 7)  Unsupervised learning (clustering) (Lecture 8, Magnus Rosell)  Builds on Statistical Methods in NLPLecture 1: Introduction
  • 5. Digression: Generative vs. Discriminative Methods Lecture 1: Introduction5  A generative method only applies to probabilistic models. A model is generative if it gives us the model of the joint distribution of x and y together (P (Y , X ). It is called generative because you can generate with the correct probability distribution data points.  Conditional methods model the conditional distribution of the output given the input: P (Y | X ).  Discriminative methods do not model probabilities at all, but they map the input to the output directly.
  • 6. Compulsory Reading List 6  Main textbooks: 1. Ethem Alpaydin. 2010. Introduction to Machine Learning. Second Edition. MIT Press (free online version) 2. Hal Daumé III. 2012. A Course in Machine Learning (free online version) 3. Ian H. Witten, Eibe Frank Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (free online version)  Additional material : 1. Dietterich, T. G. (2000). Ensemble Methods in Machine Learning. In J. Kittler and F. Roli (Ed.) First International Workshop on Multiple Classifier Systems 2. Michael Collins. 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms.In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing 3. Hanna M. Wallach. 2004. Conditional Random Fields: An Introduction.Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania. Lecture 1: Introduction
  • 7. Optional Reading Lecture 1: Introduction7  Hal Daumé III, John Langford, Daniel Marcu (2005) Search-Based Structured Prediction as Classification. NIPS Workshop on Advances in Structured Learning for Text and Speech Processing (ASLTSP).
  • 8. Assignments and Examination 8  Three Assignments:  Decision trees and nearest neighbors  Perceptron learning  Clustering  General Info:  No lab sessions, supervision by email  Reports and Assignments 1 & 2 must be submitted to santinim@stp.lingfil.uu.se  Report and Assignment 3 must be submitted to rosell@kth.se  Examination:  Written report submitted for each assignment  All three assignments necessary to pass the course  Grade determined by majority grade on assignments Lecture 1: Introduction
  • 9. Practical Organization 9  Marina Santini (1-7); Magnus Rosell (8)  45min + 15 min break  Lectures on Course webpage and SlideShare  Email all your questions to me: santinim@stp.lingfil.uu.se  Video Recordings of the previous ML course: http://stp.lingfil.uu.se/~nivre/master/ml.html  Send me an email, so I make sure that I have all the correct email addresses to santinim@stp.lingfil.uu.se Lecture 1: Introduction
  • 11. What is Machine Learning? Introduction to: •Classification •Regression •Supervised Learning •Unsupervised Learning •Reinforcement Learning Lecture 1: Introduction11
  • 12. What is Machine Learning Lecture 1: Introduction12  Machine learning is programming computers to optimize a performance criterion for some task using example data or past experience  Why learning?  No known exact method – vision, speech recognition, robotics, spam filters, etc.  Exact method too expensive – statistical physics  Task evolves over time – network routing  Compare:  No need to use machine learning for computing payroll… we just need an algorithm
  • 13. Machine Learning – Data Mining – Artificial Intelligence – Statistics Lecture 1: Introduction13  Machine Learning: creation of a model that uses training data or past experience  Data Mining: application of learning methods to large datasets (ex. physics, astronomy, biology, etc.)  Text mining = machine learning applied to unstructured textual data (ex. sentiment analyisis, social media monitoring, etc. Text Mining, Wikipedia)  Artificial intelligence: a model that can adapt to a changing environment.  Statistics: Machine learning uses the theory of statistics in building mathematical models, because the core task is making inference from a sample.
  • 14. The bio-cognitive analogy Lecture 1: Introduction14  Imagine that a learning algorithm as a single neuron.  This neuron receives input from other neurons, one for each input feature.  The strength of these inputs are the feature values.  Each input has a weight and the neuron simply sums up all the weighted inputs.  Based on this sum, the neuron decides whether to “fire” or not. Firing is interpreted as being a positive example and not firing is interpreted as being a negative example.
  • 15. Elements of Machine Learning 15 1. Generalization:  Generalize from specific examples  Based on statistical inference 2. Data:  Training data: specific examples to learn from  Test data: (new) specific examples to assess performance 3. Models:  Theoretical assumptions about the task/domain  Parameters that can be inferred from data 4. Algorithms:  Learning algorithm: infer model (parameters) from data  Inference algorithm: infer predictions from model Lecture 1: Introduction
  • 16. Types of Machine Learning Lecture 1: Introduction16  Association  Supervised Learning  Classification  Regression  Unsupervised Learning  Reinforcement Learning
  • 17. Learning Associations Lecture 1: Introduction17  Basket analysis: P (Y | X ) probability that somebody who buys X also buys Y where X and Y are products/services Example: P ( chips | beer ) = 0.7
  • 18. Classification Lecture 1: Introduction18  Example: Credit scoring  Differentiating between low-risk and high-risk customers from their income and savings Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
  • 19. Classification in NLP 19  Binary classification:  Spam filtering (spam vs. non-spam)  Spelling error detection (error vs. non error)  Multiclass classification:  Text categorization (news, economy, culture, sport, ...)  Named entity classification (person, location, organization, ...)  Structured prediction:  Part-of-speech tagging (classes = tag sequences)  Syntactic parsing (classes = parse trees) Lecture 1: Introduction
  • 20. Regression  Example: Price of used car  x : car attributes y : price y = g (x | q ) g ( ) model, q parameters Lecture 1: Introduction20 y = wx+w0
  • 21. Uses of Supervised Learning  Prediction of future cases:  Use the rule to predict the output for future inputs  Knowledge extraction:  The rule is easy to understand  Compression:  The rule is simpler than the data it explains  Outlier detection:  Exceptions that are not covered by the rule, e.g., fraud 21 Lecture 1: Introduction
  • 22. Unsupervised Learning  Finding regularities in data  No mapping to outputs  Clustering:  Grouping similar instances  Example applications:  Customer segmentation in CRM  Image compression: Color quantization  NLP: Unsupervised text categorization 22 Lecture 1: Introduction
  • 23. Reinforcement Learning  Learning a policy = sequence of outputs/actions  No supervised output but delayed reward  Example applications:  Game playing  Robot in a maze  NLP: Dialogue systems, for example:  NJFun: A Reinforcement Learning Spoken Dialogue System (http://acl.ldc.upenn.edu/W/W00/W00-0304.pdf)  Reinforcement Learning for Spoken Dialogue Systems: Comparing Strengths and Weaknesses for Practical Deployment (http://research.microsoft.com/apps/pubs/default.aspx?id=7029 5)  23 Lecture 1: Introduction
  • 25. Supervised Classification Lecture 1: Introduction25  Learning the class C of a “family car” from examples  Prediction: Is car x a family car?  Knowledge extraction: What do people expect from a family car?  Output (labels): Positive (+) and negative (–) examples  Input representation (features): x1: price, x2 : engine power
  • 26. Training set X  X  {xt ,rt }t1 N  r  1 if x is positive 0 if x is negative    Lecture 1: Introduction 26  x  x1 x2      
  • 27. Hypothesis class H Lecture 1: Introduction27  p1  price  p2  AND e1  engine power  e2 
  • 28. Empirical (training) error Lecture 1: Introduction28  h(x)  1 if h says x is positive 0 if h says x is negative     E(h | X)  1 h xt   rt  t1 N  Empirical error of h on X:
  • 29. S, G, and the Version Space Lecture 1: Introduction29 most specific hypothesis, S most general hypothesis, G h  H, between S and G is consistent [E( h | X) = 0] and make up the version space
  • 30. Margin Lecture 1: Introduction30  Choose h with largest margin
  • 31. Noise Lecture 1: Introduction31 Unwanted anomaly in data  Imprecision in input attributes  Errors in labeling data points  Hidden attributes (relative to H) Consequence:  No h in H may be consistent!
  • 32. Noise and Model Complexity Lecture 1: Introduction32 Arguments for simpler model (Occam’s razor principle) 1. Easier to make predictions 2. Easier to train (fewer parameters) 3. Easier to understand 4. Generalizes better (if data is noisy)
  • 33. Inductive Bias Lecture 1: Introduction33  Learning is an ill-posed problem  Training data is never sufficient to find a unique solution  There are always infinitely many consistent hypotheses  We need an inductive bias:  Assumptions that entail a unique h for a training set X 1. Hypothesis class H – axis-aligned rectangles 2. Learning algorithm – find consistent hypothesis with max- margin 3. Hyperparameters – trade-off between training error and margin
  • 34. Model Selection and Generalization Lecture 1: Introduction34  Generalization – how well a model performs on new data  Overfitting: H more complex than C  Underfitting: H less complex than C
  • 35. Triple Trade-Off Lecture 1: Introduction35  Trade-off between three factors: 1. Complexity of H, c(H) 2. Training set size N 3. Generalization error E on new data  Dependencies:  As N E  As c(H) first E and then E
  • 36. Model Selection  Generalization Error Lecture 1: Introduction36  To estimate generalization error, we need data unseen during training:  Given models (hypotheses) h1, ..., hk induced from the training set X, we can use E(hi | V ) to select the model hi with the smallest generalization error  ˆE  E(h |V)  1 h xt   rt  t1 M   V  {xt ,rt }t1 M  X
  • 37. Model Assessment Lecture 1: Introduction37  To estimate the generalization error of the best model hi, we need data unseen during training and model selection  Standard setup: 1. Training set X (50–80%) 2. Validation (development) set V (10–25%) 3. Test (publication) set T (10–25%)  Note:  Validation data can be added to training set before testing  Resampling methods can be used if data is limited
  • 38. Cross-Validation Lecture 1: Introduction38 121 31 2 2 2 32 1 1 1    K K K K K K XXXTXV XXXTXV XXXTXV      K-fold cross-validation: Divide X into X1, ..., XK  Note:  Generalization error estimated by means across K folds  Training sets for different folds share K–2 parts  Separate test set must be maintained for model assessment
  • 39. Bootstrapping Lecture 1: Introduction39 3680 1 1 1 .        e N N  Generate new training sets of size N from X by random sampling with replacement  Use original training set as validation set (V = X )  Probability that we do not pick an instance after N draws that is, only 36.8% of instances are new!
  • 40. Measuring Error Lecture 1: Introduction40  Error rate = # of errors / # of instances = (FP+FN) / N  Accuracy = # of correct / # of instances = (TP+TN) / N  Recall = # of found positives / # of positives = TP / (TP+FN)  Precision = # of found positives / # of found = TP / (TP+FP)
  • 41. Statistical Inference 41  Interval estimation to quantify the precision of our measurements  Hypothesis testing to assess whether differences between models are statistically significant  m  1.96  N e01  e10 1  2 e01  e10 ~ X1 2 Lecture 1: Introduction
  • 42. Supervised Learning – Summary 42  Training data + learner  hypothesis  Learner incorporates inductive bias  Test data + hypothesis  estimated generalization  Test data must be unseen  Next lectures:  Different learners in LT with different inductive biases Lecture 1: Introduction
  • 43. Anatomy of a Supervised Learner (Dimensions of a supervised machine learning algorithm)  Model:  Loss function:  Optimization procedure:  g x |q   E q | X  L rt ,g xt |q  t  Lecture 1: Introduction43  q*  arg min q E q | X 
  • 44. Reading Lecture 1: Introduction44  Alpaydin (2010): Ch. 1-2; 19 (mathematical underpinnings)  Witten and Frank (2005): Ch. 1 (examples and domains of application)
  • 45. End of Lecture 1 Thanks for your attention Lecture 1: Introduction45

Notas do Editor

  1. To