Pattern Recognition
with
Semi-Supervised Learning
Algorithm
Presented By:-
Anurodh Kumar Sinha
2ND Year MSLIS Student
DRTC,ISI Bangalore
2012-2013
12/3/2012 1
Agenda
• What is Pattern Recognition?
• What is Machine Learning and why do we need it?
• Types of Learning Algorithm
• Need for Semi-Supervised Learning
• Conclusion
What is a Pattern…. ?
• An entity, vaguely defined, that could be
given a name,
• e.g.:
– handwritten word,
– human face,
– fingerprint image,
– speech signal,
What is Feature….?
• A Feature is an individual measurable heuristic property
of a phenomenon being observed
• Examples
• In speech recognition, features for recognizing
phonemes can include noise ratios, length of sounds,
relative power, filter matches and many others.
• In spam detection algorithms, features may include
whether certain email headers are present or absent,
whether they are well formed, what language the email
appears to be, the grammatical correctness of the text
What is Pattern Recognition.. ?
• Pattern recognition is the study of how
machines can:
– observe the environment,
– learn to distinguish patterns of interest,
– make sound and reasonable decisions about
the categories of the patterns.
“The assignment of a physical object or event to
one of several prespecified categories” -- Duda
& Hart
What is Pattern Recognition… ?
• Some Applications:
Motivation For The Study
of
Pattern Recognition
It is threefold.
• It is part of Artificial Intelligence, which is concerned with techniques that enable
computers to do things that seem intelligent when done by people.
• It is an important aspect of applying computers to the analysis and
classification of measurements taken from observed data.
• Pattern recognition techniques provide a unified framework for studying a
variety of techniques from mathematics and computer science that
help a machine make decisions.
Methodology
of
Pattern Recognition
It consists of the following:
1. We observe patterns.
2. We study the relationships between the various patterns.
3. We study the relationships between patterns and ourselves, and thus arrive at situations.
4. We study the changes in situations and come to know about the events.
5. We study events and thus find the rule behind the events.
6. Using the rule, we can predict future events.
An Example
• Suppose that:
– A fish packing plant
wants to automate the
process of sorting
incoming fish on a
conveyor belt according
to species,
– There are two species:
• Sea bass,
• Salmon.
An Example
How can we distinguish one species from the other?
(length, width, weight, number and shape of fins,
tail shape, etc.)
An Example
• Suppose we also know that:
– Sea bass are typically wider than salmon.
– But a decision may not be possible on the basis of
a single feature
• We can use more than one feature for our
decision:
– Lightness (x1) and width (x2)
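As a rough illustration of deciding on two features at once, the sketch below classifies a fish by the nearest class centroid in the (lightness, width) plane. The measurements, the `TRAINING` table and the `classify_fish` helper are all invented for illustration; the slides do not say which classifier the plant would use.

```python
# Hypothetical measurements: (lightness x1, width x2) pairs for each species.
# These numbers are invented for illustration only.
TRAINING = {
    "salmon": [(2.0, 10.0), (3.0, 11.0), (2.5, 9.5), (3.5, 10.5)],
    "sea bass": [(6.0, 14.0), (7.0, 15.0), (6.5, 13.5), (7.5, 14.5)],
}


def centroid(points):
    """Mean feature vector (x1, x2) of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


# One centroid per species, computed from the training measurements.
CENTROIDS = {species: centroid(pts) for species, pts in TRAINING.items()}


def classify_fish(lightness, width):
    """Assign the species whose centroid is closest in feature space."""
    x = (lightness, width)
    return min(CENTROIDS, key=lambda s: squared_distance(x, CENTROIDS[s]))
```

Calling `classify_fish(3.0, 10.0)` picks the salmon centroid and `classify_fish(7.0, 14.0)` the sea bass centroid. Using both features lets the classifier separate cases that either feature alone might confuse.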
Components of a typical Pattern Recognition System
Pattern Recognition Systems
Examples of applications
• Optical Character Recognition (OCR)
– Handwritten: sorting letters by postal code, input device for PDAs.
– Printed texts: reading machines for blind people, digitization of text documents.
• Biometrics
– Face recognition, verification, retrieval.
– Fingerprint recognition.
– Speech recognition.
• Diagnostic systems
– Medical diagnosis: X-ray, EKG analysis.
– Machine diagnostics, waster detection.
• Military applications
– Automated Target Recognition (ATR).
– Image segmentation and analysis (recognition from aerial or satellite photographs).
What is Machine Learning….?
• Machine Learning algorithms discover the relationships
between the variables of a system (input, output and
hidden) from direct samples of the system
• These algorithms originate from many fields:
– Statistics, mathematics, theoretical computer science,
physics, neuroscience, etc
Why are Learning algorithms needed?
• Learning would be unnecessary if the relationships between all system variables (input,
output, and hidden) were completely understood.
• This is NOT the case for almost any real system!
• Growing flood of online data
• Computational power is available
• Progress in algorithms and theory
Learning Algorithm Application
• Data mining: using historical data to improve decisions
– medical records ⇒ medical knowledge
– log data to model users
• Software applications we can't program by hand
– autonomous driving
– speech recognition
• Self-customizing programs
– Newsreader that learns user interests
Typical Example
Given:
• 9714 patient records, each describing a pregnancy and birth
• Each patient record contains 215 features
Learn to predict:
• Classes of future patients at high risk for Emergency Cesarean
Section
The Sub-Fields
of
Machine Learning
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
Supervised Learning
Supervised Learning
• Supervised learning is the machine learning task of inferring
a function from labeled training data.
• The training data consist of pairs, each made up of an input object
(typically a vector) and a desired output value (also called the
supervisory signal).
• A supervised learning algorithm analyzes the training data
and produces an inferred function, which is called a classifier
(if the output is discrete) or a regression function (if the output
is continuous).
• The inferred function should predict the correct output value
for any valid input object. This requires the learning algorithm
to generalize from the training data to unseen situations in a
"reasonable" way.
Supervised Learning Process: two
Steps
Learning (training): Learn a model using the training data
Testing: Test the model using unseen test data to assess the model accuracy
$$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}$$
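A minimal sketch of the testing step: accuracy is the number of correct classifications divided by the total number of test cases. The `accuracy` helper and the label lists are hypothetical and stand in for the output of any trained model on unseen test data.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of test cases whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one test case is required")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical test-set labels and model predictions (invented for illustration).
y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]
test_accuracy = accuracy(y_true, y_pred)  # 4 correct out of 5 test cases
```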
Example
• A credit card company receives thousands of
applications for new cards. Each application
contains information about an applicant,
– age
– Job
– House
– credit rating
– etc.
• Problem: to decide whether an application should
be approved, or to classify applications into two
categories, approved and not approved.
An example: Data (Loan
Application)
An example: The Learning Task
• Learn a classification model from the data
• Use the model to classify future loan applications
into
– Yes (approved) and
– No (not approved)
• What is the class for the following case/instance?
Bayesian Classifier
• The Simple Bayesian Classifier (SBC) uses probabilistic
methods for classification
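• The basis of the Bayesian classifier is: the probability of document
'd' being in class 'c' is computed as
$$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$
where P(t_k|c) is the conditional probability of term t_k occurring in a
document of class c, P(c) is the prior probability of class c, and n_d is
the number of terms in d.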
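To make the computation concrete, here is a small sketch of a multinomial naive Bayes text classifier that scores each class by log P(c) plus the sum of log P(tk|c) over the terms of the document. The add-one smoothing, the function names and the toy corpus are assumptions made for illustration, not details taken from the slides.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (list_of_terms, class_label) pairs."""
    class_counts = Counter(label for _, label in labeled_docs)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for terms, label in labeled_docs:
        term_counts[label].update(terms)
        vocabulary.update(terms)
    total_docs = len(labeled_docs)
    # Prior P(c): fraction of training documents in class c.
    log_prior = {c: math.log(n / total_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        denom = sum(term_counts[c].values()) + len(vocabulary)
        # Add-one smoothing (an assumption) keeps P(t|c) non-zero for
        # vocabulary terms that never occur in class c.
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / denom) for t in vocabulary
        }
    return log_prior, log_likelihood


def classify(terms, log_prior, log_likelihood):
    """Return the class maximising log P(c) + sum_k log P(t_k|c)."""
    scores = {}
    for c in log_prior:
        # Terms outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in terms if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy corpus, invented for illustration.
training_docs = [
    ("Chinese Beijing Chinese".split(), "china"),
    ("Chinese Chinese Shanghai".split(), "china"),
    ("Chinese Macao".split(), "china"),
    ("Tokyo Japan Chinese".split(), "not china"),
]
prior, likelihood = train_naive_bayes(training_docs)
predicted = classify("Chinese Chinese Chinese Tokyo Japan".split(), prior, likelihood)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.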
Simple Bayes Classifier
Unsupervised Learning
What is Unsupervised Learning?
• Unsupervised learning refers to the problem of trying to
find hidden structure in unlabeled data.
• It is sometimes also referred to as clustering.
• Organizing data into classes such that:
– inter-cluster distance is maximized
– intra-cluster distance is minimized
• Finding the class labels and the number of classes directly from the data
(in contrast to classification).
• More informally, finding natural groupings among objects.
What is a natural grouping among these objects?
Clustering is subjective
(Figure: the same objects can be grouped as Simpson's Family and School Employees, or as Females and Males.)
What is clustering for….?
Let us see some real-life examples
• Example 1: Group people of similar sizes together to
make “small”, “medium” and “large” T-shirts.
– Tailor-made for each person: too expensive
– One-size-fits-all: does not fit all.
• Example 2: Given a collection of text documents, we
want to organize them according to their content
similarities,
– To produce a topic hierarchy
What is clustering for? (cont…)
In fact, clustering is one of the most utilized
data mining techniques
– It has a long history and is used in almost every field,
e.g., medicine, psychology, botany, sociology, biology,
archeology, marketing, insurance, libraries, etc.
– In recent years, due to the rapid increase of online
documents, text clustering has become important.
K-means algorithm
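A minimal sketch of the standard k-means procedure (often called Lloyd's algorithm), assuming the slide refers to this usual form: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The function name, the choice to pass initial centroids explicitly, and the sample points are assumptions for illustration.

```python
def kmeans(points, initial_centroids, max_iterations=100):
    """Cluster `points` (tuples of floats) around k centroids.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid that points[i] belongs to.
    """
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(
                range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Hypothetical 2-D data with two visibly separate groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, labels = kmeans(data, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
```

The result depends on the initial centroids, which is why implementations commonly run the algorithm several times from different starting points.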
Semi-Supervised learning
Supervised Learning
versus
Unsupervised Learning
• Unsupervised clustering: group similar objects together
to find clusters
– Minimize intra-class distance
– Maximize inter-class distance
• Supervised classification: class label for each training
sample is given
– Build a model from the training data
– Predict class label on unseen future data points
However, for many problems, labeled
data can be rare or expensive:
you need to pay someone to do it, it requires special testing, …
Unlabeled data is much cheaper.
• Speech
• Images
• Medical outcomes
• Customer modeling
• Protein sequences
• Web pages
Why Semi-Supervised Learning…?
• Why not clustering?
– The clusters produced may not be the ones
required.
– Sometimes there are multiple possible
groupings.
• Why not classification?
– Sometimes there are insufficient labeled data.
Semi-Supervised Learning
• Combines labeled and unlabeled data
during training to improve performance:
– Semi-supervised classification: Training on labeled data exploits
additional unlabeled data, frequently resulting in a more accurate
classifier.
– Semi-supervised clustering: Uses small amount of labeled data to
aid and bias the clustering of unlabeled data.
(Figure: a spectrum running from unsupervised clustering through semi-supervised learning to supervised classification.)
Semi-Supervised Classification
• An initial classifier is designed using the labeled data set D(l).
This classifier is then used to assign class labels to the examples
in D(u). Then the classifier is re-trained using D(l) ∪ D(u).
• The last two steps are usually repeated a given number of
times or until some criterion is satisfied.
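The loop described above can be sketched as follows. A nearest-centroid classifier is used as the base learner purely as an assumption to keep the example short; the slides do not name one. The stopping criterion chosen here (pseudo-labels stop changing, or a round limit is reached) and the data are also illustrative.

```python
def fit_centroids(examples):
    """examples: list of ((x1, x2), label) pairs. Returns {label: centroid}."""
    sums, counts = {}, {}
    for (x1, x2), label in examples:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x1, sy + x2)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}


def predict(centroids, x):
    """Label of the centroid closest to point x."""
    return min(
        centroids,
        key=lambda c: (x[0] - centroids[c][0]) ** 2 + (x[1] - centroids[c][1]) ** 2,
    )


def self_train(labeled, unlabeled, max_rounds=10):
    """Train on D(l), label D(u), retrain on D(l) ∪ D(u), and repeat."""
    centroids = fit_centroids(labeled)  # initial classifier from D(l)
    pseudo_labels = None
    for _ in range(max_rounds):
        new_labels = [predict(centroids, x) for x in unlabeled]  # label D(u)
        if new_labels == pseudo_labels:
            break  # stopping criterion: the assigned labels no longer change
        pseudo_labels = new_labels
        # Retrain on the labeled data plus the pseudo-labeled data.
        centroids = fit_centroids(labeled + list(zip(unlabeled, pseudo_labels)))
    return centroids, pseudo_labels


# Hypothetical data: two labeled points and five unlabeled ones.
D_l = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
D_u = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (8.0, 9.0), (4.0, 4.0)]
model, assigned = self_train(D_l, D_u)
```

A known risk of this scheme is that early labeling mistakes are fed back into training, so practical variants often add only the most confident pseudo-labels in each round.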
Semi-Supervised Classification Example
(Figure: scatter plot of data points.)
Semi-Supervised Classification
• Algorithms:
– Semi-supervised EM [Ghahramani:NIPS94, Nigam:ML00].
– Co-training [Blum:COLT98].
– Transductive SVMs [Vapnik:98, Joachims:ICML99].
– Graph based algorithms
• Assumptions:
– Known, fixed set of categories given in the labeled
data.
– Goal is to improve classification of examples into
these known categories.
Semi-Supervised clustering
• Input:
– A set of unlabeled objects, each described by a set of attributes
(numeric and/or categorical)
– A small amount of domain knowledge
• Output:
– A partitioning of the objects into k clusters (possibly with some
discarded as outliers)
• Objective:
– Maximum intra-cluster similarity
– Minimum inter-cluster similarity
– High consistency between the partitioning and the domain
knowledge
How is Semi-Supervised Clustering done?
• In addition to the similarity information used by unsupervised
clustering, in many cases a small amount of knowledge is available
concerning either pairwise (must-link or cannot-link) constraints
between data items or class labels for some items.
• Instead of simply using this knowledge for the external validation of
the results of clustering, one can imagine letting it “guide” or “adjust”
the clustering process, i.e. provide a limited form of supervision. The
resulting approach is called semi-supervised clustering
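One simple way to let pairwise constraints guide clustering is to reject any assignment that would break a constraint, in the spirit of constrained k-means variants. The sketch below shows only that check; the function names, the string identifiers and the example constraints are hypothetical.

```python
def partner(point, pair):
    """Return the other member of `pair` if `point` is in it, else None."""
    a, b = pair
    if a == point:
        return b
    if b == point:
        return a
    return None


def violates_constraints(point, cluster, assignments, must_link, cannot_link):
    """Return True if putting `point` into `cluster` breaks a pairwise constraint.

    assignments: dict mapping already-assigned points to their cluster.
    must_link / cannot_link: lists of (a, b) pairs of point identifiers.
    """
    for pair in must_link:
        other = partner(point, pair)
        # A must-link partner already sits in a different cluster.
        if other in assignments and assignments[other] != cluster:
            return True
    for pair in cannot_link:
        other = partner(point, pair)
        # A cannot-link partner already sits in this very cluster.
        if other in assignments and assignments[other] == cluster:
            return True
    return False


# Hypothetical situation: d1 is already in "red", d2 in "blue";
# x must be linked with d1 and must not be linked with d2.
current = {"d1": "red", "d2": "blue"}
must = [("d1", "x")]
cannot = [("d2", "x")]
ok_red = not violates_constraints("x", "red", current, must, cannot)
ok_blue = not violates_constraints("x", "blue", current, must, cannot)
```

In a constrained clustering loop, each point would be assigned to the nearest cluster for which this check passes.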
Illustration: Must-link
(Figure: a point x whose label must be determined, under a must-link constraint, is assigned to the red class.)
Illustration: Cannot-link
(Figure: a point x whose label must be determined, under a cannot-link constraint, is assigned to the red class.)
Semi-Supervised Clustering
• Depending on the domain knowledge given:
– Users provide class labels (seeded points) a priori for
some of the documents
– Users know which few documents are related
(must-link) or unrelated (cannot-link)
(Figure panels: Seeded points, Must-link, Cannot-link)
Semi-supervised Clustering Algorithms
• Semi-supervised clustering with labels (partial label
information is given):
– SS-Seeded-Kmeans (Sugato Basu, et al. ICML 2002)
– SS-Constraint-Kmeans (Sugato Basu, et al. ICML 2002)
• Semi-supervised clustering with constraints (pairwise
constraints (must-link, cannot-link) are given):
– SS-COP-Kmeans (Wagstaff et al. ICML01)
– SS-HMRF-Kmeans (Sugato Basu, et al. ACM SIGKDD 2004)
– SS-Kernel-Kmeans (Brian Kulis, et al. ICML 2005)
– SS-Spectral-Normalized-Cuts (X. Ji, et al. ACM SIGIR 2006)
Co-Training Algorithm
Conclusions
• Semi-supervised learning is an area of increasing
importance in Machine Learning.
• Automatic methods of collecting data make it more
important than ever to develop methods to make use
of unlabeled data.
• There are several promising algorithms (only a few were discussed here).
New theoretical frameworks also help guide further
development.
Reference
• Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley &
Sons, New York, 1982. (2nd edition 2000).
• Fukunaga: Introduction to Statistical Pattern Recognition. Academic
Press, 1990.
• Sergios Theodoridis, Konstantinos Koutroumbas: Pattern Recognition.
Elsevier (USA), 1982.
• K. Nigam and R. Ghani. Analyzing the effectiveness and applicability
of co-training. In Proceedings of the Ninth International Conference on
Information and Knowledge Management, pages 86–93. ACM, 2000.
• http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
• http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
Any Questions, Suggestions, Feedback?
Thank You
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Seminar(Pattern Recognition)

  • 1. Pattern Recognition with Semi-Supervised Learning Algorithm. Presented by: Anurodh Kumar Sinha, 2nd Year MSLIS Student, DRTC, ISI Bangalore, 2012-2013. 12/3/2012
  • 2. Agenda • What is Pattern Recognition? • What is Machine Learning and why do we need it? • Types of Learning Algorithm • Need for Semi-Supervised Learning • Conclusion
  • 3. What is a Pattern? • An entity, vaguely defined, that could be given a name, e.g.: – handwritten word, – human face, – fingerprint image, – speech signal.
  • 4. What is a Feature? • A feature is an individual measurable heuristic property of a phenomenon being observed. • Examples: • In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches and many others. • In spam detection algorithms, features may include whether certain email headers are present or absent, whether they are well formed, what language the email appears to be in, and the grammatical correctness of the text.
  • 5. What is Pattern Recognition? • Pattern recognition is the study of how machines can: – observe the environment, – learn to distinguish patterns of interest, – make sound and reasonable decisions about the categories of the patterns. “The assignment of a physical object or event to one of several prespecified categories” -- Duda & Hart
  • 6. What is Pattern Recognition? • Some Applications:
  • 7. Motivation for the Study of Pattern Recognition. It is threefold. • It is part of Artificial Intelligence, which is concerned with techniques that enable computers to do things that seem intelligent when done by people. • It is an important aspect of applying computers to the analysis and classification of measurements taken from observed data. • Pattern Recognition techniques provide a unified framework for studying a variety of techniques from mathematics and computer science that help a machine make decisions.
  • 8. Methodology of Pattern Recognition. It consists of the following: 1. We observe patterns. 2. We study the relationships between the various patterns. 3. We study the relationships between patterns and ourselves and thus arrive at situations. 4. We study the changes in situations and come to know about the events. 5. We study events and thus find the rule behind the events. 6. Using the rule, we can predict future events.
  • 9. An Example • Suppose that: – A fish packing plant wants to automate the process of sorting incoming fish on a conveyor belt according to species, – There are two species: • Sea bass, • Salmon.
  • 11. An Example • How do we distinguish one species from the other? (length, width, weight, number and shape of fins, tail shape, etc.)
  • 12. An Example • Suppose we also know that: – Sea bass are typically wider than salmon. – But it may happen that the decision can't be made on a single feature. • We can use more than one feature for our decision: – Lightness (x1) and width (x2)
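A decision based on the two features lightness (x1) and width (x2) could be sketched as follows. This is only an illustration: the slides do not say which classifier is used, so a nearest-centroid rule is assumed here, and the measurements in `SEA_BASS` and `SALMON` are made-up values.

```python
from statistics import mean

# Hypothetical (lightness x1, width x2) measurements; values are invented for illustration.
SEA_BASS = [(7.0, 18.0), (8.0, 20.0), (7.5, 19.0)]
SALMON = [(3.0, 14.0), (4.0, 15.0), (3.5, 13.0)]


def centroid(points):
    """Average each feature over a list of (x1, x2) points."""
    return tuple(mean(p[i] for p in points) for i in range(2))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(fish, centroids):
    """Assign the fish to the species whose centroid is closest (assumed rule)."""
    return min(centroids, key=lambda label: sq_dist(fish, centroids[label]))


CENTROIDS = {"sea bass": centroid(SEA_BASS), "salmon": centroid(SALMON)}

if __name__ == "__main__":
    print(classify((7.2, 19.5), CENTROIDS))
    print(classify((3.2, 14.0), CENTROIDS))
```

Using both features together lets the rule separate fish that overlap on width alone, which is the point the slide makes.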
  • 13. Pattern Recognition Systems: Components of a typical Pattern Recognition System
  • 14. Examples of applications • Optical Character Recognition (OCR): – Handwritten: sorting letters by postal code, input device for PDAs. – Printed texts: reading machines for blind people, digitalization of text documents. • Biometrics: – Face recognition, verification, retrieval. – Fingerprint recognition. – Speech recognition. • Diagnostic systems: – Medical diagnosis: X-Ray, EKG analysis. – Machine diagnostics, waster detection. • Military applications: – Automated Target Recognition (ATR). – Image segmentation and analysis (recognition from aerial or satellite photographs).
  • 15. What is Machine Learning? • Machine Learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system. • These algorithms originate from many fields: – Statistics, mathematics, theoretical computer science, physics, neuroscience, etc.
  • 16. Why are Learning Algorithms needed? • Learning would be unnecessary if the relationships between all system variables (input, output, and hidden) were completely understood. • This is NOT the case for almost any real system! • Growing flood of online data • Computational power is available • Progress in algorithms and theory
  • 17. Learning Algorithm Applications • Data mining: using historical data to improve decisions – medical records ⇒ medical knowledge – log data to model users • Software applications we can't program by hand – autonomous driving – speech recognition • Self-customizing programs – newsreader that learns user interests
  • 18. Typical Example • Given: – 9714 patient records, each describing a pregnancy and birth – Each patient record contains 215 features • Learn to predict: – Classes of future patients at high risk for Emergency Cesarean Section
  • 19. The Sub-Fields of Machine Learning • Supervised Learning • Unsupervised Learning • Semi-Supervised Learning
  • 21. Supervised Learning • Supervised learning is the machine learning task of inferring a function from labeled training data. • The training data consist of pairs, each made up of an input object (typically a vector) and a desired output value (also called the supervisory signal). • A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier (if the output is discrete) or a regression function (if the output is continuous). • The inferred function should predict the correct output value for any valid input object. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.
  • 22. Supervised Learning Process: two steps • Learning (training): learn a model using the training data. • Testing: test the model using unseen test data to assess the model accuracy: $\text{Accuracy} = \dfrac{\text{Number of correct classifications}}{\text{Total number of test cases}}$
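The accuracy measure used in the testing step (correct classifications divided by the total number of test cases) can be computed with a few lines of code. The function name `accuracy` and the sample labels are illustrative choices, not taken from the slides.

```python
def accuracy(predicted, actual):
    """Fraction of test cases whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test case is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical predictions on four unseen test cases.
    print(accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))
```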
  • 23. Example • A credit card company receives thousands of applications for new cards. Each application contains information about an applicant: – age – job – house – credit rating – etc. • Problem: to decide whether an application should be approved, or to classify applications into two categories, approved and not approved.
  • 24. An example: Data (Loan Application)
  • 25. An example: The Learning Task • Learn a classification model from the data. • Use the model to classify future loan applications into – Yes (approved) and – No (not approved). • What is the class for the following case/instance?
  • 26. Bayesian Classifier • The Simple Bayesian Classifier (SBC) uses probabilistic methods for classification. • The basis of the Bayesian classifier is: the probability of document d being in class c is computed as $P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$, where $P(t_k \mid c)$ is the conditional probability of term $t_k$ occurring in a document of class c, $P(c)$ is the prior probability of class c, and $n_d$ is the number of terms in d.
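The Bayesian classifier above scores a document by combining a class prior with the conditional probabilities P(tk|c) of its terms. A minimal sketch follows, assuming a multinomial model with add-one smoothing and log probabilities (neither choice is stated on the slide). The function names and the toy spam/ham documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns (log priors, log P(t|c), vocabulary)."""
    class_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_cond = {}
    for c in class_counts:
        # Add-one (Laplace) smoothing is an assumption made for this sketch.
        total = sum(term_counts[c].values()) + len(vocab)
        log_cond[c] = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
    return log_prior, log_cond, vocab


def classify_nb(tokens, model):
    """Return the class with the highest log P(c) + sum of log P(t_k|c)."""
    log_prior, log_cond, vocab = model
    scores = {
        c: log_prior[c] + sum(log_cond[c][t] for t in tokens if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy training documents, invented for illustration.
TRAINING = [
    (["cheap", "pills", "offer"], "spam"),
    (["cheap", "offer", "now"], "spam"),
    (["meeting", "agenda", "notes"], "ham"),
    (["project", "meeting", "now"], "ham"),
]

if __name__ == "__main__":
    model = train_nb(TRAINING)
    print(classify_nb(["cheap", "offer"], model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; terms never seen in training are simply skipped in this sketch.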
  • 30. What is Unsupervised Learning? • Unsupervised learning refers to the problem of trying to find hidden structure in unlabeled data. • Sometimes it is also referred to as clustering. • Organizing data into classes such that the inter-cluster distance is maximized and the intra-cluster distance is minimized. • Finding the class labels and the number of classes directly from the data (in contrast to classification). • More informally, finding natural groupings among objects.
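One common way to group unlabeled points so that points within a cluster are close together is k-means, which the reference list also points to. The sketch below is a plain-Python illustration, not the presenter's code; the initial centroids are passed in explicitly (an assumption that keeps the run deterministic), and the sample points are made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical 2-D points forming two visible groups.
    pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    print(kmeans(pts, [(1.0, 1.0), (9.0, 8.5)]))
```

The result depends on the starting centroids, which is one reason the clusters produced may not be the ones a user actually wants.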
  • 31. What is a natural grouping among these objects?
  • 32. What is a natural grouping among these objects? Clustering is subjective. [Figure groupings: Simpson's Family, School Employees, Females, Males]
  • 33. What is clustering for? Let us see some real-life examples. • Example 1: Group people of similar sizes together to make “small”, “medium” and “large” T-shirts. – Tailor-made for each person: too expensive. – One-size-fits-all: does not fit all. • Example 2: Given a collection of text documents, we want to organize them according to their content similarities, – to produce a topic hierarchy.
  • 34. What is clustering for? (cont…) In fact, clustering is one of the most utilized data mining techniques. – It has a long history and is used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc. – In recent years, due to the rapid increase of online documents, text clustering has become important.
  • 37. An example (cont…)
  • 39. Supervised Learning versus Unsupervised Learning • Unsupervised clustering: group similar objects together to find clusters. – Minimize intra-class distance – Maximize inter-class distance • Supervised classification: the class label for each training sample is given. – Build a model from the training data – Predict the class label on unseen future data points
  • 40. However, for many problems, labeled data can be rare or expensive: someone needs to be paid to produce the labels, special testing is required, … Unlabeled data is much cheaper. Examples: speech, images, medical outcomes, customer modeling, protein sequences, web pages.
  • 41. Why Semi-Supervised Learning? • Why not clustering? – The clusters produced may not be the ones required. – Sometimes there are multiple possible groupings. • Why not classification? – Sometimes there is insufficient labeled data.
  • 42. Semi-Supervised Learning • Combines labeled and unlabeled data during training to improve performance: – Semi-supervised classification: training on labeled data exploits additional unlabeled data, frequently resulting in a more accurate classifier. – Semi-supervised clustering: uses a small amount of labeled data to aid and bias the clustering of unlabeled data. [Figure labels: unsupervised clustering, semi-supervised learning, supervised classification]
  • 43. Semi-Supervised Classification • An initial classifier is designed using the labeled data set D(l). This classifier is then used to assign class labels to the examples in D(u). Then the classifier is re-trained using D(l) ∪ D(u). • The last two steps are usually repeated a given number of times or until some criterion is satisfied.
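The loop described above (train on D(l), label D(u), retrain on the union, repeat) can be sketched as follows. The base classifier is not specified on the slide, so a nearest-centroid classifier is assumed here; the stopping criterion (labels no longer change, or `max_rounds` reached) and all names and data values are likewise illustrative.

```python
def fit_centroids(labeled):
    """Train the assumed base classifier: one mean vector per class."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: tuple(sum(c) / len(xs) for c in zip(*xs)) for y, xs in groups.items()}


def predict(model, x):
    """Label x with the class whose centroid is nearest."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def self_train(labeled, unlabeled, max_rounds=10):
    """Iteratively label D(u) and retrain on D(l) together with the labeled D(u)."""
    model = fit_centroids(labeled)  # initial classifier from D(l) only
    pseudo = None
    for _ in range(max_rounds):
        new_pseudo = [predict(model, x) for x in unlabeled]
        if new_pseudo == pseudo:  # assumed stopping criterion: labels are stable
            break
        pseudo = new_pseudo
        model = fit_centroids(labeled + list(zip(unlabeled, pseudo)))
    return model, pseudo


if __name__ == "__main__":
    # Hypothetical data: two labeled points and five unlabeled ones.
    d_l = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
    d_u = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (8.0, 9.0), (4.0, 4.0)]
    print(self_train(d_l, d_u))
```

Practical variants often add only the most confidently labeled examples in each round; this sketch uses all of D(u), as the slide describes.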
  • 46. Semi-Supervised Classification • Algorithms: – Semi-supervised EM [Ghahramani:NIPS94, Nigam:ML00]. – Co-training [Blum:COLT98]. – Transductive SVMs [Vapnik:98, Joachims:ICML99]. – Graph-based algorithms. • Assumptions: – Known, fixed set of categories given in the labeled data. – Goal is to improve classification of examples into these known categories.
  • 47. Semi-Supervised Clustering • Input: – A set of unlabeled objects, each described by a set of attributes (numeric and/or categorical) – A small amount of domain knowledge • Output: – A partitioning of the objects into k clusters (possibly with some discarded as outliers) • Objective: – Maximum intra-cluster similarity – Minimum inter-cluster similarity – High consistency between the partitioning and the domain knowledge
  • 48. How is Semi-Supervised Clustering done? • In addition to the similarity information used by unsupervised clustering, in many cases a small amount of knowledge is available concerning either pairwise (must-link or cannot-link) constraints between data items or class labels for some items. • Instead of simply using this knowledge for the external validation of the results of clustering, one can imagine letting it “guide” or “adjust” the clustering process, i.e. provide a limited form of supervision. The resulting approach is called semi-supervised clustering.
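One way pairwise constraints can guide clustering is to reject any assignment that would break a must-link or cannot-link pair, in the spirit of constrained k-means variants. The check below is a sketch under that assumption; the function names, the item identifiers, and the dictionary-based `assignment` structure are hypothetical.

```python
def partner(pair, item):
    """Return the other member of a constraint pair, or None if item is not in it."""
    a, b = pair
    if a == item:
        return b
    if b == item:
        return a
    return None


def violates_constraints(item, cluster, assignment, must_link, cannot_link):
    """True if placing `item` in `cluster` breaks a constraint with an already assigned item."""
    for pair in must_link:
        other = partner(pair, item)
        # A must-link partner that already sits in a different cluster is a violation.
        if other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = partner(pair, item)
        # A cannot-link partner that already sits in the same cluster is a violation.
        if other in assignment and assignment[other] == cluster:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical state: two documents already assigned to clusters 0 and 1.
    current = {"d1": 0, "d2": 1}
    print(violates_constraints("d3", 1, current, [("d1", "d3")], [("d2", "d4")]))
```

A clustering loop would call this check for each candidate cluster, nearest first, and assign the item to the first cluster that passes.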
  • 51. Semi-Supervised Clustering • According to the domain knowledge given: – Users provide class labels (seeded points) a priori for some of the documents. – Users know which few documents are related (must-link) or unrelated (cannot-link). [Figure labels: seeded points, must-link, cannot-link]
  • 52. Semi-Supervised Clustering Algorithms • Semi-supervised clustering with labels (partial label information is given): – SS-Seeded-Kmeans (Sugato Basu, et al. ICML 2002) – SS-Constraint-Kmeans (Sugato Basu, et al. ICML 2002) • Semi-supervised clustering with constraints (pairwise must-link and cannot-link constraints are given): – SS-COP-Kmeans (Wagstaff et al. ICML01) – SS-HMRF-Kmeans (Sugato Basu, et al. ACM SIGKDD 2004) – SS-Kernel-Kmeans (Brian Kulis, et al. ICML 2005) – SS-Spectral-Normalized-Cuts (X. Ji, et al. ACM SIGIR 2006)
  • 54. Conclusions • Semi-supervised learning is an area of increasing importance in Machine Learning. • Automatic methods of collecting data make it more important than ever to develop methods that make use of unlabeled data. • Several promising algorithms exist (only a few were discussed here), along with new theoretical frameworks to help guide further development.
  • 55. References • Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982. (2nd edition 2000). • Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990. • Sergios Theodoridis, Konstantinos Koutroumbas: Pattern Recognition. Elsevier (USA), 1982. • K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of the Ninth International Conference on Information and Knowledge Management, pages 86–93. ACM, 2000. • http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html • http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html