Use of classifiers in research problems
Classifiers are algorithms that map input data to one of a set of output categories.
They can be used to build models with high precision and accuracy, so that the
resulting model can predict the category of previously unseen data points. Classifiers
have found wide use in data science applications in various domains. For instance,
classifying a new tumour as malignant or benign, identifying a mail as spam or ham, and
marking an insurance claim as possibly fraudulent or genuine are all instances of
classification. Classification algorithms use training data, i.e., they learn from example data
and build a model or procedure to identify a new data point as belonging to a particular
category. They therefore belong to the class of supervised learning methods.
There are a number of classifiers that can be used to classify data on the basis of historic and
already existing data. A very short description of these methods is given here just to introduce
the concepts.
Logistic Regression
As a simple case, consider a logistic model with two predictors x1 and x2, and one binary
response variable Y, and denote 𝑝 = 𝑃(𝑌 = 1). We assume a linear relationship
between the predictor variables and the log-odds of the event Y=1. This relationship can be
expressed as,
\[ \log \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]
By simple algebraic manipulation, the probability that Y=1 is,
\[ p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} + 1} \]
The above formula shows that once the β's are estimated, we can compute the probability that
Y=1 for a given observation, or the probability of its complement Y=0 as 1 − p.
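The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name and the coefficient values are hypothetical, chosen so that the log-odds come out to zero, and are not estimates from any data.

```python
import math

def logistic_probability(x1, x2, b0, b1, b2):
    """Return p = P(Y=1) for the two-predictor logistic model."""
    log_odds = b0 + b1 * x1 + b2 * x2          # linear predictor (the log-odds)
    return 1.0 / (1.0 + math.exp(-log_odds))   # algebraically equal to e^z / (e^z + 1)

# Hypothetical coefficients, for illustration only.
p = logistic_probability(x1=2.0, x2=1.0, b0=-1.0, b1=0.5, b2=0.0)
print(p, 1.0 - p)   # P(Y=1) and its complement P(Y=0)
```

In practice the β's would be estimated from training data by maximum likelihood rather than set by hand.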
Decision Trees
In this technique, we split the population or sample into two or more homogeneous sets (or
sub-populations) based on the most significant splitter/differentiator among the input variables. The end
result of the algorithm is a tree-like structure with root, branch and leaf nodes (target
variable). Decision trees can use several criteria to decide how to split a node into two or more
sub-nodes. The creation of sub-nodes increases the homogeneity of the resultant sub-nodes. Although
several criteria like Gini index, chi-square and reduction in variance are available for choosing
the split, one popular measure used for splitting is the information gain. This is equivalent to
selecting the split with the maximum reduction in entropy as measured by Shannon’s
index (H).
\[ H = -\sum_{i=1}^{s} p_i \log p_i \]
where s is the number of groups at a node and p_i indicates the proportion of individuals in the
ith group.
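As a small worked example, the entropy and the information gain of a candidate split can be computed as below. The function names and the toy labels are assumptions made for illustration, and the logarithm is taken to base 2, which the text does not specify.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon's index H = -sum(p_i * log2(p_i)) over the groups present in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its sub-nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy node with two groups in equal proportion, split into two pure sub-nodes.
parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]
gain = information_gain(parent, split)
print(entropy(parent), gain)
```

A split that leaves the sub-nodes as mixed as the parent would give a gain of zero, so the tree prefers the split with the larger gain.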
Random Forests
Ensemble learning is a type of supervised learning technique in which the basic idea is to
generate multiple models on a training dataset and then simply combine (average) their output
rules or their hypotheses to generate a stronger model which performs very well. Random forest
is a classic case of ensemble learning. Decision trees are considered very simple and easily
interpretable, but a major drawback is that a single tree often has poor predictive performance
and generalizes poorly to the test set, which is why trees are sometimes called weak learners. In
the context of decision trees, random forest is a model based on multiple trees. Rather than
simply averaging the predictions of individual trees (which we could call a ‘forest’), this model
uses two key concepts that give it the name ‘random’, viz., (i) random sampling of training
data points when building trees and (ii) random subsets of features considered when splitting nodes.
The idea here is that instead of producing a single complex model, which might
have a high variance that leads to overfitting, or a single very simple model, which might have a
high bias that leads to underfitting, we generate many models using the training set and combine
them at the end.
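The two sources of randomness can be sketched as follows. This is a toy illustration, not a full random forest: each "tree" is a one-split stump standing in for a full decision tree, the predictions are combined by majority vote, and all function names, parameters and data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:                      # no valid split: always predict the majority class
        m = majority(labels)
        return (None, None, m, m)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:
        return l_lab
    return l_lab if row[f] <= t else r_lab

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]         # (i) random sample of data points
        feats = rng.sample(range(len(rows[0])), n_features)    # (ii) random subset of features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    return majority([predict_stump(s, row) for s in forest])   # combine the trees by voting

rows = [(1, 1), (2, 2), (3, 1), (8, 9), (9, 8), (10, 9)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (11, 10)))
```

Each stump sees a different resample and a different feature, so individual stumps disagree near the class boundary, while the vote across many of them is more stable.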
Support Vector Machines
Given a set of training examples, each marked as belonging to one or the other of two categories,
a Support Vector Machine (SVM) training algorithm builds a model that assigns new examples
to one category or the other. In theory, SVM is a discriminative classifier formally defined by a
separating hyperplane. In other words, given labelled training data, the algorithm outputs an
optimal hyperplane which categorizes new examples. Thus, the hyperplanes are decision
boundaries that help classify the data points. Data points falling on either side of the hyperplane
can be attributed to different classes. Also, the dimension of the hyperplane depends upon the
number of features. If the number of input features is 2, then the hyperplane is just a line. If the
number of input features is 3, then the hyperplane becomes a two-dimensional plane. In practice,
there are many hyperplanes that might classify the data. One reasonable choice as the best
hyperplane is the one that represents the largest separation, or margin, between the two classes.
So, we choose the hyperplane such that the distance from it to the nearest data point on each
side is maximized.
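The notion of margin can be illustrated by comparing two candidate hyperplanes on a small two-feature data set. This sketch only evaluates given hyperplanes; it does not perform SVM training, which solves an optimization problem to find the best one. The data points, the candidate lines and the function names are all assumptions made for illustration.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points, labels):
    """Smallest distance to the hyperplane, or None if any point lies on the wrong side."""
    for x, y in zip(points, labels):
        side = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * side <= 0:                 # labels are coded as +1 / -1
            return None
    return min(distance_to_hyperplane(w, b, x) for x in points)

points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]
candidates = {
    "A": ((1.0, 0.0), -3.0),    # the vertical line x1 = 3
    "B": ((1.0, 1.0), -5.5),    # the line x1 + x2 = 5.5
}
margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines separate the two classes, but the one with the larger margin would be preferred under the criterion described above.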
Naïve Bayes Classifier
The naive Bayes algorithm is a probability-based technique which is simple yet so powerful
that it is often found to outperform more complex algorithms on very large datasets. The foundation
of the naive Bayes algorithm is Bayes' theorem, which states that in a sequence of events,
if A is the first event and B is the second event, then P(B/A) is obtained by the expression,
P(B/A) = P(B) * P(A/B) / P(A)
The naive Bayes algorithm is called naive not because it is simple, but
because it makes a very strong assumption that the features in the data are
independent of each other within each class. In other words, it assumes that the presence of one
feature in a class is completely unrelated to the presence of any other feature. If this assumption
of independence holds, naive Bayes performs extremely well and often better than other models.
Mathematically,
\[ P(X_1, \ldots, X_n / Y) = \prod_{i=1}^{n} P(X_i / Y) \]
In order to create a classifier model, we find the probability of a given set of inputs for all
possible values of the class variable Y and pick the output with maximum probability. This
can be expressed as
\[ \hat{Y} = \arg\max_{Y} P(Y) \prod_{i=1}^{n} P(X_i / Y) \]
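A minimal sketch of this rule for categorical features is given below. The probabilities are estimated as plain relative frequencies from a made-up training set, no smoothing is applied to zero counts, and the function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count classes and per-class feature values in the training data."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)      # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, y)][value] += 1
    return class_counts, feature_counts

def predict_naive_bayes(model, row):
    """Return the class maximizing P(Y) * prod_i P(X_i / Y), and all class scores."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for y, count in class_counts.items():
        score = count / total                                   # P(Y)
        for i, value in enumerate(row):
            score *= feature_counts[(i, y)][value] / count      # times P(X_i / Y)
        scores[y] = score
    return max(scores, key=scores.get), scores

# Made-up mails described by (first word, contains a link?).
rows = [("offer", "link"), ("offer", "link"), ("offer", "nolink"),
        ("hello", "nolink"), ("hello", "nolink"), ("offer", "nolink")]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = fit_naive_bayes(rows, labels)
label, scores = predict_naive_bayes(model, ("offer", "link"))
print(label, scores)
```

Because a single unseen feature value drives the whole product to zero, practical implementations usually add a small count to every cell before estimating the probabilities.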
Neural Networks
A neural network is a series of algorithms that endeavours to recognize underlying relationships
in a set of data through a process that mimics the way the human brain operates. The basic
computational unit of the brain is a neuron. In comparison, a ‘neuron’ in a neural network, also
called a perceptron, is a mathematical function that collects and classifies information according
to a specific architecture. The perceptron receives input from some other nodes, or from an
external source and computes an output. Each input has an associated weight (w) which is
assigned on the basis of its relative importance to other inputs. The node applies a nonlinear
function to the weighted sum of its inputs to create the output. The idea is that the synaptic
strengths (the weights w) are revised by learning from the training data, which in turn
controls the strength and direction of each input's influence.
The learning happens in two steps: forward propagation and back propagation. In simple words,
forward propagation is making a guess about the answer and back propagation is minimising
the error between the actual answer and the guessed answer. The process of updating the
weights is continued through multiple iterations to arrive at a decision.
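The two steps can be sketched for a single neuron as follows. This is an illustrative assumption rather than a full network: the nonlinear function is taken to be the sigmoid, the error is the squared difference between the guess and the actual answer, and the learning rate, the number of iterations and the toy data (the logical OR of two inputs) are chosen arbitrarily.

```python
import math

def forward(weights, bias, inputs):
    """Forward propagation: sigmoid of the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_step(weights, bias, inputs, target, lr=0.5):
    """Back propagation for one neuron: move the weights against the error gradient."""
    out = forward(weights, bias, inputs)
    delta = (out - target) * out * (1.0 - out)      # gradient of 0.5 * (out - target)^2 w.r.t. z
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    return new_weights, bias - lr * delta

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # logical OR
weights, bias = [0.0, 0.0], 0.0
for _ in range(5000):                    # repeat the guess-and-correct cycle many times
    for inputs, target in data:
        weights, bias = train_step(weights, bias, inputs, target)

for inputs, target in data:
    print(inputs, target, round(forward(weights, bias, inputs), 3))
```

Networks with several layers apply the same idea, passing the error gradient backwards from the output layer through each earlier layer.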
K Nearest Neighbour Technique
K-nearest neighbours (KNN) is a simple algorithm that stores all available cases and classifies
new cases based on a similarity measure (e.g., distance functions). A case is classified by a
majority vote of its neighbours, i.e., the case is assigned to the most common class
amongst its K nearest neighbours as measured by a distance function. Below is a step-by-step
procedure to compute K-nearest neighbours.
1. Determine the parameter K, the number of neighbours to be used.
2. Calculate the distance between the query instance (the item to be identified as belonging
to a preidentified category) and all the training samples.
3. Sort the distances and determine the nearest neighbours based on the Kth minimum distance.
4. Gather the category 𝛾 of each of the nearest neighbours.
5. Use the simple majority of the categories of the nearest neighbours as the predicted value for
the query instance.
The most intuitive nearest neighbour type classifier is the 1-nearest neighbour classifier that
assigns a point x to the class of its closest neighbour in the feature space.
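The five steps listed above can be sketched in Python as below. Euclidean distance is assumed as the distance function, and the function name and the toy training points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, query, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Step 2: distance from the query instance to every training sample.
    distances = [(math.dist(point, query), label) for point, label in training]
    # Step 3: sort the distances and keep the k smallest.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Steps 4 and 5: gather the neighbours' categories and take the simple majority.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_predict(training, (2, 2), k=3))   # step 1: K = 3
print(knn_predict(training, (6, 5), k=1))   # the 1-nearest neighbour classifier
```

An odd K is often chosen for two-class problems so that the vote cannot end in a tie.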
Finally, the choice of a particular classifier for a given situation depends on the relative
performance of the candidates in respect of accuracy, sensitivity and specificity. There are deeper
issues involved in the use of all these techniques, and considerable developments have taken place in
both theory and programming related to the topic.
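For a two-class problem, the three performance measures mentioned above can be computed from the actual and predicted labels as in the following sketch. The function name and the example labels are made up for illustration.

```python
def classification_metrics(actual, predicted, positive=1):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)   # true positives
    tn = sum(a != positive and p != positive for a, p in pairs)   # true negatives
    fp = sum(a != positive and p == positive for a, p in pairs)   # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)   # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),     # share of actual positives that were detected
        "specificity": tn / (tn + fp),     # share of actual negatives that were detected
    }

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

Comparing these figures for several classifiers on the same held-out data gives a basis for the choice described above.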
--- Jayaraman

Edukaciniai dropshipping via API with DroFx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

Classifiers

Use of classifiers in research problems

Classifiers are algorithms that map input data to one of a set of output categories. They can be used to build models accurate enough to predict or classify previously unseen data points. Classifiers have found wide use in data science applications in various domains. For instance, classifying a new tumour as malignant or benign, identifying a mail as spam or ham, and marking an insurance claim as possibly fraudulent or genuine are all instances of classification. Classification algorithms use training data, i.e., they learn from example data and build a model or procedure to identify a new data point as belonging to a particular category. They therefore belong to the class of supervised learning methods.

There are a number of classifiers that can be used to classify data on the basis of historic, already existing data. A very short description of these methods is given here just to introduce the concepts.

Logistic Regression

As a simple case, consider a logistic model with two predictors $x_1$ and $x_2$ and one binary response variable $Y$, and write $p = P(Y = 1)$. We assume a linear relationship between the predictor variables and the log-odds of the event. This relationship can be expressed as

$$\log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

By simple algebraic manipulation, the probability that $Y = 1$ is

$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} + 1}$$

The above formula shows that once the $\beta$'s are estimated, we can compute the probability that $Y = 1$ for a given observation, or the probability of its complement, $P(Y = 0) = 1 - p$.

Decision Trees

In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter/differentiator among the input variables. The end result of the algorithm is a tree-like structure with root, branch and leaf nodes (the leaves giving the value of the target variable). Several algorithms are available for deciding how to split a node into two or more sub-nodes.
The creation of sub-nodes increases the homogeneity of the resultant sub-nodes. Although several criteria such as the Gini index, chi-square and reduction in variance are available for choosing a split, one popular measure used for splitting is the information gain. This is equivalent to selecting the split with the maximum reduction in entropy, as measured by Shannon's index (H),

$$H = -\sum_{i=1}^{s} p_i \log p_i$$

where $s$ is the number of groups at a node and $p_i$ indicates the proportion of individuals in the $i$th group.
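The entropy formula above, and the information gain derived from it, can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions: the function names `shannon_entropy` and `information_gain`, the toy labels and the use of base-2 logarithms are all choices made here (the text does not fix the base of the logarithm).

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon's index H = -sum(p_i * log2(p_i)) over the groups in `labels`.

    Base-2 logarithms are an assumption of this sketch.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its sub-nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * shannon_entropy(child) for child in children)
    return shannon_entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" cases, and two candidate splits.
parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 5, ["no"] * 5]                       # perfectly pure sub-nodes
split_b = [["yes", "yes", "yes", "no", "no"],
           ["yes", "yes", "no", "no", "no"]]              # nearly uninformative split

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print("gain of split A:", gain_a)
print("gain of split B:", gain_b)
```

Under the information-gain criterion the tree would prefer split A, since it produces the more homogeneous sub-nodes and hence the larger reduction in entropy.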
Random Forests

Ensemble learning is a supervised learning technique in which the basic idea is to generate multiple models on a training dataset and then combine (for example, average) their output rules or hypotheses to obtain a stronger model. Random forest is a classic case of ensemble learning. Decision trees are simple and easily interpretable, but a major drawback is that a single tree often has poor predictive performance and generalizes poorly to the test set, so trees are sometimes called weak learners.

In the context of decision trees, a random forest is a model based on multiple trees. Rather than simply averaging the predictions of individual trees (which we could call a 'forest'), this model uses two key concepts that give it the name 'random': (i) random sampling of training data points when building trees, and (ii) random subsets of features considered when splitting nodes. The idea is that instead of producing a single model, which might be too complex and have high variance, leading to overfitting, or too simple and have high bias, leading to underfitting, we generate many models from the training set and combine them at the end.

Support Vector Machines

Given a set of training examples, each marked as belonging to one or the other of two categories, a Support Vector Machine (SVM) training algorithm builds a model that assigns new examples to one category or the other. In theory, an SVM is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data, the algorithm outputs an optimal hyperplane which categorizes new examples. Thus, the hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. Also, the dimension of the hyperplane depends upon the number of features.
If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. In practice, there are many hyperplanes that might classify the data. One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes. So, we choose the hyperplane such that the distance from it to the nearest data point on each side is maximized.

Naïve Bayes Classifier

The naive Bayes algorithm is a probabilistic technique which is simple yet powerful enough that it is often reported to outperform more complex algorithms on very large datasets. The foundation of the naive Bayes algorithm is Bayes' theorem, which states that in a sequence of events, if A is the first event and B is the second event, then $P(B \mid A)$ is obtained by the expression

$$P(B \mid A) = \frac{P(B)\, P(A \mid B)}{P(A)}$$

The naive Bayes algorithm is called naive not because it is simple, but because it makes a very strong assumption that the features of the data are independent of each other. In other words, it assumes that the presence of one feature in a class is completely unrelated to the presence of all other features. If this assumption of independence holds, naive Bayes performs extremely well, and often better than other models. Mathematically,
$$P(X_1, \dots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$$

where $n$ is the number of features. In order to create a classifier model, we find the probability of a given set of inputs for all possible values of the class variable Y and pick the output with maximum probability. This can be expressed as

$$\hat{Y} = \arg\max_{Y} \; P(Y) \prod_{i=1}^{n} P(X_i \mid Y)$$

Neural Networks

A neural network is a series of algorithms that endeavours to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. The basic computational unit of the brain is a neuron. In comparison, a 'neuron' in a neural network, also called a perceptron, is a mathematical function that collects and classifies information according to a specific architecture. The perceptron receives input from other nodes, or from an external source, and computes an output. Each input has an associated weight (w), which is assigned on the basis of its importance relative to the other inputs. The node applies a nonlinear function to the weighted sum of its inputs to create the output. The idea is that the synaptic strengths (the weights w) are revised by learning from the training data, and they in turn control the strength and direction of each input's influence. The learning happens in two steps: forward propagation and back propagation. In simple words, forward propagation is making a guess about the answer, and back propagation is minimising the error between the actual answer and the guessed answer. The process of updating the weights is continued through multiple iterations to arrive at a decision.

K Nearest Neighbour Technique

K-nearest neighbours (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function). A case is classified by a majority vote of its neighbours, meaning that the case is assigned to the most common class amongst its K nearest neighbours as measured by the distance function. Below is a step-by-step procedure to compute K-nearest neighbours.

1. Determine the parameter K = number of neighbours to be used.
2. Calculate the distance between the query instance (the item to be identified as belonging to a preidentified category) and all the training samples.
3. Sort the distances and determine the nearest neighbours based on the Kth minimum distance.
4. Gather the category γ of the nearest neighbours.
5. Use the simple majority of the categories of the nearest neighbours as the predicted value for the query instance.

The most intuitive nearest-neighbour classifier is the 1-nearest neighbour classifier, which assigns a point x to the class of its closest neighbour in the feature space.

Finally, the choice of a particular classifier for a given situation depends on the relative performance of the candidates in respect of accuracy, sensitivity and specificity. There are deeper issues involved in the use of all these techniques, and considerable developments have taken place in both theory and programming related to the topic.

--- Jayaraman
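As an illustration of the five-step K-nearest-neighbour procedure described above, the following Python sketch may help. It is a minimal sketch rather than a definitive implementation: the function name `knn_predict`, the toy data and the choice of Euclidean distance are assumptions made here, since the text leaves the distance function open. It uses `math.dist`, which needs Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Step 1: K is supplied by the caller through the `k` argument.
    # Step 2: distance from the query instance to every training sample
    # (Euclidean distance is an assumption of this sketch).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: sort by distance and keep the K nearest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: gather the categories of those neighbours.
    categories = [label for _, label in nearest]
    # Step 5: simple majority of the categories is the prediction.
    return Counter(categories).most_common(1)[0][0]


# Hypothetical two-feature training set with two categories, "A" and "B".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))
print(knn_predict(points, labels, (6.0, 7.0), k=3))
```

Setting `k=1` gives the 1-nearest neighbour classifier mentioned in the text. With an even K and two classes a tie is possible; this sketch simply returns whichever tied category was encountered first among the nearest neighbours.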