MACHINE LEARNING
1. ABSTRACT
Machine Learning is a scientific discipline that addresses the following question:
‘How can we program systems to automatically learn and to improve with experience?’
Learning in this context is not learning by heart but recognizing complex patterns and making
intelligent decisions based on data. The difficulty lies in the fact that the set of all possible
decisions given all possible inputs is too complex to describe. To tackle this problem the field
of Machine Learning develops algorithms that discover knowledge from specific data and
experience, based on sound statistical and computational principles.
The field of Machine Learning integrates many distinct approaches such as
probability theory, logic, combinatorial optimization, search, statistics, reinforcement
learning and control theory. The developed methods are at the basis of many applications,
ranging from vision to language processing, forecasting, pattern recognition, games, data
mining, expert systems and robotics.
2. INTRODUCTION
Machine learning is a method of data analysis that automates analytical model
building. Machine learning is a type of artificial intelligence (AI) that enables software
applications to become more accurate in forecasting outcomes without being explicitly
programmed.
The main idea of machine learning is to create algorithms that can receive input data and
use statistical analysis to predict an output value within an acceptable range.
The processes involved in machine learning are similar to those of data mining and predictive
modeling: both require searching through data to look for patterns and adjusting program
actions appropriately.
Basically, we have:
 Deep Learning
 Representation Learning
 Machine Learning
 Artificial Intelligence
DEEP LEARNING
Deep learning is a subset of machine learning in Artificial Intelligence (AI) that uses networks
capable of learning, without supervision, from data that is unstructured or unlabeled. It is also
known as deep neural learning or deep neural networks.
Ex: MLPs (Multilayer Perceptrons)
REPRESENTATION OR FEATURE LEARNING
In machine learning, feature learning or representation learning is a set of techniques that
allows a system to automatically discover the representations needed for feature detection or
classification from raw data.
Ex: Shallow encoders
MACHINE LEARNING
Machine learning is a field of computer science that gives computers the ability to learn
without being explicitly programmed.
Machine learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that can
access data and use it to learn for themselves.
Ex: Logistic regression
ARTIFICIAL INTELLIGENCE
Artificial Intelligence is a field of study that encompasses computational techniques for
performing tasks that apparently require intelligence when performed by humans.
Ex: Knowledge bases
3. MACHINE LEARNING PROCESS
The machine learning process can be done in the following steps (a minimal code sketch of
this pipeline is given after the figure):
 Identifies relevant data sets and prepares them for analysis
 Chooses the type of machine learning algorithm to use
 Builds an analytical model based on the chosen algorithm
 Trains the model on training data sets, revising it as needed
 Runs the model to generate scores and other findings
Fig 2.1: Machine learning process
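To make the steps concrete, the following is a minimal sketch of this process in Python. It assumes the scikit-learn library; the synthetic data set and the choice of logistic regression are illustrative assumptions rather than part of the process itself.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: identify and prepare a data set (synthetic here, for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 2-3: choose an algorithm and build an analytical model.
model = LogisticRegression(max_iter=1000)

# Step 4: train the model, revising it (e.g. its hyperparameters) as needed.
model.fit(X_train, y_train)

# Step 5: run the model to generate scores and other findings.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))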
4. OVERVIEW
Machine Learning is the only kind of AI there is.
AI is changing. We are now recognizing that most things called "AI" in the past are nothing
more than advanced programming tricks. As long as the programmer is the one supplying all
the intelligence to the system by programming it in as a World Model, the system is not really
an Artificial Intelligence. It's "just a program".
Don't model the World; Model the Mind.
When you Model the Mind you can create systems capable of learning everything about the
world. It is a much smaller task, since the world is very large and changes behind your back,
which means World Models will become obsolete the moment they are made. The only hope
of creating intelligent systems is to have the system itself create and maintain its own World
Models, continuously, in response to sensory input.
Machine learning is a subset of AI. That is, all machine learning counts as AI, but not all AI
counts as machine learning. For example, symbolic logic (rules engines, expert systems and
knowledge graphs) as well as evolutionary algorithms and Bayesian statistics could all be
described as AI, and none of them are machine learning.
The "learning" part of machine learning means that ML algorithms attempt to optimize along
a certain dimension; i.e. they usually try to minimize error or maximize the likelihood of their
predictions being true. How does one minimize error? Well, one way is to build a framework
that multiplies inputs in order to make guesses as to the inputs' nature. Different
outputs/guesses are the product of the inputs and the algorithm. Usually, the initial guesses
are quite wrong, and if you are lucky enough to have ground-truth labels pertaining to the
input, you can measure how wrong your guesses are by contrasting them with the truth, and
then use that error to modify your algorithm. That's what neural networks do. They keep on
measuring the error and modifying their parameters until they can't achieve any less error.
They are, in short, an optimization algorithm. If you tune them right, they minimize their
error by guessing and guessing and guessing again.
Neural networks are part of machine learning.
Introduction to Deep Neural Networks: Neural networks are a set of algorithms, modeled
loosely after the human brain, that are designed to recognize patterns. They interpret sensory
data through a kind of machine perception, labeling or clustering raw input. The patterns they
recognize are numerical, contained in vectors, into which all real-world data, be it images,
sound, text or time series, must be translated.
In practice, Artificial Intelligence (AI) and Machine Learning (ML) go hand-in-hand
with big data. To make really intelligent machines, you need huge amounts of data for them
to learn from. Similarly, to understand huge amounts of data, you need the help of intelligent
machines.
AI VS ML
Machine Learning is a subfield of computer science that focuses on enabling computers to
make accurate predictions on any type of data. So instead of explicitly telling a computer how
to solve a problem, you show it how the problem was previously solved and the computer
identifies and learns on its own the steps that were part of the solution.
Artificial intelligence, on the other hand, is a much broader concept that stems from the idea
that human intelligence "can be so precisely described that a machine can be made to
simulate it”. It means that instead of just learning from a set of data, computers will treat that
dataset as knowledge, use it for planning, communicate that plan with humans or other AI,
and move/manipulate real world objects to execute that plan, all on their own.
5. TYPES OF MACHINE LEARNING ALGORITHMS
Machine learning algorithms can be divided into 3 broad categories
 Supervised learning
 Unsupervised learning
 Reinforcement learning
SUPERVISED LEARNING
 In supervised learning, an input vector is applied to the network and it results in an
output vector. This result is compared with the target response. It generates a function
that maps input to desired outputs.
 Supervised learning is useful in cases where a property (label) is available for a
certain dataset (training set), but is missing and needs to be predicted for other
instances.
UNSUPERVISED LEARNING
 In unsupervised learning, input vectors of similar types are grouped together without the
use of labeled training data specifying how a typical member of each group looks or to
which group each member belongs.
 Unsupervised learning is useful in cases where the challenge is to discover implicit
relationships in a given unlabeled dataset (items are not pre-assigned).
REINFORCEMENT LEARNING
 Reinforcement learning falls between these 2 extremes — there is some form of
feedback available for each predictive step or action, but no precise label or error
message.
 Reinforcement Learning allows the machine or software agent to learn its behaviour
based on feedback from the environment.
 These algorithms choose an action, based on each data point and later learn how good
the decision was. Over time, the algorithm changes its strategy to learn better and
achieve the best reward.
6. SUPERVISED LEARNING
Supervised learning algorithms make predictions on a given set of samples by searching for
patterns within the value labels assigned to the data points.
Supervised learning algorithms include:
 Decision tree
 Naive Bayes Classification
 Ordinary Least Squares Regression
 Logistic Regression
 Linear Regression
 Support Vector Machines
 Ensemble Methods
1. DECISION TREES
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and
their possible consequences, including chance-event outcomes, resource costs, and utility.
A decision tree is a graphical representation that makes use of branching methodology to
exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree,
each internal node represents a test on an attribute, each branch of the tree represents an
outcome of the test and each leaf node represents a particular class label, i.e. the decision made
after evaluating all of the attributes. The classification rules are represented by the paths
from the root to the leaf nodes.
Types of Decision Trees
Classification Trees- These are considered as the default kind of decision trees used to
separate a dataset into different classes, based on the response variable. These are generally
used when the response variable is categorical in nature.
Regression Trees-When the response or target variable is continuous or numerical,
regression trees are used. These are generally used in predictive type of problems when
compared to classification.
Why should you use Decision Tree Machine Learning algorithm?
These machine learning algorithms help make decisions under uncertainty and help you
improve communication, as they present a visual representation of a decision situation.
 Decision tree machine learning algorithms help a data scientist capture how the
operational nature of a situation or model would have changed if a different decision
had been taken.
 Decision tree algorithms help make optimal decisions by allowing a data scientist to
traverse through forward and backward calculation paths.
When to use Decision Tree Machine Learning Algorithm
 Decision trees are robust to errors; if the training data contains errors, decision tree
algorithms are well suited to such problems.
 Decision trees are best suited for problems where instances are represented by
attribute value pairs.
 If the training data has missing values then decision trees can be used, as they can
handle missing values nicely by looking at the data in other columns.
 Decision trees are best suited when the target function has discrete output values.
Advantages of Using Decision Tree Machine Learning Algorithms
 Decision trees are very intuitive and can be explained to anyone with ease. People
from a non-technical background can also decipher the hypothesis drawn from a
decision tree, as they are self-explanatory.
 When using decision tree machine learning algorithms, data type is not a constraint as
they can handle both categorical and numerical variables.
 Decision tree machine learning algorithms do not require making any assumption on
the linearity in the data and hence can be used in circumstances where the parameters
are non-linearly related. These machine learning algorithms do not make any
assumptions on the classifier structure and space distribution.
 These algorithms are useful in data exploration. Decision trees implicitly perform
feature selection which is very important in predictive analytics. When a decision tree
is fit to a training dataset, the nodes at the top on which the decision tree is split, are
considered as important variables within a given dataset and feature selection is
completed by default.
 Decision trees help save data preparation time, as they are not sensitive to missing
values and outliers. Missing values will not stop you from splitting the data for
building a decision tree. Outliers will also not affect the decision trees as data splitting
happens based on some samples within the split range and not on exact absolute
values.
Drawbacks of Using Decision Tree Machine Learning Algorithms
 The greater the number of decisions in a tree, the lower the accuracy of any expected
outcome.
 A major drawback of decision tree machine learning algorithms, is that the outcomes
may be based on expectations. When decisions are made in real-time, the payoffs and
resulting outcomes might not be the same as expected or planned. There are chances
that this could lead to unrealistic decision trees leading to bad decision making. Any
irrational expectations could lead to major errors and flaws in decision tree analysis,
as it is not always possible to plan for all eventualities that can arise from a decision.
 Decision Trees do not fit well for continuous variables and result in instability and
classification plateaus.
 Decision trees are easy to use when compared to other decision making models but
creating large decision trees that contain several branches is a complex and time
consuming task.
 Decision tree machine learning algorithms consider only one attribute at a time and
might not be best suited for actual data in the decision space.
 Large sized decision trees with multiple branches are not comprehensible and pose
several presentation difficulties.
Applications of Decision Tree Machine Learning Algorithm
 Decision trees are among the popular machine learning algorithms that find great use
in finance for option pricing.
 Remote sensing is an application area for pattern recognition based on decision trees.
 Decision tree algorithms are used by banks to classify loan applicants by their
probability of defaulting payments.
 Gerber Products, a popular baby product company, used decision tree machine
learning algorithm to decide whether they should continue using the plastic PVC
(Poly Vinyl Chloride) in their products.
 Rush University Medical Centre has developed a tool named Guardian that uses a
decision tree machine learning algorithm to identify at-risk patients and disease
trends.
Take a look at the image below to get a sense of what a decision tree looks like.
Fig 5.1: Decision tree
Decision Tree Example:
From a business decision point of view, a decision tree is the minimum number of yes/no
questions that one has to ask, to assess the probability of making a correct decision, most of
the time. As a method, it allows you to approach the problem in a structured and systematic
way to arrive at a logical conclusion.
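As a small illustration of this idea, the sketch below fits a classification tree with scikit-learn (an assumed library) on its bundled Iris data set and prints the learned yes/no decision rules; the data set and the depth limit are chosen only for the example.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# Limit the depth so the tree stays small enough to read as a set of yes/no questions.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Each internal node tests one attribute; each leaf assigns a class label.
print(export_text(tree, feature_names=list(iris.feature_names)))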
2. NAIVE BAYES CLASSIFICATION
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying
Bayes theorem with strong (naive) independence assumptions between the features.
Fig 5.2: Naive Bayes Classification
In the equation above, P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the
class prior probability, and P(B) is the predictor prior probability.
Example (from a Stack Overflow thread):
 We have a training dataset of 1,000 fruits.
 The fruit can be a Banana, Orange or Other (these are the classes).
 The fruit can be Long, Sweet or Yellow (these are the features).
What do you see in this training dataset?
 Out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow.
 Out of 300 oranges, none are long, 150 are sweet and 300 are yellow.
 Out of the remaining 200 fruit, 100 are long, 150 are sweet and 50 are yellow.
If we are given the length, sweetness and color of a fruit (without knowing its class), we can
now calculate the probability of it being a banana, orange or other fruit.
Suppose we are told the unknown fruit is long, sweet and yellow.
Here’s how we calculate all the probabilities in 4 steps:
Step 1: To calculate the probability the fruit is a banana, let’s first recognize that this looks
familiar. It’s the probability of the class Banana given the features Long, Sweet and Yellow,
or more succinctly:
P(Banana | Long, Sweet, Yellow) =
P(Long | Banana) x P(Sweet | Banana) x P(Yellow | Banana) x P(Banana) / P(Long, Sweet, Yellow)
This is exactly like the equation discussed earlier.
Step 2: Starting with the numerator, let’s plug everything in:
P(Long | Banana) = 400/500 = 0.8
P(Sweet | Banana) = 350/500 = 0.7
P(Yellow | Banana) = 450/500 = 0.9
P(Banana) = 500/1000 = 0.5
Multiplying everything together (as in the equation), we get: 0.8 x 0.7 x 0.9 x 0.5 = 0.252
Step 3: Ignore the denominator, since it’ll be the same for all the other calculations.
Step 4: Do a similar calculation for the other classes:
P(Orange | Long, Sweet, Yellow) is proportional to 0/300 x 150/300 x 300/300 x 300/1000 = 0
P(Other | Long, Sweet, Yellow) is proportional to 100/200 x 150/200 x 50/200 x 200/1000 = 0.019 (approx.)
Since 0.252 is greater than both of these, Naive Bayes would classify this long, sweet and yellow fruit as a
banana.
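The same arithmetic can be written out directly in Python; the sketch below simply re-derives the numbers above from the counts given in the training set (no external library is needed).

# Naive Bayes by hand for the fruit example; counts are taken from the data above.
counts = {
    "Banana": {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
    "Orange": {"total": 300, "Long": 0, "Sweet": 150, "Yellow": 300},
    "Other": {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
}
n_fruits = sum(c["total"] for c in counts.values())  # 1,000 fruits in total

scores = {}
for fruit, c in counts.items():
    prior = c["total"] / n_fruits  # P(class)
    likelihood = (c["Long"] / c["total"]) * (c["Sweet"] / c["total"]) * (c["Yellow"] / c["total"])
    scores[fruit] = prior * likelihood  # numerator of Bayes' theorem; denominator ignored

print(scores)                       # {'Banana': 0.252, 'Orange': 0.0, 'Other': 0.01875}
print(max(scores, key=scores.get))  # Banana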
When to use the Machine Learning algorithm - Naïve Bayes Classifier?
 If you have a moderate or large training data set.
 If the instances have several attributes.
 Given the classification parameter, attributes which describe the instances should be
conditionally independent.
Applications of Naïve Bayes Classifier
 Sentiment Analysis- It is used at Facebook to analyse status updates expressing
positive or negative emotions.
 Document Categorization- Google uses document classification to index documents
and find relevancy scores i.e. the PageRank. PageRank mechanism considers the
pages marked as important in the databases that were parsed and classified using a
document classification technique.
 Naïve Bayes Algorithm is also used for classifying news articles about Technology,
Entertainment, Sports, Politics, etc.
 Email Spam Filtering - Google Mail uses the Naïve Bayes algorithm to classify your
emails as Spam or Not Spam.
Some real-world examples are:
 Marking an email as spam or not spam
 Classifying a news article as being about technology, politics, or sports
 Checking whether a piece of text expresses positive or negative emotions
 Face recognition software
Advantages of the Naïve Bayes Classifier Machine Learning Algorithm
 Naïve Bayes Classifier algorithm performs well when the input variables are
categorical.
 A Naïve Bayes classifier converges faster and requires relatively little training data
compared to discriminative models such as logistic regression, when the Naïve Bayes
conditional independence assumption holds.
 With the Naïve Bayes Classifier algorithm, it is easier to predict the class of the test
data set. It is a good bet for multi-class predictions as well.
 Though it requires conditional independence assumption, Naïve Bayes Classifier has
presented good performance in various application domains.
3. ORDINARY LEAST SQUARES REGRESSION
If you know statistics, you have probably heard of linear regression before. Least squares is
a method for performing linear regression. You can think of linear regression as the task of
fitting a straight line through a set of points.
There are multiple possible strategies to do this, and the “ordinary least squares” strategy goes
like this: draw a line, and then for each of the data points, measure the vertical distance
between the point and the line, and add these up; the fitted line is the one where this sum of
(squared) distances is as small as possible.
Fig 5.3: Ordinary Least Squares Regression
Linear refers to the kind of model you are using to fit the data, while least squares refers to
the kind of error metric you are minimizing.
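As a minimal sketch of that procedure, assuming NumPy and a made-up set of points, the least-squares line can be computed directly:

import numpy as np

# Toy data: five points roughly along a line (invented for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with a column of ones for the intercept; lstsq minimizes the sum of squared residuals.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")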
4. LOGISTIC REGRESSION
Logistic regression is a powerful statistical way of modeling a binomial outcome with one or
more explanatory variables. It measures the relationship between the categorical dependent
variable and one or more independent variables by estimating probabilities using a logistic
function, which is the cumulative logistic distribution.
In general, regressions can be used in real-world applications such as:
 Credit Scoring
 Measuring the success rates of marketing campaigns
 Predicting the revenues of a certain product
 Is there going to be an earthquake on a particular day?
Logistic Regression
The name of this algorithm could be a little confusing in the sense that the Logistic Regression
machine learning algorithm is used for classification tasks and not regression problems. The name
‘Regression’ here implies that a linear model is fit into the feature space. This algorithm
applies a logistic function to a linear combination of features to predict the outcome of a
categorical dependent variable based on predictor variables.
The odds or probabilities that describe the outcome of a single trial are modelled as a function
of explanatory variables. Logistic regression algorithms help estimate the probability of
falling into a specific level of the categorical dependent variable based on the given predictor
variables.
Just suppose that you want to predict if there will be a snowfall tomorrow in New York. Here
the outcome of the prediction is not a continuous number because there will either be
snowfall or no snowfall and hence linear regression cannot be applied. Here the outcome
variable is one of the several categories and using logistic regression helps.
Based on the nature of categorical response, logistic regression is classified into 3 types –
 Binary Logistic Regression – The most commonly used form, applied when the
categorical response has 2 possible outcomes, i.e. either yes or no. Example –
predicting whether a student will pass or fail an exam, predicting whether a person
will have low or high blood pressure, predicting whether a tumour is cancerous or not.
 Multinomial Logistic Regression - Categorical response has 3 or more possible
outcomes with no ordering. Example - predicting which search engine (Yahoo,
Bing, Google, or MSN) is used by the majority of US citizens.
 Ordinal Logistic Regression - Categorical response has 3 or more possible outcomes
with natural ordering. Example- How a customer rates the service and quality of food
at a restaurant based on a scale of 1 to 10.
Let us consider a simple example where a cake manufacturer wants to find out if baking a
cake at 160°C, 180°C and 200°C will produce a ‘hard’ or ‘soft’ variety of cake ( assuming
the fact that the bakery sells both the varieties of cake with different names and prices).
Logistic regression is a better fit in this scenario than other statistical techniques. For
example, suppose the manufacturer produces 2 cake batches, where the first batch contains 20
cakes (of which 7 were hard and 13 were soft) and the second batch consists of 80 cakes (of
which 41 were hard and 39 were soft). In this case, if a linear regression algorithm is used it
will give equal importance to both batches of cakes regardless of the number of cakes in each
batch. Applying a logistic regression algorithm will take this factor into account and give the
second batch of cakes more weight than the first batch.
When to Use Logistic Regression Machine Learning Algorithm
 Use logistic regression algorithms when there is a requirement to model the
probabilities of the response variable as a function of some other explanatory variable.
For example, probability of buying a product X as a function of gender
 Use logistic regression algorithms when there is a need to predict probabilities that
categorical dependent variable will fall into two categories of the binary response as a
function of some explanatory variables. For example, what is the probability that a
customer will buy a perfume given that the customer is a female?
 A logistic regression algorithm is also well suited when the need is to classify
elements into two categories based on the explanatory variable. For example, classify
females into a ‘young’ or ‘old’ group based on their age.
Advantages of Using Logistic Regression
 Easier to inspect and less complex.
 Robust algorithm as the independent variables need not have equal variance or normal
distribution.
 These algorithms do not assume a linear relationship between the dependent and
independent variables and hence can also handle non-linear effects.
 Controls confounding and tests interaction.
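A minimal sketch of a binary logistic regression, assuming scikit-learn and an invented pass/fail data set along the lines of the exam example above:

import numpy as np
from sklearn.linear_model import LogisticRegression

# One explanatory variable (hours studied) and a binary outcome (1 = pass, 0 = fail); data invented.
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

# The logistic function turns the linear combination of features into a probability.
print(clf.predict_proba([[4.5]]))  # [[P(fail), P(pass)]] for a student who studied 4.5 hours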
5. Linear Regression Machine Learning Algorithm
The Linear Regression algorithm shows the relationship between 2 variables and how the change
in one variable impacts the other. The algorithm shows the impact on the dependent variable
of changing the independent variable. The independent variables are referred to as explanatory
variables, as they explain the factors that impact the dependent variable. The dependent
variable is often referred to as the factor of interest or response.
Advantages of Linear Regression Machine Learning Algorithm
 It is one of the most interpretable machine learning algorithms, making it easy to
explain to others.
 It is easy to use, as it requires minimal tuning.
 It is one of the most widely used machine learning techniques and runs fast.
Applications of Linear Regression
 Estimating Sales: Linear Regression finds great use in business, for sales forecasting
based on the trends. If a company observes steady increase in sales every month - a
linear regression analysis of the monthly sales data helps the company forecast sales
in upcoming months.
 Risk Assessment: Linear Regression helps assess risk involved in insurance or
financial domain. A health insurance company can do a linear regression analysis on
the number of claims per customer against age. This analysis helps insurance
companies find that older customers tend to make more insurance claims. Such
analysis results play a vital role in important business decisions that are made to
account for risk.
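For the sales-estimation use case, a minimal sketch (assuming scikit-learn; the monthly sales figures are invented) might look like this:

import numpy as np
from sklearn.linear_model import LinearRegression

# Twelve months of sales showing a steady upward trend (numbers invented for illustration).
months = np.arange(1, 13).reshape(-1, 1)  # explanatory variable
sales = np.array([105, 110, 114, 121, 125, 131, 134, 140, 146, 151, 155, 160], dtype=float)

model = LinearRegression()
model.fit(months, sales)

# Forecast sales for the next month from the fitted trend line.
print("forecast for month 13:", model.predict([[13]])[0])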
6. SUPPORT VECTOR MACHINES
SVM is a binary classification algorithm. Given a set of points of 2 types in an N-dimensional
space, SVM generates an (N - 1)-dimensional hyperplane to separate those points into 2
groups. Say you have some points of 2 types on a sheet of paper which are linearly separable.
SVM will find a straight line which separates those points into the 2 types and is situated as
far as possible from all those points.
Fig 5.6: Support Vector Machine
In terms of scale, some of the biggest problems that have been solved using SVMs (with
suitably modified implementations) are display advertising, human splice site recognition,
image-based gender detection and large-scale image classification.
Support Vector Machine is a supervised machine learning algorithm for classification or
regression problems where the dataset teaches SVM about the classes so that SVM can
classify any new data. It works by classifying the data into different classes by finding a line
(hyperplane) which separates the training data set into classes. As there are many such linear
hyperplanes, the SVM algorithm tries to maximize the distance between the various classes
involved, and this is referred to as margin maximization. If the line that maximizes the
distance between the classes is identified, the probability of generalizing well to unseen data
is increased.
Types of SVM:
SVMs are classified into two categories:
 Linear SVMs – In linear SVMs the training data, i.e. the classes, can be separated by a
hyperplane.
 Non-Linear SVMs - In non-linear SVMs it is not possible to separate the training
data using a hyperplane. For example, the training data for face detection consists of a
group of images that are faces and another group of images that are not faces (in other
words, all other images in the world except faces). Under such conditions, the training
data is so complex that it is impossible to find a representation for every feature
vector. Separating the set of faces linearly from the set of non-faces is a complex task.
Advantages of Using SVM
 SVM offers best classification performance (accuracy) on the training data.
 SVM renders more efficiency for correct classification of the future data.
 The best thing about SVM is that it does not make any strong assumptions on data.
 It does not over-fit the data.
Applications of Support Vector Machine
SVM is commonly used for stock market forecasting by various financial institutions. For
instance, it can be used to compare the relative performance of the stocks when compared to
the performance of other stocks in the same sector. The relative comparison of stocks helps
in making investment decisions based on the classifications made by the SVM learning
algorithm.
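A minimal sketch of a linear SVM, assuming scikit-learn and two synthetic, linearly separable clusters of points:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters of points stand in for the two classes (synthetic data).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The separating hyperplane is w . x + b = 0; the margin around it is maximized.
print("hyperplane coefficients:", clf.coef_[0], "intercept:", clf.intercept_[0])
print("support vectors per class:", clf.n_support_)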
7. ENSEMBLE METHODS
Ensemble methods are learning algorithms that construct a set of classifiers and then classify
new data points by taking a weighted vote of their predictions. The original ensemble method
is Bayesian averaging, but more recent algorithms include error-correcting output coding,
bagging, and boosting.
Ensemble Learning Algorithms
So how do ensemble methods work and why are they superior to individual models?
 They average out biases: If you average a bunch of democratic-leaning polls and
republican-leaning polls together, you will get an average that isn’t leaning either
way.
 They reduce the variance: The aggregate opinion of a bunch of models is less noisy
than the single opinion of one of the models. In finance, this is called diversification
— a mixed portfolio of many stocks will be much less variable than just one of the
stocks alone. This is why your models will be better with more data points rather than
fewer.
 They are unlikely to over-fit: If you have individual models that didn’t over-fit, and
you are combining the predictions from each model in a simple way (average,
weighted average, logistic regression), then there’s no room for over-fitting.
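As a concrete sketch of combining predictions by a (soft) vote, assuming scikit-learn; the three base models and the synthetic data are arbitrary choices for the example:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Average the predicted probabilities of three different base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=5)),
    ],
    voting="soft",
)

print("ensemble cross-validated accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())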
7. UNSUPERVISED LEARNING
In unsupervised learning, there are no labels associated with the data points. These machine
learning algorithms organize the data into clusters to describe its structure and make complex
data look simple and organized for analysis.
Unsupervised learning algorithms include:
 Clustering Algorithm
 K Means Clustering Algorithm
 Apriori Algorithm
 Principal Component Analysis
 Singular Value Decomposition
1. CLUSTERING ALGORITHMS
Clustering is the task of grouping a set of objects such that objects in the same group (cluster)
are more similar to each other than to those in other groups.
Fig 6.1: Clustering Algorithms
2. K MEANS CLUSTERING ALGORITHM
K-means is a popular unsupervised machine learning algorithm for cluster analysis. K-Means
is a non-deterministic and iterative method. The algorithm operates on a given data set with a
pre-defined number of clusters, k. The output of the K-Means algorithm is k clusters with the
input data partitioned among the clusters.
For instance, let’s consider K-Means Clustering for Wikipedia Search results. The search
term “Jaguar” on Wikipedia will return all pages containing the word Jaguar which can refer
to Jaguar as a Car, Jaguar as Mac OS version and Jaguar as an Animal. K Means clustering
algorithm can be applied to group the WebPages that talk about similar concepts. So, the
algorithm will group all web pages that talk about Jaguar as an Animal into one cluster,
Jaguar as a Car into another cluster and so on.
Advantages of using K-Means Clustering Machine Learning Algorithm
 In case of globular clusters, K-Means produces tighter clusters than hierarchical
clustering.
 Given a smaller value of K, K-Means clustering computes faster than hierarchical
clustering for a large number of variables.
Applications of K-Means Clustering
The K-Means Clustering algorithm is used by most search engines, like Yahoo and Google, to
cluster web pages by similarity and identify the ‘relevance rate’ of search results. This helps
search engines reduce the computational time for users.
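A minimal K-Means sketch, assuming scikit-learn and synthetic points standing in for, say, web-page feature vectors:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points in three natural groups (illustrative stand-in for real features).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)  # k is fixed in advance
labels = kmeans.fit_predict(X)                            # iteratively refines cluster centres

print("cluster centres:\n", kmeans.cluster_centers_)
print("first ten cluster assignments:", labels[:10])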
3. APRIORI MACHINE LEARNING ALGORITHM
Apriori algorithm is an unsupervised machine learning algorithm that generates association
rules from a given data set. Association rule implies that if an item A occurs, then item B also
occurs with a certain probability. Most of the association rules generated are in the IF_THEN
format. For example, IF people buy an iPad THEN they also buy an iPad Case to protect it.
For the algorithm to derive such conclusions, it first observes the number of people who
bought an iPad case while purchasing an iPad. This way a ratio is derived, for example: out of
100 people who purchased an iPad, 85 also purchased an iPad case.
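The support and confidence behind such a rule can be computed directly from transaction counts. The sketch below is not a full Apriori implementation; it just evaluates the iPad rule on a handful of invented transactions.

# Support and confidence for the rule "iPad -> iPad Case" (transactions invented for illustration).
transactions = [
    {"iPad", "iPad Case"},
    {"iPad"},
    {"iPad", "iPad Case", "Charger"},
    {"Charger"},
    {"iPad", "iPad Case"},
]

n = len(transactions)
ipad = sum(1 for t in transactions if "iPad" in t)
both = sum(1 for t in transactions if {"iPad", "iPad Case"} <= t)

support = both / n        # how often the two items occur together across all transactions
confidence = both / ipad  # IF a customer buys an iPad, THEN how often they also buy a case

print(f"support = {support:.2f}, confidence = {confidence:.2f}")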
Basic principles on which the Apriori Machine Learning Algorithm works:
 If an item set occurs frequently, then all the subsets of the item set also occur
frequently.
 If an item set occurs infrequently, then all the supersets of the item set also occur
infrequently.
Advantages of Apriori Algorithm
 It is easy to implement and can be parallelized easily.
 Apriori implementation makes use of large item set properties.
Applications of Apriori Algorithm
 Detecting Adverse Drug Reactions: The Apriori algorithm is used for association
analysis on healthcare data such as the drugs taken by patients, characteristics of each
patient, adverse ill-effects patients experience, initial diagnosis, etc. This analysis
produces association rules that help identify the combination of patient characteristics
and medications that lead to adverse side effects of the drugs.
 Market Basket Analysis: Many e-commerce giants like Amazon use Apriori to draw
data insights on which products are likely to be purchased together and which are
most responsive to promotion. For example, a retailer might use Apriori to predict
that people who buy sugar and flour are likely to buy eggs to bake a cake.
 Auto-Complete Applications: Google auto-complete is another popular application
of Apriori wherein, when the user types a word, the search engine looks for other
associated words that people usually type after a specific word.
4. PRINCIPAL COMPONENT ANALYSIS
PCA is a statistical procedure that uses an orthogonal transformation to convert a set of
observations of possibly correlated variables into a set of values of linearly uncorrelated
variables called principal components.
Let’s understand it using an example:
Let’s say we have a data set of dimension 300 (n) × 50 (p), where n represents the number of
observations and p represents the number of predictors. Since we have a large p = 50, there
can be p(p-1)/2 scatter plots, i.e. more than 1,000 plots, to analyze the variable relationships.
Wouldn’t it be a tedious job to perform exploratory analysis on this data?
In this case, a sensible approach is to select a subset of p (p << 50) predictors which captures
as much information as possible, and then plot the observations in the resulting low-dimensional
space.
The image below shows the transformation of high dimensional data (3 dimensions) to low
dimensional data (2 dimensions) using PCA. Note that each resultant dimension is a linear
combination of the original p features.
Fig 5.4: Principal Component Analysis
Some of the applications of PCA include compression, simplifying data for easier learning,
and visualization.
Notice that domain knowledge is very important while choosing whether to go forward with
PCA or not. It is not suitable in cases where data is noisy (all the components of PCA have
quite a high variance).
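A minimal sketch of such a reduction, assuming scikit-learn and random 3-dimensional data in place of a real data set:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))      # 300 observations with 3 features (synthetic)
X[:, 2] = X[:, 0] + 0.1 * X[:, 2]  # make the third feature largely redundant

# Project onto the first two principal components; each component is a linear
# combination of the original features.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("reduced shape:", X_2d.shape)  # (300, 2)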
5. SINGULAR VALUE DECOMPOSITION
In linear algebra, SVD is a factorization of a real or complex matrix. For a given m * n matrix
M, there exists a decomposition M = UΣV*, where U and V are unitary matrices, V* is the
conjugate transpose of V, and Σ is a diagonal matrix of singular values.
PCA is actually a simple application of SVD. In computer vision, the first face recognition
algorithms used PCA and SVD in order to represent faces as a linear combination of
“eigenfaces”, do dimensionality reduction, and then match faces to identities via simple
methods; although modern methods are much more sophisticated, many still depend on
similar techniques.
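A short NumPy sketch of the factorization, using an arbitrary small real matrix and checking that the factors reproduce it:

import numpy as np

M = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])  # an arbitrary 3 x 2 real matrix

# For a real matrix, M = U * diag(s) * V^T with U, V orthogonal and s the singular values.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = U @ np.diag(s) @ Vt

print("singular values:", s)
print("max reconstruction error:", np.max(np.abs(M - M_rebuilt)))  # ~0, up to floating point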
8. CONCLUSION
These days, machine learning techniques are widely used to solve real-world problems by
storing, manipulating, extracting and retrieving data from large sources. Supervised machine
learning techniques have been widely adopted; however, these techniques prove to be very
expensive when the systems are implemented over a wide range of data, because a significant
amount of effort and cost is involved in obtaining large labeled data sets. Active learning
provides a way to reduce labeling costs by labeling only the most useful instances for
learning.
9. REFERENCES
 https://en.wikipedia.org/wiki/Machine_learning
 https://en.wikipedia.org/wiki/Supervised_learning
 https://en.wikipedia.org/wiki/Unsupervised_learning
 https://en.wikipedia.org/wiki/Reinforcement_learning
More Related Content

What's hot

Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1Garry D. Lasaga
 
Artificial intelligence and knowledge representation
Artificial intelligence and knowledge representationArtificial intelligence and knowledge representation
Artificial intelligence and knowledge representationSajan Sahu
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningEng Teong Cheah
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning ExplainedMelanie Swan
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Machine Learning
Machine LearningMachine Learning
Machine LearningRahul Kumar
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm pptMayank Jain
 
Machine learning ppt
Machine learning ppt Machine learning ppt
Machine learning ppt Poojamanic
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionDony Riyanto
 

What's hot (20)

search strategies in artificial intelligence
search strategies in artificial intelligencesearch strategies in artificial intelligence
search strategies in artificial intelligence
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
 
Artificial intelligence and knowledge representation
Artificial intelligence and knowledge representationArtificial intelligence and knowledge representation
Artificial intelligence and knowledge representation
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
Data science
Data science Data science
Data science
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning Explained
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Hill climbing
Hill climbingHill climbing
Hill climbing
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm ppt
 
Machine learning ppt
Machine learning ppt Machine learning ppt
Machine learning ppt
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An Introduction
 
AI: AI & Problem Solving
AI: AI & Problem SolvingAI: AI & Problem Solving
AI: AI & Problem Solving
 
Unit 1 chapter 1 Design and Analysis of Algorithms
Unit 1   chapter 1 Design and Analysis of AlgorithmsUnit 1   chapter 1 Design and Analysis of Algorithms
Unit 1 chapter 1 Design and Analysis of Algorithms
 

Similar to Machine Learning Algorithms Explained

Data science dec ppt
Data science dec pptData science dec ppt
Data science dec pptsterlingit
 
Supervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationSupervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationTara ram Goyal
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introductionathirakurup3
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfTemok IT Services
 
source1
source1source1
source1butest
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningSujith Jayaprakash
 
Machine learning
Machine learningMachine learning
Machine learningPawanCT
 
Artificial intelligence slides beginners
Artificial intelligence slides beginners Artificial intelligence slides beginners
Artificial intelligence slides beginners Antonio Fernandes
 
Machine learning
Machine learningMachine learning
Machine learningeonx_32
 
The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)RR IT Zone
 
Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...EnjoyDigitAll by BNP Paribas
 
White-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdfWhite-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdfBoris647814
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.pptARVIND SARDAR
 
Ai artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collectionAi artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collectionRuchi Jain
 
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...eswaralaldevadoss
 
Machine learning
Machine learningMachine learning
Machine learningAbrar ali
 

Similar to Machine Learning Algorithms Explained (20)

Data science dec ppt
Data science dec pptData science dec ppt
Data science dec ppt
 
Supervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationSupervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its application
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introduction
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdf
 
Machine learning
Machine learningMachine learning
Machine learning
 
source1
source1source1
source1
 
Basics of Soft Computing
Basics of Soft  Computing Basics of Soft  Computing
Basics of Soft Computing
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Artificial intelligence slides beginners
Artificial intelligence slides beginners Artificial intelligence slides beginners
Artificial intelligence slides beginners
 
Machine learning
Machine learningMachine learning
Machine learning
 
The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)
 
Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...
 
Machine learning
Machine learningMachine learning
Machine learning
 
White-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdfWhite-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdf
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.ppt
 
Ai artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collectionAi artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collection
 
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
 
Machine learning
Machine learningMachine learning
Machine learning
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
4. OVERVIEW

Machine Learning is the only kind of AI there is. AI is changing: we now recognize that most things called "AI" in the past were nothing more than advanced programming tricks. As long as the programmer is the one supplying all the intelligence to the system by programming it in as a World Model, the system is not really an Artificial Intelligence. It is "just a program".

Don't model the World; model the Mind. When you model the Mind you can create systems capable of learning everything about the world. It is a much smaller task, since the world is very large and changes behind your back, which means World Models become obsolete the moment they are made. The only hope of creating intelligent systems is to have the system itself create and maintain its own World Models, continuously, in response to sensory input.

Machine learning is a subset of AI. That is, all machine learning counts as AI, but not all AI counts as machine learning. For example, symbolic logic (rules engines, expert systems and knowledge graphs), as well as evolutionary algorithms and Bayesian statistics, could all be described as AI, and none of them are machine learning.

The "learning" part of machine learning means that ML algorithms attempt to optimize along a certain dimension; i.e. they usually try to minimize error or maximize the likelihood of their predictions being true. How does one minimize error? One way is to build a framework that multiplies inputs in order to make guesses about the inputs' nature. The different outputs/guesses are the product of the inputs and the algorithm. Usually the initial guesses are quite wrong, and if you are lucky enough to have ground-truth labels for the input, you can measure how wrong your guesses are by comparing them with the truth and then use that error to modify your algorithm.

That is what neural networks do: they keep measuring the error and modifying their parameters until they cannot achieve any less error. They are, in short, an optimization algorithm. If you tune them right, they minimize their error by guessing, and guessing, and guessing again. Neural networks are part of machine learning.
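To make this guess-measure-adjust loop concrete, here is a minimal sketch (not from the original report) of gradient descent on a one-parameter model; the data, learning rate and step count are made up for illustration.

```python
# Minimal sketch of the "guess, measure error, adjust" loop described above,
# using plain gradient descent on a one-parameter model y = w * x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # ground-truth labels (the true w is 2.0)

w = 0.0                          # initial (wrong) guess
learning_rate = 0.05
for step in range(200):
    predictions = w * x
    error = predictions - y              # how wrong the current guesses are
    gradient = 2 * np.mean(error * x)    # direction that reduces squared error
    w -= learning_rate * gradient        # adjust the parameter

print(round(w, 3))               # converges close to 2.0
```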
Introduction to Deep Neural Networks

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.

Artificial Intelligence (AI) and Machine Learning (ML) go hand in hand with big data. To make really intelligent machines, you need huge amounts of data for them to learn from. Similarly, to understand huge amounts of data, you need the help of intelligent machines.

AI VS ML

Machine Learning is a subfield of computer science that focuses on enabling computers to make accurate predictions on any type of data. Instead of explicitly telling a computer how to solve a problem, you show it how the problem was previously solved, and the computer learns on its own the steps that were part of the solution.

Artificial intelligence, on the other hand, is a much broader concept that stems from the idea that human intelligence "can be so precisely described that a machine can be made to simulate it". It means that instead of just learning from a set of data, computers will treat that dataset as knowledge, use it for planning, communicate that plan with humans or other AI, and move or manipulate real-world objects to execute that plan, all on their own.
5. TYPES OF MACHINE LEARNING ALGORITHMS

Machine learning algorithms can be divided into 3 broad categories:
 Supervised learning
 Unsupervised learning
 Reinforcement learning

SUPERVISED LEARNING
 In supervised learning, an input vector is applied to the network and it results in an output vector. This result is compared with the target response. Learning generates a function that maps inputs to the desired outputs.
 Supervised learning is useful in cases where a property (label) is available for a certain dataset (the training set), but is missing and needs to be predicted for other instances.

UNSUPERVISED LEARNING
 In unsupervised learning, input vectors of similar types are grouped without the use of training data that specifies how a typical member of each group looks or to which group a member belongs.
 Unsupervised learning is useful in cases where the challenge is to discover implicit relationships in a given unlabeled dataset (items are not pre-assigned).

REINFORCEMENT LEARNING
 Reinforcement learning falls between these two extremes: there is some form of feedback available for each predictive step or action, but no precise label or error message.
 Reinforcement learning allows the machine or software agent to learn its behaviour based on feedback from the environment.
 These algorithms choose an action based on each data point and later learn how good the decision was. Over time, the algorithm changes its strategy to learn better and achieve the best reward.
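The difference between the first two settings can be sketched in a few lines; the snippet below is illustrative only, assumes scikit-learn is available, and uses made-up vectors and labels. Reinforcement learning is omitted because it needs an environment that returns rewards rather than a fixed dataset.

```python
# A minimal contrast of the supervised and unsupervised settings described above.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]   # input vectors
y = [0, 0, 1, 1]                                       # labels (target responses)

# Supervised: the label y tells the learner what the desired output is.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.1, 0.1], [0.95, 0.9]]))          # -> [0 1]

# Unsupervised: no labels; similar inputs are simply grouped together.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                                      # two discovered groups
```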
6. SUPERVISED LEARNING

Supervised learning algorithms make predictions on a given set of samples. A supervised machine learning algorithm searches for patterns within the value labels assigned to data points. Common supervised learning algorithms include:
 Decision Trees
 Naive Bayes Classification
 Ordinary Least Squares Regression
 Logistic Regression
 Linear Regression
 Support Vector Machines
 Ensemble Methods

1. DECISION TREES

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs and utility. A decision tree is a graphical representation that uses a branching methodology to exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree, each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label, i.e. the decision made after evaluating all of the attributes. The classification rules are represented by the paths from the root to the leaf nodes.

Types of Decision Trees
 Classification Trees - The default kind of decision trees, used to separate a dataset into different classes based on the response variable. They are generally used when the response variable is categorical in nature.
 Regression Trees - When the response or target variable is continuous or numerical, regression trees are used. They are generally used for predictive problems rather than classification.
Why should you use the Decision Tree machine learning algorithm?

These algorithms help make decisions under uncertainty and improve communication, as they present a visual representation of a decision situation.
 Decision tree algorithms help a data scientist capture the idea of how the operational nature of a situation or model would have changed if a different decision had been taken.
 Decision tree algorithms help make optimal decisions by allowing a data scientist to traverse forward and backward calculation paths.

When to use the Decision Tree Machine Learning Algorithm
 Decision trees are robust to errors; if the training data contains errors, decision tree algorithms are well suited to address such problems.
 Decision trees are best suited for problems where instances are represented by attribute-value pairs.
 If the training data has missing values, decision trees can still be used, as they handle missing values nicely by looking at the data in other columns.
 Decision trees are best suited when the target function has discrete output values.

Advantages of Using Decision Tree Machine Learning Algorithms
 Decision trees are very intuitive and can be explained to anyone with ease. People from a non-technical background can also decipher the hypothesis drawn from a decision tree, as it is self-explanatory.
 When using decision tree algorithms, data type is not a constraint, as they can handle both categorical and numerical variables.
 Decision tree algorithms do not require any assumption of linearity in the data and hence can be used where the parameters are non-linearly related. They also make no assumptions about classifier structure or space distribution.
 These algorithms are useful in data exploration. Decision trees implicitly perform feature selection, which is very important in predictive analytics: when a decision tree is fit to a training dataset, the nodes at the top, on which the tree is split, are considered the important variables within the dataset, so feature selection is completed by default.
 Decision trees help save data preparation time, as they are not sensitive to missing values and outliers. Missing values will not stop you from splitting the data when building a decision tree, and outliers do not affect it either, because splitting is based on the proportion of samples within the split range and not on exact absolute values.

Drawbacks of Using Decision Tree Machine Learning Algorithms
 The more decisions there are in a tree, the lower the accuracy of any expected outcome.
 A major drawback of decision tree algorithms is that the outcomes may be based on expectations. When decisions are made in real time, the payoffs and resulting outcomes might not be the same as expected or planned, which can lead to unrealistic trees and bad decision making. Unrealistic expectations cause major errors and flaws in decision tree analysis, as it is not always possible to plan for every eventuality that can arise from a decision.
 Decision trees do not fit continuous variables well and can result in instability and classification plateaus.
 Decision trees are easy to use compared to other decision-making models, but creating large decision trees with several branches is a complex and time-consuming task.
 Decision tree algorithms consider only one attribute at a time and might not be best suited for the actual data in the decision space.
 Large decision trees with multiple branches are hard to comprehend and pose several presentation difficulties.

Applications of the Decision Tree Machine Learning Algorithm
 Decision trees are among the popular machine learning algorithms that find great use in finance for option pricing.
 Remote sensing is an application area for pattern recognition based on decision trees.
 Decision tree algorithms are used by banks to classify loan applicants by their probability of defaulting on payments.
 Gerber Products, a popular baby product company, used a decision tree machine learning algorithm to decide whether it should continue using the plastic PVC (polyvinyl chloride) in its products.
 Rush University Medical Center has developed a tool named Guardian that uses a decision tree machine learning algorithm to identify at-risk patients and disease trends.

The figure below gives a sense of what a decision tree looks like.

Fig 5.1: Decision tree

Decision Tree Example: From a business decision point of view, a decision tree is the minimum number of yes/no questions that one has to ask to assess the probability of making a correct decision, most of the time. As a method, it allows you to approach the problem in a structured and systematic way to arrive at a logical conclusion.
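As a hedged illustration of the idea (not part of the original report), the sketch below fits a small classification tree with scikit-learn on made-up loan-applicant data and prints the learned yes/no rules.

```python
# Hedged sketch of a classification tree on a toy dataset.
# Features: [age, income]; label: whether a loan applicant defaulted (1) or not (0).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 20], [30, 25], [45, 60], [50, 80], [35, 30], [60, 90]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))  # the learned if/else rules
print(tree.predict([[28, 22], [55, 85]]))                  # -> [1 0]
```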
2. NAIVE BAYES CLASSIFICATION

Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.

Fig 5.2: Naive Bayes Classification

The underlying equation is Bayes' rule,

P(A|B) = P(B|A) P(A) / P(B),

where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.

Example (from a Stack Overflow thread):
 We have a training dataset of 1,000 fruits.
 Each fruit can be a Banana, an Orange or Other (these are the classes).
 Each fruit can be Long, Sweet or Yellow (these are the features).

What do we see in this training dataset?
 Out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow.
 Out of 300 oranges, none are long, 150 are sweet and 300 are yellow.
 Out of the remaining 200 fruits, 100 are long, 150 are sweet and 50 are yellow.

If we are given the length, sweetness and colour of a fruit (without knowing its class), we can now calculate the probability of it being a banana, an orange or another fruit. Suppose we are told the unknown fruit is long, sweet and yellow. Here is how we calculate the probabilities in four steps:

Step 1: To calculate the probability that the fruit is a banana, note that this is the probability of the class Banana given the features Long, Sweet and Yellow, or more succinctly P(Banana | Long, Sweet, Yellow). This is exactly the equation discussed above.

Step 2: Starting with the numerator, plug everything in:
 P(Long | Banana) = 400/500 = 0.8
 P(Sweet | Banana) = 350/500 = 0.7
 P(Yellow | Banana) = 450/500 = 0.9
 P(Banana) = 500/1000 = 0.5

Multiplying everything together (as in the equation), we get 0.8 × 0.7 × 0.9 × 0.5 = 0.252.

Step 3: Ignore the denominator, since it is the same for all the other calculations.

Step 4: Do a similar calculation for the other classes:
 P(Orange | Long, Sweet, Yellow) ∝ 0 (no oranges are long)
 P(Other | Long, Sweet, Yellow) ∝ 0.5 × 0.75 × 0.25 × 0.2 = 0.01875

Since 0.252 is greater than 0.01875, Naive Bayes would classify this long, sweet and yellow fruit as a banana.

When to use the Naïve Bayes Classifier machine learning algorithm
 If you have a moderate or large training data set.
 If the instances have several attributes.
 Given the classification parameter, the attributes that describe the instances should be conditionally independent.

Applications of the Naïve Bayes Classifier
 Sentiment Analysis - It is used at Facebook to analyse status updates expressing positive or negative emotions.
 Document Categorization - Google uses document classification to index documents and find relevancy scores, i.e. the PageRank. The PageRank mechanism considers the pages marked as important in the databases that were parsed and classified using a document classification technique.
 The Naïve Bayes algorithm is also used for classifying news articles about technology, entertainment, sports, politics, etc.
 Email Spam Filtering - Google Mail uses the Naïve Bayes algorithm to classify your emails as Spam or Not Spam.

Some real-world examples are:
 Marking an email as spam or not spam.
 Classifying a news article as technology, politics or sports.
 Checking whether a piece of text expresses positive or negative emotions.
 Face recognition software.

Advantages of the Naïve Bayes Classifier Machine Learning Algorithm
 The Naïve Bayes Classifier algorithm performs well when the input variables are categorical.
 A Naïve Bayes classifier converges faster, requiring relatively little training data compared with discriminative models like logistic regression, when the conditional independence assumption holds.
 With the Naïve Bayes Classifier algorithm, it is easy to predict the class of a test data set, and it is a good bet for multi-class predictions as well.
 Though it requires the conditional independence assumption, the Naïve Bayes Classifier has shown good performance in various application domains.
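The fruit calculation above can be reproduced in a few lines of plain Python; the counts are exactly those given in the example, and the script simply evaluates the numerator of Bayes' rule for each class.

```python
# The fruit example above, worked out in plain Python, using the counts from the text.
counts = {
    "Banana": {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
    "Orange": {"total": 300, "Long": 0,   "Sweet": 150, "Yellow": 300},
    "Other":  {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
}
n_fruits = 1000

def score(cls):
    """Numerator of Bayes' rule: P(features | class) * P(class)."""
    c = counts[cls]
    prior = c["total"] / n_fruits
    likelihood = (c["Long"] / c["total"]) * (c["Sweet"] / c["total"]) * (c["Yellow"] / c["total"])
    return likelihood * prior

for cls in counts:
    print(cls, round(score(cls), 5))
# Banana 0.252, Orange 0.0, Other 0.01875 -> the fruit is classified as a Banana.
```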
3. ORDINARY LEAST SQUARES REGRESSION

If you know statistics, you have probably heard of linear regression before. Least squares is a method for performing linear regression. You can think of linear regression as the task of fitting a straight line through a set of points. There are multiple possible strategies for doing this, and the "ordinary least squares" strategy goes like this: draw a line, then for each data point measure the vertical distance between the point and the line; the fitted line is the one where the sum of these squared distances is as small as possible.

Fig 5.3: Ordinary Least Squares Regression

"Linear" refers to the kind of model you are using to fit the data, while "least squares" refers to the kind of error metric you are minimizing.
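A minimal sketch of an ordinary least squares fit follows; the points are invented for illustration, and NumPy's least-squares solver is assumed to be available.

```python
# Minimal ordinary least squares fit: choose the line that minimizes the sum of
# squared vertical distances to the points (toy data, not from the report).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Closed-form least squares: solve for slope and intercept of y ≈ a*x + b.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(slope, 3), round(intercept, 3))   # roughly 1.95 and 0.15
```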
4. LOGISTIC REGRESSION

Logistic regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution.

In general, regressions can be used in real-world applications such as:
 Credit scoring
 Measuring the success rates of marketing campaigns
 Predicting the revenues of a certain product
 Predicting whether there is going to be an earthquake on a particular day

The name of this algorithm can be a little confusing, in the sense that the logistic regression machine learning algorithm is for classification tasks, not regression problems. The name "regression" here implies that a linear model is fit in the feature space. The algorithm applies a logistic function to a linear combination of features to predict the outcome of a categorical dependent variable based on predictor variables. The odds or probabilities that describe the outcome of a single trial are modelled as a function of the explanatory variables. Logistic regression thus helps estimate the probability of falling into a specific level of the categorical dependent variable, given the predictor variables.

Suppose you want to predict whether there will be snowfall tomorrow in New York. The outcome of the prediction is not a continuous number, because there will either be snowfall or no snowfall, so linear regression cannot be applied. Here the outcome variable is one of several categories, and logistic regression helps.

Based on the nature of the categorical response, logistic regression is classified into 3 types:
 Binary Logistic Regression - The most commonly used form, when the categorical response has 2 possible outcomes, i.e. either yes or no. Example: predicting whether a student will pass or fail an exam, whether a person will have low or high blood pressure, or whether a tumour is cancerous or not.
 Multinomial Logistic Regression - The categorical response has 3 or more possible outcomes with no ordering. Example: predicting which search engine (Yahoo, Bing, Google or MSN) is used by the majority of US citizens.
 Ordinal Logistic Regression - The categorical response has 3 or more possible outcomes with a natural ordering. Example: how a customer rates the service and quality of food at a restaurant on a scale of 1 to 10.

Consider a simple example where a cake manufacturer wants to find out whether baking a cake at 160°C, 180°C or 200°C will produce a 'hard' or 'soft' variety of cake (assuming the bakery sells both varieties under different names and prices). Logistic regression is a perfect fit in this scenario. Suppose the manufacturer produces 2 batches: the first contains 20 cakes (of which 7 were hard and 13 were soft) and the second contains 80 cakes (of which 41 were hard and 39 were soft). If a linear regression algorithm were used, it would give equal importance to both batches regardless of the number of cakes in each. A logistic regression algorithm takes this factor into account and gives the second batch of cakes more weight than the first.

When to Use the Logistic Regression Machine Learning Algorithm
 Use logistic regression when there is a requirement to model the probabilities of the response variable as a function of some other explanatory variable, for example the probability of buying a product X as a function of gender.
 Use logistic regression when there is a need to predict the probability that a categorical dependent variable will fall into one of the two categories of a binary response as a function of some explanatory variables, for example the probability that a customer will buy a perfume given that the customer is female.
 Logistic regression is also well suited when the need is to classify elements into two categories based on an explanatory variable, for example classifying people into a 'young' or 'old' group based on their age.

Advantages of Using Logistic Regression
 Easy to inspect and less complex.
 A robust algorithm, as the independent variables need not have equal variance or a normal distribution.
 It does not assume a linear relationship between the dependent and independent variables and hence can also handle non-linear effects.
 It controls confounding and tests interaction.
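Below is a small, illustrative sketch of binary logistic regression in the spirit of the pass/fail example above; the hours-studied numbers are made up and scikit-learn is assumed.

```python
# Hedged sketch of binary logistic regression: predict the probability of a
# yes/no outcome from one explanatory variable (toy numbers, not from the report).
from sklearn.linear_model import LogisticRegression

hours_studied = [[1], [2], [3], [4], [5], [6], [7], [8]]
passed_exam   = [0, 0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression().fit(hours_studied, passed_exam)
print(model.predict([[2], [7]]))          # -> [0 1]
print(model.predict_proba([[4.5]]))       # probability of fail/pass near the boundary
```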
5. LINEAR REGRESSION

The Linear Regression algorithm shows the relationship between 2 variables and how a change in one variable impacts the other. The algorithm shows the impact on the dependent variable of changing the independent variable. The independent variables are referred to as explanatory variables, as they explain the factors that impact the dependent variable. The dependent variable is often referred to as the factor of interest or the response.

Advantages of the Linear Regression Machine Learning Algorithm
 It is one of the most interpretable machine learning algorithms, making it easy to explain to others.
 It is easy to use, as it requires minimal tuning.
 It is the most widely used machine learning technique and runs fast.

Applications of Linear Regression
 Estimating Sales: Linear regression finds great use in business for sales forecasting based on trends. If a company observes a steady increase in sales every month, a linear regression analysis of the monthly sales data helps the company forecast sales in the upcoming months.
 Risk Assessment: Linear regression helps assess the risk involved in the insurance or financial domain. A health insurance company can run a linear regression analysis on the number of claims per customer against age. Such an analysis may reveal that older customers tend to make more insurance claims, and the results play a vital role in important business decisions made to account for risk.
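A minimal sketch of the sales-forecasting use case follows; the monthly figures are invented and scikit-learn is assumed.

```python
# Minimal linear regression sketch in the spirit of the sales-forecasting example above
# (the monthly figures are made up for illustration).
from sklearn.linear_model import LinearRegression

months = [[1], [2], [3], [4], [5], [6]]
sales  = [100, 112, 119, 131, 140, 152]      # steady monthly increase

model = LinearRegression().fit(months, sales)
print(round(model.coef_[0], 1), round(model.intercept_, 1))  # growth per month, baseline
print(model.predict([[7], [8]]))                             # forecast for the next two months
```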
6. SUPPORT VECTOR MACHINES

SVM is a binary classification algorithm. Given a set of points of 2 types in an N-dimensional space, SVM generates an (N - 1)-dimensional hyperplane to separate those points into 2 groups. Say you have some points of 2 types on a piece of paper that are linearly separable: SVM will find a straight line that separates those points into the 2 types and that lies as far as possible from all of the points.

Fig 5.6: Support Vector Machine

In terms of scale, some of the biggest problems that have been solved using SVMs (with suitably modified implementations) are display advertising, human splice site recognition, image-based gender detection and large-scale image classification.

The Support Vector Machine is a supervised machine learning algorithm for classification or regression problems, where the dataset teaches the SVM about the classes so that it can classify any new data. It works by finding a line (hyperplane) that separates the training data set into classes. As there are many such hyperplanes, the SVM algorithm tries to maximize the distance between the classes involved; this is referred to as margin maximization. If the line that maximizes the distance between the classes is identified, the probability of generalizing well to unseen data is increased.

Types of SVM
 Linear SVMs - In linear SVMs the training data, i.e. the classes, can be separated by a hyperplane.
 Non-Linear SVMs - In non-linear SVMs it is not possible to separate the training data using a hyperplane. For example, the training data for face detection consists of a group of images that are faces and another group of images that are not faces (in other words, all other images in the world except faces). Under such conditions the training data is so complex that it is impossible to find a representation in which every feature vector is linearly separable; separating the set of faces from the set of non-faces is a complex task.

Advantages of Using SVM
 SVM offers the best classification performance (accuracy) on the training data.
 SVM renders more efficiency for the correct classification of future data.
 The best thing about SVM is that it does not make any strong assumptions about the data.
 It does not over-fit the data.

Applications of the Support Vector Machine
SVM is commonly used for stock market forecasting by various financial institutions. For instance, it can be used to compare the relative performance of stocks against other stocks in the same sector. The relative comparison of stocks helps in making investment decisions based on the classifications made by the SVM learning algorithm.
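The margin-maximizing line can be found with a few lines of scikit-learn; the sketch below uses made-up 2-D points and a linear kernel, purely for illustration.

```python
# Hedged sketch of a linear SVM: find the separating hyperplane with the largest
# margin between two classes of points (toy 2-D data, not from the report).
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.5, 0.5], [0.2, 0.4],    # class 0
     [2.0, 2.0], [2.5, 1.8], [1.9, 2.4]]    # class 1
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.predict([[0.3, 0.2], [2.2, 2.1]]))   # -> [0 1]
print(clf.support_vectors_)                    # the points that define the margin
```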
7. ENSEMBLE METHODS

Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a weighted vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, bagging and boosting.

How do ensemble methods work, and why are they superior to individual models?
 They average out biases: if you average a bunch of Democratic-leaning polls and Republican-leaning polls together, you get an average that isn't leaning either way.
 They reduce the variance: the aggregate opinion of a bunch of models is less noisy than the single opinion of one of the models. In finance this is called diversification: a mixed portfolio of many stocks will be much less variable than just one of the stocks alone. This is also why your models will be better with more data points rather than fewer.
 They are unlikely to over-fit: if the individual models did not over-fit, and you combine their predictions in a simple way (average, weighted average, logistic regression), then there is no room for over-fitting.
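As a rough sketch of the "vote of many models" idea (hand-rolled bagging, not taken from the report), the snippet below trains several trees on bootstrap resamples of made-up data and takes a majority vote; scikit-learn and NumPy are assumed.

```python
# Small sketch of combining many models: train several decision trees on bootstrap
# resamples of the data and take a majority vote (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap resample
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.array([m.predict([[3], [6]]) for m in models])
print(np.round(votes.mean(axis=0)))      # majority vote for x=3 and x=6 (expected [0. 1.])
```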
7. UNSUPERVISED LEARNING

In unsupervised learning there are no labels associated with the data points. These machine learning algorithms organize the data into groups of clusters to describe its structure and make complex data look simple and organized for analysis. Common unsupervised learning algorithms include:
 Clustering Algorithms
 K-Means Clustering Algorithm
 Apriori Algorithm
 Principal Component Analysis
 Singular Value Decomposition

1. CLUSTERING ALGORITHMS

Clustering is the task of grouping a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups.

Fig 6.1: Clustering Algorithms
2. K-MEANS CLUSTERING ALGORITHM

K-Means is a popular unsupervised machine learning algorithm for cluster analysis. K-Means is a non-deterministic and iterative method. The algorithm operates on a given data set with a pre-defined number of clusters, k. The output of the K-Means algorithm is k clusters, with the input data partitioned among them.

For instance, consider K-Means clustering for Wikipedia search results. The search term "Jaguar" on Wikipedia will return all pages containing the word Jaguar, which can refer to Jaguar as a car, Jaguar as a Mac OS version or Jaguar as an animal. The K-Means clustering algorithm can be applied to group the web pages that talk about similar concepts: the algorithm will group all web pages that talk about Jaguar as an animal into one cluster, Jaguar as a car into another cluster, and so on.

Advantages of Using the K-Means Clustering Machine Learning Algorithm
 In the case of globular clusters, K-Means produces tighter clusters than hierarchical clustering.
 Given a smaller value of k, K-Means computes faster than hierarchical clustering for a large number of variables.

Applications of K-Means Clustering
The K-Means clustering algorithm is used by most search engines, such as Yahoo and Google, to cluster web pages by similarity and identify the 'relevance rate' of search results. This helps search engines reduce the computational time for the users.
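A minimal K-Means sketch follows, partitioning made-up 2-D points into k = 2 clusters; scikit-learn is assumed.

```python
# Minimal K-Means sketch: partition unlabeled points into k = 2 clusters
# (toy coordinates, not from the report).
from sklearn.cluster import KMeans

X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],     # one group of nearby points
     [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]     # another group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # the two cluster centres
```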
3. APRIORI MACHINE LEARNING ALGORITHM

The Apriori algorithm is an unsupervised machine learning algorithm that generates association rules from a given data set. An association rule implies that if an item A occurs, then item B also occurs with a certain probability. Most of the association rules generated are in an IF-THEN format, for example: IF people buy an iPad THEN they also buy an iPad case to protect it. For the algorithm to derive such conclusions, it first observes the number of people who bought an iPad case while purchasing an iPad, deriving a ratio such as "out of the 100 people who purchased an iPad, 85 also purchased an iPad case".

Basic principles on which the Apriori algorithm works:
 If an item set occurs frequently, then all subsets of the item set also occur frequently.
 If an item set occurs infrequently, then all supersets of the item set also occur infrequently.

Advantages of the Apriori Algorithm
 It is easy to implement and can be parallelized easily.
 The Apriori implementation makes use of large item set properties.

Applications of the Apriori Algorithm
 Detecting Adverse Drug Reactions: The Apriori algorithm is used for association analysis on healthcare data, such as the drugs taken by patients, characteristics of each patient, adverse effects patients experience, initial diagnosis, etc. This analysis produces association rules that help identify the combinations of patient characteristics and medications that lead to adverse side effects of the drugs.
 Market Basket Analysis: Many e-commerce giants like Amazon use Apriori to draw insights on which products are likely to be purchased together and which are most responsive to promotion. For example, a retailer might use Apriori to predict that people who buy sugar and flour are likely to buy eggs to bake a cake.
 Auto-Complete Applications: Google auto-complete is another popular application of Apriori: when the user types a word, the search engine looks for other associated words that people usually type after that specific word.
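The counting behind such a rule can be sketched directly; the snippet below computes the support and confidence of a hypothetical "iPad -> iPad case" rule over invented baskets, which is the bookkeeping a full Apriori implementation automates.

```python
# Support and confidence of the rule "ipad -> ipad case" over a tiny made-up
# list of baskets (a sketch of the counting, not a full Apriori algorithm).
baskets = [
    {"ipad", "ipad case"}, {"ipad", "ipad case", "pencil"}, {"ipad"},
    {"ipad", "ipad case"}, {"flour", "sugar", "eggs"}, {"sugar", "flour"},
]

with_ipad = [b for b in baskets if "ipad" in b]
with_both = [b for b in with_ipad if "ipad case" in b]

support = len(with_both) / len(baskets)          # how often both items occur together
confidence = len(with_both) / len(with_ipad)     # P(case | ipad)
print(round(support, 2), round(confidence, 2))   # 0.5 and 0.75 here
```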
4. PRINCIPAL COMPONENT ANALYSIS

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Let's understand it with an example. Say we have a data set of dimension 300 (n) × 50 (p), where n represents the number of observations and p represents the number of predictors. Since we have a large p = 50, there can be p(p - 1)/2 scatter plots, i.e. more than 1,000 plots, to analyze the variable relationships. Wouldn't it be a tedious job to perform exploratory analysis on this data? In this case, a cleaner approach is to derive a small set of new dimensions (far fewer than 50) that captures as much of the information as possible, and then plot the observations in the resulting low-dimensional space. The figure below shows the transformation of high-dimensional data (3 dimensions) to low-dimensional data (2 dimensions) using PCA. Note that each resulting dimension is a linear combination of the original p features.

Fig 5.4: Principal Component Analysis

Some applications of PCA include compression, simplifying data for easier learning, and visualization. Note that domain knowledge is very important when choosing whether to go forward with PCA or not. It is not suitable in cases where the data is noisy (i.e. all the components of PCA have quite a high variance).
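A hedged sketch of the 3-D to 2-D reduction described above, using scikit-learn's PCA on a small invented matrix of observations:

```python
# Reduce made-up 3-D observations to 2 principal components.
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4, 0.5], [0.5, 0.7, 0.1], [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.3], [3.1, 3.0, 0.6], [2.3, 2.7, 0.5]])

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                 # each row is now a 2-D point
print(X_2d.shape)                           # (6, 2)
print(pca.explained_variance_ratio_)        # share of variance kept by each component
```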
5. SINGULAR VALUE DECOMPOSITION

In linear algebra, SVD is a factorization of a real or complex matrix. For a given m × n matrix M, there exists a decomposition M = UΣV*, where U and V are unitary matrices, V* is the conjugate transpose of V, and Σ is a diagonal matrix of singular values. PCA is actually a simple application of SVD.

In computer vision, the first face recognition algorithms used PCA and SVD to represent faces as a linear combination of "eigenfaces", perform dimensionality reduction, and then match faces to identities via simple methods; although modern methods are much more sophisticated, many still depend on similar techniques.
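A minimal NumPy sketch of the decomposition, factoring a small made-up matrix and rebuilding it to confirm M = UΣV*:

```python
# Minimal SVD sketch with NumPy: factor M into U, the singular values and V^T,
# then rebuild M to confirm the decomposition.
import numpy as np

M = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])     # a small m x n matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(M, M_rebuilt))    # True: M = U Σ V^T (real case)
print(s)                            # singular values, largest first
```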
8. CONCLUSION

These days, machine learning techniques are widely used to solve real-world problems by storing, manipulating, extracting and retrieving data from large sources. Supervised machine learning techniques have been widely adopted; however, they prove to be very expensive when systems are implemented over a wide range of data, because a significant amount of effort and cost is involved in obtaining large labeled data sets. Active learning thus provides a way to reduce labeling costs by labeling only the most useful instances for learning.
9. REFERENCES

 https://en.wikipedia.org/wiki/Machine_learning
 https://en.wikipedia.org/wiki/Supervised_learning
 https://en.wikipedia.org/wiki/Unsupervised_learning
 https://en.wikipedia.org/wiki/Reinforcement_learning