MACHINE LEARNING
1. ABSTRACT
Machine Learning is a scientific discipline that addresses the following question:
‘How can we program systems to automatically learn and to improve with experience?’
Learning in this context is not learning by heart but recognizing complex patterns and making
intelligent decisions based on data. The difficulty lies in the fact that the set of all possible
decisions given all possible inputs is too complex to describe. To tackle this problem the field
of Machine Learning develops algorithms that discover knowledge from specific data and
experience, based on sound statistical and computational principles.
The field of Machine Learning integrates many distinct approaches such as
probability theory, logic, combinatorial optimization, search, statistics, reinforcement
learning and control theory. The developed methods are at the basis of many applications,
ranging from vision to language processing, forecasting, pattern recognition, games, data
mining, expert systems and robotics.
2. INTRODUCTION
Machine learning is a method of data analysis that automates analytical model
building. Machine learning is a type of artificial intelligence (AI) that enables software
applications to become more accurate in forecasting outcomes without being explicitly
programmed.
The main idea of machine learning is to create algorithms that can receive input data and
use statistical analysis to predict an output value within an acceptable range.
The processes involved in machine learning are similar to those of data mining and predictive
modeling: both require searching through data to look for patterns and adjusting program
actions appropriately.
Basically, we have:
 Deep Learning
 Representation Learning
 Machine Learning
 Artificial Intelligence
DEEP LEARNING
Deep learning is a subset of machine learning in Artificial Intelligence (AI) that uses networks
capable of learning, without supervision, from data that is unstructured or unlabeled. It is also
known as deep neural learning or deep neural networks.
Ex: MLPs (Multilayer Perceptrons)
REPRESENTATION OR FEATURE LEARNING
In machine learning, feature learning or representation learning is a set of techniques that
allows a system to automatically discover the representations needed for feature detection or
classification from raw data.
Ex: Shallow encoders
MACHINE LEARNING
Machine learning is a field of computer science that gives computers the ability to learn
without being explicitly programmed.
Machine learning is an application of artificial intelligence (AI) that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that can
access data and use it to learn for themselves.
Ex: Logistic regression
ARTIFICIAL INTELLIGENCE
Artificial Intelligence is a field of study that encompasses computational techniques for
performing tasks that apparently require intelligence when performed by humans.
Ex: Knowledge bases
3. MACHINE LEARNING PROCESS
The machine learning process can be done in the following steps (a minimal code sketch of
this pipeline is given after the figure):
 Identifies relevant data sets and prepares them for analysis
 Chooses the type of machine learning algorithm to use
 Builds an analytical model based on the chosen algorithm
 Trains the model on training data sets, revising it as needed
 Runs the model to generate scores and other findings
Fig 2.1: Machine learning process
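To make the steps concrete, the following is a minimal sketch of this process in Python. It assumes the scikit-learn library; the synthetic data set and the choice of logistic regression are illustrative assumptions rather than part of the process itself.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: identify and prepare a data set (synthetic here, for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 2-3: choose an algorithm and build an analytical model.
model = LogisticRegression(max_iter=1000)

# Step 4: train the model, revising it (e.g. its hyperparameters) as needed.
model.fit(X_train, y_train)

# Step 5: run the model to generate scores and other findings.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))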
4. OVERVIEW
Machine Learning is the only kind of AI there is.
AI is changing. We are now recognizing that most things called "AI" in the past are nothing
more than advanced programming tricks. As long as the programmer is the one supplying all
the intelligence to the system by programming it in as a World Model, the system is not really
an Artificial Intelligence. It's "just a program".
Don't model the World; Model the Mind.
When you Model the Mind you can create systems capable of learning everything about the
world. It is a much smaller task, since the world is very large and changes behind your back,
which means World Models will become obsolete the moment they are made. The only hope
of creating intelligent systems is to have the system itself create and maintain its own World
Models, continuously, in response to sensory input.
Machine learning is a subset of AI. That is, all machine learning counts as AI, but not all AI
counts as machine learning. For example, symbolic logic (rules engines, expert systems and
knowledge graphs) as well as evolutionary algorithms and Bayesian statistics could all be
described as AI, and none of them are machine learning.
The "learning" part of machine learning means that ML algorithms attempt to optimize along
a certain dimension; i.e. they usually try to minimize error or maximize the likelihood of their
predictions being true. How does one minimize error? Well, one way is to build a framework
that multiplies inputs in order to make guesses as to the inputs' nature. Different
outputs/guesses are the product of the inputs and the algorithm. Usually, the initial guesses
are quite wrong, and if you are lucky enough to have ground-truth labels pertaining to the
input, you can measure how wrong your guesses are by contrasting them with the truth, and
then use that error to modify your algorithm. That's what neural networks do. They keep on
measuring the error and modifying their parameters until they can't achieve any less error.
They are, in short, an optimization algorithm. If you tune them right, they minimize their
error by guessing and guessing and guessing again.
Neural networks are part of machine learning.
Introduction to Deep Neural Networks: Neural networks are a set of algorithms, modeled
loosely after the human brain, that are designed to recognize patterns. They interpret sensory
data through a kind of machine perception, labeling or clustering raw input. The patterns they
recognize are numerical, contained in vectors, into which all real-world data, be it images,
sound, text or time series, must be translated.
In practice, Artificial Intelligence (AI) and Machine Learning (ML) go hand-in-hand
with big data. To make really intelligent machines, you need huge amounts of data for them
to learn from. Similarly, to understand huge amounts of data, you need the help of intelligent
machines.
AI VS ML
Machine Learning is a subfield of computer science that focuses on enabling computers to
make accurate predictions on any type of data. So instead of explicitly telling a computer how
to solve a problem, you show it how the problem was previously solved and the computer
identifies and learns on its own the steps that were part of the solution.
Artificial intelligence, on the other hand, is a much broader concept that stems from the idea
that human intelligence "can be so precisely described that a machine can be made to
simulate it”. It means that instead of just learning from a set of data, computers will treat that
dataset as knowledge, use it for planning, communicate that plan with humans or other AI,
and move/manipulate real world objects to execute that plan, all on their own.
5. TYPES OF MACHINE LEARNING ALGORITHMS
Machine learning algorithms can be divided into 3 broad categories
 Supervised learning
 Unsupervised learning
 Reinforcement learning
SUPERVISED LEARNING
 In supervised learning, an input vector is applied to the network and it results in an
output vector. This result is compared with the target response. It generates a function
that maps input to desired outputs.
 Supervised learning is useful in cases where a property (label) is available for a
certain dataset (training set), but is missing and needs to be predicted for other
instances.
UNSUPERVISED LEARNING
 In unsupervised learning, input vectors of similar types are grouped together without the
use of labeled training data specifying how a typical member of each group looks or to
which group each member belongs.
 Unsupervised learning is useful in cases where the challenge is to discover implicit
relationships in a given unlabeled dataset (items are not pre-assigned).
REINFORCEMENT LEARNING
 Reinforcement learning falls between these 2 extremes — there is some form of
feedback available for each predictive step or action, but no precise label or error
message.
 Reinforcement Learning allows the machine or software agent to learn its behaviour
based on feedback from the environment.
 These algorithms choose an action, based on each data point and later learn how good
the decision was. Over time, the algorithm changes its strategy to learn better and
achieve the best reward.
6. SUPERVISED LEARNING
Supervised learning algorithms make predictions on a given set of samples by searching for
patterns within the value labels assigned to the data points.
Supervised learning algorithms include:
 Decision tree
 Naive Bayes Classification
 Ordinary Least Squares Regression
 Logistic Regression
 Linear Regression
 Support Vector Machines
 Ensemble Methods
1. DECISION TREES
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and
their possible consequences, including chance-event outcomes, resource costs, and utility.
A decision tree is a graphical representation that makes use of branching methodology to
exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree,
each internal node represents a test on an attribute, each branch of the tree represents an
outcome of the test and each leaf node represents a particular class label, i.e. the decision made
after evaluating all of the attributes. The classification rules are represented by the paths
from the root to the leaf nodes.
Types of Decision Trees
Classification Trees- These are considered as the default kind of decision trees used to
separate a dataset into different classes, based on the response variable. These are generally
used when the response variable is categorical in nature.
Regression Trees-When the response or target variable is continuous or numerical,
regression trees are used. These are generally used in predictive type of problems when
compared to classification.
Why should you use Decision Tree Machine Learning algorithm?
These machine learning algorithms help make decisions under uncertainty and help you
improve communication, as they present a visual representation of a decision situation.
 Decision tree machine learning algorithms help a data scientist capture how the
operational nature of a situation or model would have changed if a different decision
had been taken.
 Decision tree algorithms help make optimal decisions by allowing a data scientist to
traverse through forward and backward calculation paths.
When to use Decision Tree Machine Learning Algorithm
 Decision trees are robust to errors; if the training data contains errors, decision tree
algorithms are well suited to such problems.
 Decision trees are best suited for problems where instances are represented by
attribute value pairs.
 If the training data has missing values then decision trees can be used, as they can
handle missing values nicely by looking at the data in other columns.
 Decision trees are best suited when the target function has discrete output values.
Advantages of Using Decision Tree Machine Learning Algorithms
 Decision trees are very intuitive and can be explained to anyone with ease. People
from a non-technical background can also decipher the hypothesis drawn from a
decision tree, as they are self-explanatory.
 When using decision tree machine learning algorithms, data type is not a constraint as
they can handle both categorical and numerical variables.
 Decision tree machine learning algorithms do not require making any assumption on
the linearity in the data and hence can be used in circumstances where the parameters
are non-linearly related. These machine learning algorithms do not make any
assumptions on the classifier structure and space distribution.
 These algorithms are useful in data exploration. Decision trees implicitly perform
feature selection which is very important in predictive analytics. When a decision tree
is fit to a training dataset, the nodes at the top on which the decision tree is split, are
considered as important variables within a given dataset and feature selection is
completed by default.
 Decision trees help save data preparation time, as they are not sensitive to missing
values and outliers. Missing values will not stop you from splitting the data for
building a decision tree. Outliers will also not affect the decision trees as data splitting
happens based on some samples within the split range and not on exact absolute
values.
Drawbacks of Using Decision Tree Machine Learning Algorithms
 The greater the number of decisions in a tree, the lower the accuracy of any expected
outcome.
 A major drawback of decision tree machine learning algorithms, is that the outcomes
may be based on expectations. When decisions are made in real-time, the payoffs and
resulting outcomes might not be the same as expected or planned. There are chances
that this could lead to unrealistic decision trees leading to bad decision making. Any
irrational expectations could lead to major errors and flaws in decision tree analysis,
as it is not always possible to plan for all eventualities that can arise from a decision.
 Decision Trees do not fit well for continuous variables and result in instability and
classification plateaus.
 Decision trees are easy to use when compared to other decision making models but
creating large decision trees that contain several branches is a complex and time
consuming task.
 Decision tree machine learning algorithms consider only one attribute at a time and
might not be best suited for actual data in the decision space.
 Large sized decision trees with multiple branches are not comprehensible and pose
several presentation difficulties.
Applications of Decision Tree Machine Learning Algorithm
 Decision trees are among the popular machine learning algorithms that find great use
in finance for option pricing.
 Remote sensing is an application area for pattern recognition based on decision trees.
 Decision tree algorithms are used by banks to classify loan applicants by their
probability of defaulting payments.
 Gerber Products, a popular baby product company, used decision tree machine
learning algorithm to decide whether they should continue using the plastic PVC
(Poly Vinyl Chloride) in their products.
 Rush University Medical Centre has developed a tool named Guardian that uses a
decision tree machine learning algorithm to identify at-risk patients and disease
trends.
Take a look at the image below to get a sense of what a decision tree looks like.
Fig 5.1: Decision tree
Decision Tree Example:
From a business decision point of view, a decision tree is the minimum number of yes/no
questions that one has to ask, to assess the probability of making a correct decision, most of
the time. As a method, it allows you to approach the problem in a structured and systematic
way to arrive at a logical conclusion.
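As a small illustration of this idea, the sketch below fits a classification tree with scikit-learn (an assumed library) on its bundled Iris data set and prints the learned yes/no decision rules; the data set and the depth limit are chosen only for the example.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# Limit the depth so the tree stays small enough to read as a set of yes/no questions.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Each internal node tests one attribute; each leaf assigns a class label.
print(export_text(tree, feature_names=list(iris.feature_names)))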
2. NAIVE BAYES CLASSIFICATION
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying
Bayes theorem with strong (naive) independence assumptions between the features.
Fig 5.2: Naive Bayes Classification
In the equation above, P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the
class prior probability, and P(B) is the predictor prior probability.
Example (from a Stack Overflow thread):
 We have a training dataset of 1,000 fruits.
 The fruit can be a Banana, Orange or Other (these are the classes).
 The fruit can be Long, Sweet or Yellow (these are the features).
What do you see in this training dataset?
 Out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow.
 Out of 300 oranges, none are long, 150 are sweet and 300 are yellow.
 Out of the remaining 200 fruit, 100 are long, 150 are sweet and 50 are yellow.
If we are given the length, sweetness and color of a fruit (without knowing its class), we can
now calculate the probability of it being a banana, orange or other fruit.
Suppose we are told the unknown fruit is long, sweet and yellow.
Here’s how we calculate all the probabilities in 4 steps:
Step 1: To calculate the probability the fruit is a banana, let’s first recognize that this looks
familiar. It’s the probability of the class Banana given the features Long, Sweet and Yellow,
or more succinctly:
P(Banana | Long, Sweet, Yellow) =
P(Long | Banana) x P(Sweet | Banana) x P(Yellow | Banana) x P(Banana) / P(Long, Sweet, Yellow)
This is exactly like the equation discussed earlier.
Step 2: Starting with the numerator, let’s plug everything in:
P(Long | Banana) = 400/500 = 0.8
P(Sweet | Banana) = 350/500 = 0.7
P(Yellow | Banana) = 450/500 = 0.9
P(Banana) = 500/1000 = 0.5
Multiplying everything together (as in the equation), we get: 0.8 x 0.7 x 0.9 x 0.5 = 0.252
Step 3: Ignore the denominator, since it’ll be the same for all the other calculations.
Step 4: Do a similar calculation for the other classes:
P(Orange | Long, Sweet, Yellow) is proportional to 0/300 x 150/300 x 300/300 x 300/1000 = 0
P(Other | Long, Sweet, Yellow) is proportional to 100/200 x 150/200 x 50/200 x 200/1000 = 0.019 (approx.)
Since 0.252 is greater than both of these, Naive Bayes would classify this long, sweet and yellow fruit as a
banana.
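The same arithmetic can be written out directly in Python; the sketch below simply re-derives the numbers above from the counts given in the training set (no external library is needed).

# Naive Bayes by hand for the fruit example; counts are taken from the data above.
counts = {
    "Banana": {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
    "Orange": {"total": 300, "Long": 0, "Sweet": 150, "Yellow": 300},
    "Other": {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
}
n_fruits = sum(c["total"] for c in counts.values())  # 1,000 fruits in total

scores = {}
for fruit, c in counts.items():
    prior = c["total"] / n_fruits  # P(class)
    likelihood = (c["Long"] / c["total"]) * (c["Sweet"] / c["total"]) * (c["Yellow"] / c["total"])
    scores[fruit] = prior * likelihood  # numerator of Bayes' theorem; denominator ignored

print(scores)                       # {'Banana': 0.252, 'Orange': 0.0, 'Other': 0.01875}
print(max(scores, key=scores.get))  # Banana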
When to use the Machine Learning algorithm - Naïve Bayes Classifier?
 If you have a moderate or large training data set.
 If the instances have several attributes.
 Given the classification parameter, attributes which describe the instances should be
conditionally independent.
Applications of Naïve Bayes Classifier
 Sentiment Analysis- It is used at Facebook to analyse status updates expressing
positive or negative emotions.
 Document Categorization- Google uses document classification to index documents
and find relevancy scores i.e. the PageRank. PageRank mechanism considers the
pages marked as important in the databases that were parsed and classified using a
document classification technique.
 Naïve Bayes Algorithm is also used for classifying news articles about Technology,
Entertainment, Sports, Politics, etc.
 Email Spam Filtering - Google Mail uses the Naïve Bayes algorithm to classify your
emails as Spam or Not Spam.
Some real-world examples are:
 Marking an email as spam or not spam
 Classifying a news article as being about technology, politics, or sports
 Checking whether a piece of text expresses positive or negative emotions
 Face recognition software
Advantages of the Naïve Bayes Classifier Machine Learning Algorithm
 Naïve Bayes Classifier algorithm performs well when the input variables are
categorical.
 A Naïve Bayes classifier converges faster and requires relatively little training data
compared to discriminative models such as logistic regression, when the Naïve Bayes
conditional independence assumption holds.
 With the Naïve Bayes Classifier algorithm, it is easier to predict the class of the test
data set. It is a good bet for multi-class predictions as well.
 Though it requires conditional independence assumption, Naïve Bayes Classifier has
presented good performance in various application domains.
3. ORDINARY LEAST SQUARES REGRESSION
If you know statistics, you have probably heard of linear regression before. Least squares is
a method for performing linear regression. You can think of linear regression as the task of
fitting a straight line through a set of points.
There are multiple possible strategies to do this, and the “ordinary least squares” strategy goes
like this: draw a line, and then for each of the data points, measure the vertical distance
between the point and the line, and add these up; the fitted line is the one where this sum of
(squared) distances is as small as possible.
Fig 5.3: Ordinary Least Squares Regression
Linear refers to the kind of model you are using to fit the data, while least squares refers to
the kind of error metric you are minimizing.
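As a minimal sketch of that procedure, assuming NumPy and a made-up set of points, the least-squares line can be computed directly:

import numpy as np

# Toy data: five points roughly along a line (invented for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with a column of ones for the intercept; lstsq minimizes the sum of squared residuals.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")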
4. LOGISTIC REGRESSION
Logistic regression is a powerful statistical way of modeling a binomial outcome with one or
more explanatory variables. It measures the relationship between the categorical dependent
variable and one or more independent variables by estimating probabilities using a logistic
function, which is the cumulative logistic distribution.
In general, regressions can be used in real-world applications such as:
 Credit Scoring
 Measuring the success rates of marketing campaigns
 Predicting the revenues of a certain product
 Is there going to be an earthquake on a particular day?
Logistic Regression
The name of this algorithm could be a little confusing in the sense that the Logistic Regression
machine learning algorithm is used for classification tasks and not regression problems. The name
‘Regression’ here implies that a linear model is fit into the feature space. This algorithm
applies a logistic function to a linear combination of features to predict the outcome of a
categorical dependent variable based on predictor variables.
The odds or probabilities that describe the outcome of a single trial are modelled as a function
of explanatory variables. Logistic regression algorithms help estimate the probability of
falling into a specific level of the categorical dependent variable based on the given predictor
variables.
Just suppose that you want to predict if there will be a snowfall tomorrow in New York. Here
the outcome of the prediction is not a continuous number because there will either be
snowfall or no snowfall and hence linear regression cannot be applied. Here the outcome
variable is one of the several categories and using logistic regression helps.
Based on the nature of categorical response, logistic regression is classified into 3 types –
 Binary Logistic Regression – The most commonly used form, applied when the
categorical response has 2 possible outcomes, i.e. either yes or no. Example –
predicting whether a student will pass or fail an exam, predicting whether a person
will have low or high blood pressure, predicting whether a tumour is cancerous or not.
 Multinomial Logistic Regression - Categorical response has 3 or more possible
outcomes with no ordering. Example - predicting which search engine (Yahoo,
Bing, Google, or MSN) is used by the majority of US citizens.
 Ordinal Logistic Regression - Categorical response has 3 or more possible outcomes
with natural ordering. Example- How a customer rates the service and quality of food
at a restaurant based on a scale of 1 to 10.
Let us consider a simple example where a cake manufacturer wants to find out if baking a
cake at 160°C, 180°C and 200°C will produce a ‘hard’ or ‘soft’ variety of cake ( assuming
the fact that the bakery sells both the varieties of cake with different names and prices).
Logistic regression is a better fit in this scenario than other statistical techniques. For
example, suppose the manufacturer produces 2 cake batches, where the first batch contains 20
cakes (of which 7 were hard and 13 were soft) and the second batch consists of 80 cakes (of
which 41 were hard and 39 were soft). In this case, if a linear regression algorithm is used it
will give equal importance to both batches of cakes regardless of the number of cakes in each
batch. Applying a logistic regression algorithm will take this factor into account and give the
second batch of cakes more weight than the first batch.
When to Use Logistic Regression Machine Learning Algorithm
 Use logistic regression algorithms when there is a requirement to model the
probabilities of the response variable as a function of some other explanatory variable.
For example, probability of buying a product X as a function of gender
 Use logistic regression algorithms when there is a need to predict probabilities that
categorical dependent variable will fall into two categories of the binary response as a
function of some explanatory variables. For example, what is the probability that a
customer will buy a perfume given that the customer is a female?
 A logistic regression algorithm is also well suited when the need is to classify
elements into two categories based on the explanatory variable. For example, classify
females into a ‘young’ or ‘old’ group based on their age.
Advantages of Using Logistic Regression
 Easier to inspect and less complex.
 Robust algorithm as the independent variables need not have equal variance or normal
distribution.
 These algorithms do not assume a linear relationship between the dependent and
independent variables and hence can also handle non-linear effects.
 Controls confounding and tests interaction.
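A minimal sketch of a binary logistic regression, assuming scikit-learn and an invented pass/fail data set along the lines of the exam example above:

import numpy as np
from sklearn.linear_model import LogisticRegression

# One explanatory variable (hours studied) and a binary outcome (1 = pass, 0 = fail); data invented.
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

# The logistic function turns the linear combination of features into a probability.
print(clf.predict_proba([[4.5]]))  # [[P(fail), P(pass)]] for a student who studied 4.5 hours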
5. Linear Regression Machine Learning Algorithm
The Linear Regression algorithm shows the relationship between 2 variables and how the change
in one variable impacts the other. The algorithm shows the impact on the dependent variable
of changing the independent variable. The independent variables are referred to as explanatory
variables, as they explain the factors that impact the dependent variable. The dependent
variable is often referred to as the factor of interest or response.
Advantages of Linear Regression Machine Learning Algorithm
 It is one of the most interpretable machine learning algorithms, making it easy to
explain to others.
 It is easy to use, as it requires minimal tuning.
 It is one of the most widely used machine learning techniques and runs fast.
Applications of Linear Regression
 Estimating Sales: Linear Regression finds great use in business, for sales forecasting
based on the trends. If a company observes steady increase in sales every month - a
linear regression analysis of the monthly sales data helps the company forecast sales
in upcoming months.
 Risk Assessment: Linear Regression helps assess risk involved in insurance or
financial domain. A health insurance company can do a linear regression analysis on
the number of claims per customer against age. This analysis helps insurance
companies find that older customers tend to make more insurance claims. Such
analysis results play a vital role in important business decisions that are made to
account for risk.
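For the sales-estimation use case, a minimal sketch (assuming scikit-learn; the monthly sales figures are invented) might look like this:

import numpy as np
from sklearn.linear_model import LinearRegression

# Twelve months of sales showing a steady upward trend (numbers invented for illustration).
months = np.arange(1, 13).reshape(-1, 1)  # explanatory variable
sales = np.array([105, 110, 114, 121, 125, 131, 134, 140, 146, 151, 155, 160], dtype=float)

model = LinearRegression()
model.fit(months, sales)

# Forecast sales for the next month from the fitted trend line.
print("forecast for month 13:", model.predict([[13]])[0])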
6. SUPPORT VECTOR MACHINES
SVM is a binary classification algorithm. Given a set of points of 2 types in an N-dimensional
space, SVM generates an (N - 1)-dimensional hyperplane to separate those points into 2
groups. Say you have some points of 2 types on a sheet of paper which are linearly separable.
SVM will find a straight line which separates those points into the 2 types and is situated as
far as possible from all those points.
Fig 5.6: Support Vector Machine
In terms of scale, some of the biggest problems that have been solved using SVMs (with
suitably modified implementations) are display advertising, human splice site recognition,
image-based gender detection and large-scale image classification.
Support Vector Machine is a supervised machine learning algorithm for classification or
regression problems where the dataset teaches SVM about the classes so that SVM can
classify any new data. It works by classifying the data into different classes by finding a line
(hyperplane) which separates the training data set into classes. As there are many such linear
hyperplanes, the SVM algorithm tries to maximize the distance between the various classes
involved, and this is referred to as margin maximization. If the line that maximizes the
distance between the classes is identified, the probability of generalizing well to unseen data
is increased.
Types of SVM:
SVMs are classified into two categories:
 Linear SVMs – In linear SVMs the training data, i.e. the classes, can be separated by a
hyperplane.
 Non-Linear SVMs - In non-linear SVMs it is not possible to separate the training
data using a hyperplane. For example, the training data for face detection consists of a
group of images that are faces and another group of images that are not faces (in other
words, all other images in the world except faces). Under such conditions, the training
data is so complex that it is impossible to find a representation for every feature
vector. Separating the set of faces linearly from the set of non-faces is a complex task.
Advantages of Using SVM
 SVM offers best classification performance (accuracy) on the training data.
 SVM renders more efficiency for correct classification of the future data.
 The best thing about SVM is that it does not make any strong assumptions on data.
 It does not over-fit the data.
Applications of Support Vector Machine
SVM is commonly used for stock market forecasting by various financial institutions. For
instance, it can be used to compare the relative performance of the stocks when compared to
the performance of other stocks in the same sector. The relative comparison of stocks helps
in making investment decisions based on the classifications made by the SVM learning
algorithm.
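A minimal sketch of a linear SVM, assuming scikit-learn and two synthetic, linearly separable clusters of points:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters of points stand in for the two classes (synthetic data).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The separating hyperplane is w . x + b = 0; the margin around it is maximized.
print("hyperplane coefficients:", clf.coef_[0], "intercept:", clf.intercept_[0])
print("support vectors per class:", clf.n_support_)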
7. ENSEMBLE METHODS
Ensemble methods are learning algorithms that construct a set of classifiers and then classify
new data points by taking a weighted vote of their predictions. The original ensemble method
is Bayesian averaging, but more recent algorithms include error-correcting output coding,
bagging, and boosting.
Ensemble Learning Algorithms
So how do ensemble methods work and why are they superior to individual models?
 They average out biases: If you average a bunch of democratic-leaning polls and
republican-leaning polls together, you will get an average that isn’t leaning either
way.
 They reduce the variance: The aggregate opinion of a bunch of models is less noisy
than the single opinion of one of the models. In finance, this is called diversification
— a mixed portfolio of many stocks will be much less variable than just one of the
stocks alone. This is why your models will be better with more data points rather than
fewer.
 They are unlikely to over-fit: If you have individual models that didn’t over-fit, and
you are combining the predictions from each model in a simple way (average,
weighted average, logistic regression), then there’s no room for over-fitting.
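As a concrete sketch of combining predictions by a (soft) vote, assuming scikit-learn; the three base models and the synthetic data are arbitrary choices for the example:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Average the predicted probabilities of three different base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=5)),
    ],
    voting="soft",
)

print("ensemble cross-validated accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())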
7. UNSUPERVISED LEARNING
In unsupervised learning, there are no labels associated with the data points. These machine
learning algorithms organize the data into clusters to describe its structure and make complex
data look simple and organized for analysis.
Unsupervised learning algorithms include:
 Clustering Algorithm
 K Means Clustering Algorithm
 Apriori Algorithm
 Principal Component Analysis
 Singular Value Decomposition
1. CLUSTERING ALGORITHMS
Clustering is the task of grouping a set of objects such that objects in the same group (cluster)
are more similar to each other than to those in other groups.
Fig 6.1: Clustering Algorithms
2. K MEANS CLUSTERING ALGORITHM
K-means is a popular unsupervised machine learning algorithm for cluster analysis. K-Means
is a non-deterministic and iterative method. The algorithm operates on a given data set with a
pre-defined number of clusters, k. The output of the K-Means algorithm is k clusters with the
input data partitioned among the clusters.
For instance, let’s consider K-Means Clustering for Wikipedia Search results. The search
term “Jaguar” on Wikipedia will return all pages containing the word Jaguar which can refer
to Jaguar as a Car, Jaguar as Mac OS version and Jaguar as an Animal. K Means clustering
algorithm can be applied to group the WebPages that talk about similar concepts. So, the
algorithm will group all web pages that talk about Jaguar as an Animal into one cluster,
Jaguar as a Car into another cluster and so on.
Advantages of using K-Means Clustering Machine Learning Algorithm
 In case of globular clusters, K-Means produces tighter clusters than hierarchical
clustering.
 Given a smaller value of K, K-Means clustering computes faster than hierarchical
clustering for a large number of variables.
Applications of K-Means Clustering
The K-Means Clustering algorithm is used by most search engines, like Yahoo and Google, to
cluster web pages by similarity and identify the ‘relevance rate’ of search results. This helps
search engines reduce the computational time for users.
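A minimal K-Means sketch, assuming scikit-learn and synthetic points standing in for, say, web-page feature vectors:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points in three natural groups (illustrative stand-in for real features).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)  # k is fixed in advance
labels = kmeans.fit_predict(X)                            # iteratively refines cluster centres

print("cluster centres:\n", kmeans.cluster_centers_)
print("first ten cluster assignments:", labels[:10])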
3. APRIORI MACHINE LEARNING ALGORITHM
Apriori algorithm is an unsupervised machine learning algorithm that generates association
rules from a given data set. Association rule implies that if an item A occurs, then item B also
occurs with a certain probability. Most of the association rules generated are in the IF_THEN
format. For example, IF people buy an iPad THEN they also buy an iPad Case to protect it.
For the algorithm to derive such conclusions, it first observes the number of people who
bought an iPad case while purchasing an iPad. This way a ratio is derived, for example: out of
100 people who purchased an iPad, 85 also purchased an iPad case.
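The support and confidence behind such a rule can be computed directly from transaction counts. The sketch below is not a full Apriori implementation; it just evaluates the iPad rule on a handful of invented transactions.

# Support and confidence for the rule "iPad -> iPad Case" (transactions invented for illustration).
transactions = [
    {"iPad", "iPad Case"},
    {"iPad"},
    {"iPad", "iPad Case", "Charger"},
    {"Charger"},
    {"iPad", "iPad Case"},
]

n = len(transactions)
ipad = sum(1 for t in transactions if "iPad" in t)
both = sum(1 for t in transactions if {"iPad", "iPad Case"} <= t)

support = both / n        # how often the two items occur together across all transactions
confidence = both / ipad  # IF a customer buys an iPad, THEN how often they also buy a case

print(f"support = {support:.2f}, confidence = {confidence:.2f}")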
Basic principles on which the Apriori Machine Learning Algorithm works:
 If an item set occurs frequently, then all the subsets of the item set also occur
frequently.
 If an item set occurs infrequently, then all the supersets of the item set also occur
infrequently.
Advantages of Apriori Algorithm
 It is easy to implement and can be parallelized easily.
 Apriori implementation makes use of large item set properties.
Applications of Apriori Algorithm
 Detecting Adverse Drug Reactions: The Apriori algorithm is used for association
analysis on healthcare data such as the drugs taken by patients, characteristics of each
patient, adverse ill-effects patients experience, initial diagnosis, etc. This analysis
produces association rules that help identify the combination of patient characteristics
and medications that lead to adverse side effects of the drugs.
 Market Basket Analysis: Many e-commerce giants like Amazon use Apriori to draw
data insights on which products are likely to be purchased together and which are
most responsive to promotion. For example, a retailer might use Apriori to predict
that people who buy sugar and flour are likely to buy eggs to bake a cake.
 Auto-Complete Applications: Google auto-complete is another popular application
of Apriori wherein, when the user types a word, the search engine looks for other
associated words that people usually type after a specific word.
4. PRINCIPAL COMPONENT ANALYSIS
PCA is a statistical procedure that uses an orthogonal transformation to convert a set of
observations of possibly correlated variables into a set of values of linearly uncorrelated
variables called principal components.
Let’s understand it using an example:
Let’s say we have a data set of dimension 300 (n) × 50 (p), where n represents the number of
observations and p represents the number of predictors. Since we have a large p = 50, there
can be p(p-1)/2 scatter plots, i.e. more than 1,000 plots, to analyze the variable relationships.
Wouldn’t it be a tedious job to perform exploratory analysis on this data?
In this case, a sensible approach is to select a subset of p (p << 50) predictors which captures
as much information as possible, and then plot the observations in the resulting low-dimensional
space.
The image below shows the transformation of high dimensional data (3 dimensions) to low
dimensional data (2 dimensions) using PCA. Note that each resultant dimension is a linear
combination of the original p features.
Fig 5.4: Principal Component Analysis
Some of the applications of PCA include compression, simplifying data for easier learning,
and visualization.
Notice that domain knowledge is very important while choosing whether to go forward with
PCA or not. It is not suitable in cases where data is noisy (all the components of PCA have
quite a high variance).
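A minimal sketch of such a reduction, assuming scikit-learn and random 3-dimensional data in place of a real data set:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))      # 300 observations with 3 features (synthetic)
X[:, 2] = X[:, 0] + 0.1 * X[:, 2]  # make the third feature largely redundant

# Project onto the first two principal components; each component is a linear
# combination of the original features.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("reduced shape:", X_2d.shape)  # (300, 2)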
5. SINGULAR VALUE DECOMPOSITION
In linear algebra, SVD is a factorization of a real or complex matrix. For a given m * n matrix
M, there exists a decomposition M = UΣV*, where U and V are unitary matrices, V* is the
conjugate transpose of V, and Σ is a diagonal matrix of singular values.
PCA is actually a simple application of SVD. In computer vision, the first face recognition
algorithms used PCA and SVD in order to represent faces as a linear combination of
“eigenfaces”, do dimensionality reduction, and then match faces to identities via simple
methods; although modern methods are much more sophisticated, many still depend on
similar techniques.
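A short NumPy sketch of the factorization, using an arbitrary small real matrix and checking that the factors reproduce it:

import numpy as np

M = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])  # an arbitrary 3 x 2 real matrix

# For a real matrix, M = U * diag(s) * V^T with U, V orthogonal and s the singular values.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = U @ np.diag(s) @ Vt

print("singular values:", s)
print("max reconstruction error:", np.max(np.abs(M - M_rebuilt)))  # ~0, up to floating point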
8. CONCLUSION
These days, machine learning techniques are widely used to solve real-world problems by
storing, manipulating, extracting and retrieving data from large sources. Supervised machine
learning techniques have been widely adopted; however, these techniques prove to be very
expensive when the systems are implemented over a wide range of data, because a significant
amount of effort and cost is involved in obtaining large labeled data sets. Active learning
provides a way to reduce labeling costs by labeling only the most useful instances for
learning.
9. REFERENCES
 https://en.wikipedia.org/wiki/Machine_learning
 https://en.wikipedia.org/wiki/Supervised_learning
 https://en.wikipedia.org/wiki/Unsupervised_learning
 https://en.wikipedia.org/wiki/Reinforcement_learning
More Related Content

What's hot

Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1Garry D. Lasaga
 
Artificial intelligence and knowledge representation
Artificial intelligence and knowledge representationArtificial intelligence and knowledge representation
Artificial intelligence and knowledge representationSajan Sahu
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningEng Teong Cheah
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning ExplainedMelanie Swan
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Machine Learning
Machine LearningMachine Learning
Machine LearningRahul Kumar
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm pptMayank Jain
 
Machine learning ppt
Machine learning ppt Machine learning ppt
Machine learning ppt Poojamanic
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionDony Riyanto
 

What's hot (20)

search strategies in artificial intelligence
search strategies in artificial intelligencesearch strategies in artificial intelligence
search strategies in artificial intelligence
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
Introduction Artificial Intelligence a modern approach by Russel and Norvig 1
 
Artificial intelligence and knowledge representation
Artificial intelligence and knowledge representationArtificial intelligence and knowledge representation
Artificial intelligence and knowledge representation
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
Data science
Data science Data science
Data science
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning Explained
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Hill climbing
Hill climbingHill climbing
Hill climbing
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm ppt
 
Machine learning ppt
Machine learning ppt Machine learning ppt
Machine learning ppt
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An Introduction
 
AI: AI & Problem Solving
AI: AI & Problem SolvingAI: AI & Problem Solving
AI: AI & Problem Solving
 
Unit 1 chapter 1 Design and Analysis of Algorithms
Unit 1   chapter 1 Design and Analysis of AlgorithmsUnit 1   chapter 1 Design and Analysis of Algorithms
Unit 1 chapter 1 Design and Analysis of Algorithms
 

Similar to Machine Learning Algorithms Explained

Data science dec ppt
Data science dec pptData science dec ppt
Data science dec pptsterlingit
 
Supervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationSupervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationTara ram Goyal
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introductionathirakurup3
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfTemok IT Services
 
source1
source1source1
source1butest
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningSujith Jayaprakash
 
Machine learning
Machine learningMachine learning
Machine learningPawanCT
 
Artificial intelligence slides beginners
Artificial intelligence slides beginners Artificial intelligence slides beginners
Artificial intelligence slides beginners Antonio Fernandes
 
Machine learning
Machine learningMachine learning
Machine learningeonx_32
 
The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)RR IT Zone
 
Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...EnjoyDigitAll by BNP Paribas
 
White-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdfWhite-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdfBoris647814
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.pptARVIND SARDAR
 
Ai artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collectionAi artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collectionRuchi Jain
 
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...eswaralaldevadoss
 
Machine learning
Machine learningMachine learning
Machine learningAbrar ali
 

Similar to Machine Learning Algorithms Explained (20)

Data science dec ppt
Data science dec pptData science dec ppt
Data science dec ppt
 
Supervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationSupervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its application
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introduction
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdf
 
Machine learning
Machine learningMachine learning
Machine learning
 
source1
source1source1
source1
 
Basics of Soft Computing
Basics of Soft  Computing Basics of Soft  Computing
Basics of Soft Computing
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Artificial intelligence slides beginners
Artificial intelligence slides beginners Artificial intelligence slides beginners
Artificial intelligence slides beginners
 
Machine learning
Machine learningMachine learning
Machine learning
 
The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)The Ultimate Guide to Machine Learning (ML)
The Ultimate Guide to Machine Learning (ML)
 
Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...Machine learning: how to create an Artificial Intelligence in one infographic...
Machine learning: how to create an Artificial Intelligence in one infographic...
 
Machine learning
Machine learningMachine learning
Machine learning
 
White-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdfWhite-Paper-the-AI-behind-vectra-AI.pdf
White-Paper-the-AI-behind-vectra-AI.pdf
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.ppt
 
Ai artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collectionAi artificial intelligence professional vocabulary collection
Ai artificial intelligence professional vocabulary collection
 
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
 
Machine learning
Machine learningMachine learning
Machine learning
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
4. OVERVIEW

Machine Learning is the only kind of AI there is. AI is changing: we now recognize that most things called "AI" in the past were nothing more than advanced programming tricks. As long as the programmer is the one supplying all the intelligence to the system by programming it in as a World Model, the system is not really an Artificial Intelligence. It is "just a program".

Don't model the World; model the Mind. When you model the Mind you can create systems capable of learning everything about the world. It is a much smaller task, since the world is very large and changes behind your back, which means World Models become obsolete the moment they are made. The only hope of creating intelligent systems is to have the system itself create and maintain its own World Models, continuously, in response to sensory input.

Machine learning is a subset of AI. That is, all machine learning counts as AI, but not all AI counts as machine learning. For example, symbolic logic (rules engines, expert systems and knowledge graphs), as well as evolutionary algorithms and Bayesian statistics, could all be described as AI, and none of them are machine learning.

The "learning" part of machine learning means that ML algorithms attempt to optimize along a certain dimension; i.e. they usually try to minimize error or maximize the likelihood of their predictions being true. How does one minimize error? One way is to build a framework that multiplies inputs in order to make guesses about the inputs' nature. The different outputs/guesses are the product of the inputs and the algorithm. Usually the initial guesses are quite wrong, and if you are lucky enough to have ground-truth labels for the input, you can measure how wrong your guesses are by comparing them with the truth and then use that error to modify your algorithm.

That is what neural networks do: they keep measuring the error and modifying their parameters until they cannot achieve any less error. They are, in short, an optimization algorithm. If you tune them right, they minimize their error by guessing, and guessing, and guessing again. Neural networks are part of machine learning.
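To make this guess-measure-adjust loop concrete, here is a minimal sketch (not from the original report) of gradient descent on a one-parameter model; the data, learning rate and step count are made up for illustration.

```python
# Minimal sketch of the "guess, measure error, adjust" loop described above,
# using plain gradient descent on a one-parameter model y = w * x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # ground-truth labels (the true w is 2.0)

w = 0.0                          # initial (wrong) guess
learning_rate = 0.05
for step in range(200):
    predictions = w * x
    error = predictions - y              # how wrong the current guesses are
    gradient = 2 * np.mean(error * x)    # direction that reduces squared error
    w -= learning_rate * gradient        # adjust the parameter

print(round(w, 3))               # converges close to 2.0
```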
Introduction to Deep Neural Networks

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.

Artificial Intelligence (AI) and Machine Learning (ML) go hand in hand with big data. To make really intelligent machines, you need huge amounts of data for them to learn from. Similarly, to understand huge amounts of data, you need the help of intelligent machines.

AI VS ML

Machine Learning is a subfield of computer science that focuses on enabling computers to make accurate predictions on any type of data. Instead of explicitly telling a computer how to solve a problem, you show it how the problem was previously solved, and the computer learns on its own the steps that were part of the solution.

Artificial intelligence, on the other hand, is a much broader concept that stems from the idea that human intelligence "can be so precisely described that a machine can be made to simulate it". It means that instead of just learning from a set of data, computers will treat that dataset as knowledge, use it for planning, communicate that plan with humans or other AI, and move or manipulate real-world objects to execute that plan, all on their own.
5. TYPES OF MACHINE LEARNING ALGORITHMS

Machine learning algorithms can be divided into 3 broad categories:
 Supervised learning
 Unsupervised learning
 Reinforcement learning

SUPERVISED LEARNING
 In supervised learning, an input vector is applied to the network and it results in an output vector. This result is compared with the target response. Learning generates a function that maps inputs to the desired outputs.
 Supervised learning is useful in cases where a property (label) is available for a certain dataset (the training set), but is missing and needs to be predicted for other instances.

UNSUPERVISED LEARNING
 In unsupervised learning, input vectors of similar types are grouped without the use of training data that specifies how a typical member of each group looks or to which group a member belongs.
 Unsupervised learning is useful in cases where the challenge is to discover implicit relationships in a given unlabeled dataset (items are not pre-assigned).

REINFORCEMENT LEARNING
 Reinforcement learning falls between these two extremes: there is some form of feedback available for each predictive step or action, but no precise label or error message.
 Reinforcement learning allows the machine or software agent to learn its behaviour based on feedback from the environment.
 These algorithms choose an action based on each data point and later learn how good the decision was. Over time, the algorithm changes its strategy to learn better and achieve the best reward.
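The difference between the first two settings can be sketched in a few lines; the snippet below is illustrative only, assumes scikit-learn is available, and uses made-up vectors and labels. Reinforcement learning is omitted because it needs an environment that returns rewards rather than a fixed dataset.

```python
# A minimal contrast of the supervised and unsupervised settings described above.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]   # input vectors
y = [0, 0, 1, 1]                                       # labels (target responses)

# Supervised: the label y tells the learner what the desired output is.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.1, 0.1], [0.95, 0.9]]))          # -> [0 1]

# Unsupervised: no labels; similar inputs are simply grouped together.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                                      # two discovered groups
```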
6. SUPERVISED LEARNING

Supervised learning algorithms make predictions on a given set of samples. A supervised machine learning algorithm searches for patterns within the value labels assigned to data points. Common supervised learning algorithms include:
 Decision Trees
 Naive Bayes Classification
 Ordinary Least Squares Regression
 Logistic Regression
 Linear Regression
 Support Vector Machines
 Ensemble Methods

1. DECISION TREES

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs and utility. A decision tree is a graphical representation that uses a branching methodology to exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree, each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label, i.e. the decision made after evaluating all of the attributes. The classification rules are represented by the paths from the root to the leaf nodes.

Types of Decision Trees
 Classification Trees - The default kind of decision trees, used to separate a dataset into different classes based on the response variable. They are generally used when the response variable is categorical in nature.
 Regression Trees - When the response or target variable is continuous or numerical, regression trees are used. They are generally used for predictive problems rather than classification.
Why should you use the Decision Tree machine learning algorithm?

These algorithms help make decisions under uncertainty and improve communication, as they present a visual representation of a decision situation.
 Decision tree algorithms help a data scientist capture the idea of how the operational nature of a situation or model would have changed if a different decision had been taken.
 Decision tree algorithms help make optimal decisions by allowing a data scientist to traverse forward and backward calculation paths.

When to use the Decision Tree Machine Learning Algorithm
 Decision trees are robust to errors; if the training data contains errors, decision tree algorithms are well suited to address such problems.
 Decision trees are best suited for problems where instances are represented by attribute-value pairs.
 If the training data has missing values, decision trees can still be used, as they handle missing values nicely by looking at the data in other columns.
 Decision trees are best suited when the target function has discrete output values.

Advantages of Using Decision Tree Machine Learning Algorithms
 Decision trees are very intuitive and can be explained to anyone with ease. People from a non-technical background can also decipher the hypothesis drawn from a decision tree, as it is self-explanatory.
 When using decision tree algorithms, data type is not a constraint, as they can handle both categorical and numerical variables.
 Decision tree algorithms do not require any assumption of linearity in the data and hence can be used where the parameters are non-linearly related. They also make no assumptions about classifier structure or space distribution.
 These algorithms are useful in data exploration. Decision trees implicitly perform feature selection, which is very important in predictive analytics: when a decision tree is fit to a training dataset, the nodes at the top, on which the tree is split, are considered the important variables within the dataset, so feature selection is completed by default.
 Decision trees help save data preparation time, as they are not sensitive to missing values and outliers. Missing values will not stop you from splitting the data when building a decision tree, and outliers do not affect it either, because splitting is based on the proportion of samples within the split range and not on exact absolute values.

Drawbacks of Using Decision Tree Machine Learning Algorithms
 The more decisions there are in a tree, the lower the accuracy of any expected outcome.
 A major drawback of decision tree algorithms is that the outcomes may be based on expectations. When decisions are made in real time, the payoffs and resulting outcomes might not be the same as expected or planned, which can lead to unrealistic trees and bad decision making. Unrealistic expectations cause major errors and flaws in decision tree analysis, as it is not always possible to plan for every eventuality that can arise from a decision.
 Decision trees do not fit continuous variables well and can result in instability and classification plateaus.
 Decision trees are easy to use compared to other decision-making models, but creating large decision trees with several branches is a complex and time-consuming task.
 Decision tree algorithms consider only one attribute at a time and might not be best suited for the actual data in the decision space.
 Large decision trees with multiple branches are hard to comprehend and pose several presentation difficulties.

Applications of the Decision Tree Machine Learning Algorithm
 Decision trees are among the popular machine learning algorithms that find great use in finance for option pricing.
 Remote sensing is an application area for pattern recognition based on decision trees.
 Decision tree algorithms are used by banks to classify loan applicants by their probability of defaulting on payments.
 Gerber Products, a popular baby product company, used a decision tree machine learning algorithm to decide whether it should continue using the plastic PVC (polyvinyl chloride) in its products.
 Rush University Medical Center has developed a tool named Guardian that uses a decision tree machine learning algorithm to identify at-risk patients and disease trends.

The figure below gives a sense of what a decision tree looks like.

Fig 5.1: Decision tree

Decision Tree Example: From a business decision point of view, a decision tree is the minimum number of yes/no questions that one has to ask to assess the probability of making a correct decision, most of the time. As a method, it allows you to approach the problem in a structured and systematic way to arrive at a logical conclusion.
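As a hedged illustration of the idea (not part of the original report), the sketch below fits a small classification tree with scikit-learn on made-up loan-applicant data and prints the learned yes/no rules.

```python
# Hedged sketch of a classification tree on a toy dataset.
# Features: [age, income]; label: whether a loan applicant defaulted (1) or not (0).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 20], [30, 25], [45, 60], [50, 80], [35, 30], [60, 90]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))  # the learned if/else rules
print(tree.predict([[28, 22], [55, 85]]))                  # -> [1 0]
```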
2. NAIVE BAYES CLASSIFICATION

Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.

Fig 5.2: Naive Bayes Classification

The underlying equation is Bayes' rule,

P(A|B) = P(B|A) P(A) / P(B),

where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.

Example (from a Stack Overflow thread):
 We have a training dataset of 1,000 fruits.
 Each fruit can be a Banana, an Orange or Other (these are the classes).
 Each fruit can be Long, Sweet or Yellow (these are the features).

What do we see in this training dataset?
 Out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow.
 Out of 300 oranges, none are long, 150 are sweet and 300 are yellow.
 Out of the remaining 200 fruits, 100 are long, 150 are sweet and 50 are yellow.

If we are given the length, sweetness and colour of a fruit (without knowing its class), we can now calculate the probability of it being a banana, an orange or another fruit. Suppose we are told the unknown fruit is long, sweet and yellow. Here is how we calculate the probabilities in four steps:

Step 1: To calculate the probability that the fruit is a banana, note that this is the probability of the class Banana given the features Long, Sweet and Yellow, or more succinctly P(Banana | Long, Sweet, Yellow). This is exactly the equation discussed above.

Step 2: Starting with the numerator, plug everything in:
 P(Long | Banana) = 400/500 = 0.8
 P(Sweet | Banana) = 350/500 = 0.7
 P(Yellow | Banana) = 450/500 = 0.9
 P(Banana) = 500/1000 = 0.5

Multiplying everything together (as in the equation), we get 0.8 × 0.7 × 0.9 × 0.5 = 0.252.

Step 3: Ignore the denominator, since it is the same for all the other calculations.

Step 4: Do a similar calculation for the other classes:
 P(Orange | Long, Sweet, Yellow) ∝ 0 (no oranges are long)
 P(Other | Long, Sweet, Yellow) ∝ 0.5 × 0.75 × 0.25 × 0.2 = 0.01875

Since 0.252 is greater than 0.01875, Naive Bayes would classify this long, sweet and yellow fruit as a banana.

When to use the Naïve Bayes Classifier machine learning algorithm
 If you have a moderate or large training data set.
 If the instances have several attributes.
 Given the classification parameter, the attributes that describe the instances should be conditionally independent.

Applications of the Naïve Bayes Classifier
 Sentiment Analysis - It is used at Facebook to analyse status updates expressing positive or negative emotions.
 Document Categorization - Google uses document classification to index documents and find relevancy scores, i.e. the PageRank. The PageRank mechanism considers the pages marked as important in the databases that were parsed and classified using a document classification technique.
 The Naïve Bayes algorithm is also used for classifying news articles about technology, entertainment, sports, politics, etc.
 Email Spam Filtering - Google Mail uses the Naïve Bayes algorithm to classify your emails as Spam or Not Spam.

Some real-world examples are:
 Marking an email as spam or not spam.
 Classifying a news article as technology, politics or sports.
 Checking whether a piece of text expresses positive or negative emotions.
 Face recognition software.

Advantages of the Naïve Bayes Classifier Machine Learning Algorithm
 The Naïve Bayes Classifier algorithm performs well when the input variables are categorical.
 A Naïve Bayes classifier converges faster, requiring relatively little training data compared with discriminative models like logistic regression, when the conditional independence assumption holds.
 With the Naïve Bayes Classifier algorithm, it is easy to predict the class of a test data set, and it is a good bet for multi-class predictions as well.
 Though it requires the conditional independence assumption, the Naïve Bayes Classifier has shown good performance in various application domains.
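The fruit calculation above can be reproduced in a few lines of plain Python; the counts are exactly those given in the example, and the script simply evaluates the numerator of Bayes' rule for each class.

```python
# The fruit example above, worked out in plain Python, using the counts from the text.
counts = {
    "Banana": {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
    "Orange": {"total": 300, "Long": 0,   "Sweet": 150, "Yellow": 300},
    "Other":  {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
}
n_fruits = 1000

def score(cls):
    """Numerator of Bayes' rule: P(features | class) * P(class)."""
    c = counts[cls]
    prior = c["total"] / n_fruits
    likelihood = (c["Long"] / c["total"]) * (c["Sweet"] / c["total"]) * (c["Yellow"] / c["total"])
    return likelihood * prior

for cls in counts:
    print(cls, round(score(cls), 5))
# Banana 0.252, Orange 0.0, Other 0.01875 -> the fruit is classified as a Banana.
```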
3. ORDINARY LEAST SQUARES REGRESSION

If you know statistics, you have probably heard of linear regression before. Least squares is a method for performing linear regression. You can think of linear regression as the task of fitting a straight line through a set of points. There are multiple possible strategies for doing this, and the "ordinary least squares" strategy goes like this: draw a line, then for each data point measure the vertical distance between the point and the line; the fitted line is the one where the sum of these squared distances is as small as possible.

Fig 5.3: Ordinary Least Squares Regression

"Linear" refers to the kind of model you are using to fit the data, while "least squares" refers to the kind of error metric you are minimizing.
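A minimal sketch of an ordinary least squares fit follows; the points are invented for illustration, and NumPy's least-squares solver is assumed to be available.

```python
# Minimal ordinary least squares fit: choose the line that minimizes the sum of
# squared vertical distances to the points (toy data, not from the report).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Closed-form least squares: solve for slope and intercept of y ≈ a*x + b.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(slope, 3), round(intercept, 3))   # roughly 1.95 and 0.15
```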
4. LOGISTIC REGRESSION

Logistic regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution.

In general, regressions can be used in real-world applications such as:
 Credit scoring
 Measuring the success rates of marketing campaigns
 Predicting the revenues of a certain product
 Predicting whether there is going to be an earthquake on a particular day

The name of this algorithm can be a little confusing, in the sense that the logistic regression machine learning algorithm is for classification tasks, not regression problems. The name "regression" here implies that a linear model is fit in the feature space. The algorithm applies a logistic function to a linear combination of features to predict the outcome of a categorical dependent variable based on predictor variables. The odds or probabilities that describe the outcome of a single trial are modelled as a function of the explanatory variables. Logistic regression thus helps estimate the probability of falling into a specific level of the categorical dependent variable, given the predictor variables.

Suppose you want to predict whether there will be snowfall tomorrow in New York. The outcome of the prediction is not a continuous number, because there will either be snowfall or no snowfall, so linear regression cannot be applied. Here the outcome variable is one of several categories, and logistic regression helps.

Based on the nature of the categorical response, logistic regression is classified into 3 types:
 Binary Logistic Regression - The most commonly used form, when the categorical response has 2 possible outcomes, i.e. either yes or no. Example: predicting whether a student will pass or fail an exam, whether a person will have low or high blood pressure, or whether a tumour is cancerous or not.
 Multinomial Logistic Regression - The categorical response has 3 or more possible outcomes with no ordering. Example: predicting which search engine (Yahoo, Bing, Google or MSN) is used by the majority of US citizens.
 Ordinal Logistic Regression - The categorical response has 3 or more possible outcomes with a natural ordering. Example: how a customer rates the service and quality of food at a restaurant on a scale of 1 to 10.

Consider a simple example where a cake manufacturer wants to find out whether baking a cake at 160°C, 180°C or 200°C will produce a 'hard' or 'soft' variety of cake (assuming the bakery sells both varieties under different names and prices). Logistic regression is a perfect fit in this scenario. Suppose the manufacturer produces 2 batches: the first contains 20 cakes (of which 7 were hard and 13 were soft) and the second contains 80 cakes (of which 41 were hard and 39 were soft). If a linear regression algorithm were used, it would give equal importance to both batches regardless of the number of cakes in each. A logistic regression algorithm takes this factor into account and gives the second batch of cakes more weight than the first.

When to Use the Logistic Regression Machine Learning Algorithm
 Use logistic regression when there is a requirement to model the probabilities of the response variable as a function of some other explanatory variable, for example the probability of buying a product X as a function of gender.
 Use logistic regression when there is a need to predict the probability that a categorical dependent variable will fall into one of the two categories of a binary response as a function of some explanatory variables, for example the probability that a customer will buy a perfume given that the customer is female.
 Logistic regression is also well suited when the need is to classify elements into two categories based on an explanatory variable, for example classifying people into a 'young' or 'old' group based on their age.

Advantages of Using Logistic Regression
 Easy to inspect and less complex.
 A robust algorithm, as the independent variables need not have equal variance or a normal distribution.
 It does not assume a linear relationship between the dependent and independent variables and hence can also handle non-linear effects.
 It controls confounding and tests interaction.
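Below is a small, illustrative sketch of binary logistic regression in the spirit of the pass/fail example above; the hours-studied numbers are made up and scikit-learn is assumed.

```python
# Hedged sketch of binary logistic regression: predict the probability of a
# yes/no outcome from one explanatory variable (toy numbers, not from the report).
from sklearn.linear_model import LogisticRegression

hours_studied = [[1], [2], [3], [4], [5], [6], [7], [8]]
passed_exam   = [0, 0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression().fit(hours_studied, passed_exam)
print(model.predict([[2], [7]]))          # -> [0 1]
print(model.predict_proba([[4.5]]))       # probability of fail/pass near the boundary
```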
5. LINEAR REGRESSION

The Linear Regression algorithm shows the relationship between 2 variables and how a change in one variable impacts the other. The algorithm shows the impact on the dependent variable of changing the independent variable. The independent variables are referred to as explanatory variables, as they explain the factors that impact the dependent variable. The dependent variable is often referred to as the factor of interest or the response.

Advantages of the Linear Regression Machine Learning Algorithm
 It is one of the most interpretable machine learning algorithms, making it easy to explain to others.
 It is easy to use, as it requires minimal tuning.
 It is the most widely used machine learning technique and runs fast.

Applications of Linear Regression
 Estimating Sales: Linear regression finds great use in business for sales forecasting based on trends. If a company observes a steady increase in sales every month, a linear regression analysis of the monthly sales data helps the company forecast sales in the upcoming months.
 Risk Assessment: Linear regression helps assess the risk involved in the insurance or financial domain. A health insurance company can run a linear regression analysis on the number of claims per customer against age. Such an analysis may reveal that older customers tend to make more insurance claims, and the results play a vital role in important business decisions made to account for risk.
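A minimal sketch of the sales-forecasting use case follows; the monthly figures are invented and scikit-learn is assumed.

```python
# Minimal linear regression sketch in the spirit of the sales-forecasting example above
# (the monthly figures are made up for illustration).
from sklearn.linear_model import LinearRegression

months = [[1], [2], [3], [4], [5], [6]]
sales  = [100, 112, 119, 131, 140, 152]      # steady monthly increase

model = LinearRegression().fit(months, sales)
print(round(model.coef_[0], 1), round(model.intercept_, 1))  # growth per month, baseline
print(model.predict([[7], [8]]))                             # forecast for the next two months
```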
6. SUPPORT VECTOR MACHINES

SVM is a binary classification algorithm. Given a set of points of 2 types in an N-dimensional space, SVM generates an (N - 1)-dimensional hyperplane to separate those points into 2 groups. Say you have some points of 2 types on a piece of paper that are linearly separable: SVM will find a straight line that separates those points into the 2 types and that lies as far as possible from all of the points.

Fig 5.6: Support Vector Machine

In terms of scale, some of the biggest problems that have been solved using SVMs (with suitably modified implementations) are display advertising, human splice site recognition, image-based gender detection and large-scale image classification.

The Support Vector Machine is a supervised machine learning algorithm for classification or regression problems, where the dataset teaches the SVM about the classes so that it can classify any new data. It works by finding a line (hyperplane) that separates the training data set into classes. As there are many such hyperplanes, the SVM algorithm tries to maximize the distance between the classes involved; this is referred to as margin maximization. If the line that maximizes the distance between the classes is identified, the probability of generalizing well to unseen data is increased.

Types of SVM
 Linear SVMs - In linear SVMs the training data, i.e. the classes, can be separated by a hyperplane.
 Non-Linear SVMs - In non-linear SVMs it is not possible to separate the training data using a hyperplane. For example, the training data for face detection consists of a group of images that are faces and another group of images that are not faces (in other words, all other images in the world except faces). Under such conditions the training data is so complex that it is impossible to find a representation in which every feature vector is linearly separable; separating the set of faces from the set of non-faces is a complex task.

Advantages of Using SVM
 SVM offers the best classification performance (accuracy) on the training data.
 SVM renders more efficiency for the correct classification of future data.
 The best thing about SVM is that it does not make any strong assumptions about the data.
 It does not over-fit the data.

Applications of the Support Vector Machine
SVM is commonly used for stock market forecasting by various financial institutions. For instance, it can be used to compare the relative performance of stocks against other stocks in the same sector. The relative comparison of stocks helps in making investment decisions based on the classifications made by the SVM learning algorithm.
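The margin-maximizing line can be found with a few lines of scikit-learn; the sketch below uses made-up 2-D points and a linear kernel, purely for illustration.

```python
# Hedged sketch of a linear SVM: find the separating hyperplane with the largest
# margin between two classes of points (toy 2-D data, not from the report).
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.5, 0.5], [0.2, 0.4],    # class 0
     [2.0, 2.0], [2.5, 1.8], [1.9, 2.4]]    # class 1
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.predict([[0.3, 0.2], [2.2, 2.1]]))   # -> [0 1]
print(clf.support_vectors_)                    # the points that define the margin
```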
7. ENSEMBLE METHODS

Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a weighted vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, bagging and boosting.

How do ensemble methods work, and why are they superior to individual models?
 They average out biases: if you average a bunch of Democratic-leaning polls and Republican-leaning polls together, you get an average that isn't leaning either way.
 They reduce the variance: the aggregate opinion of a bunch of models is less noisy than the single opinion of one of the models. In finance this is called diversification: a mixed portfolio of many stocks will be much less variable than just one of the stocks alone. This is also why your models will be better with more data points rather than fewer.
 They are unlikely to over-fit: if the individual models did not over-fit, and you combine their predictions in a simple way (average, weighted average, logistic regression), then there is no room for over-fitting.
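As a rough sketch of the "vote of many models" idea (hand-rolled bagging, not taken from the report), the snippet below trains several trees on bootstrap resamples of made-up data and takes a majority vote; scikit-learn and NumPy are assumed.

```python
# Small sketch of combining many models: train several decision trees on bootstrap
# resamples of the data and take a majority vote (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap resample
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.array([m.predict([[3], [6]]) for m in models])
print(np.round(votes.mean(axis=0)))      # majority vote for x=3 and x=6 (expected [0. 1.])
```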
7. UNSUPERVISED LEARNING

In unsupervised learning there are no labels associated with the data points. These machine learning algorithms organize the data into groups of clusters to describe its structure and make complex data look simple and organized for analysis. Common unsupervised learning algorithms include:
 Clustering Algorithms
 K-Means Clustering Algorithm
 Apriori Algorithm
 Principal Component Analysis
 Singular Value Decomposition

1. CLUSTERING ALGORITHMS

Clustering is the task of grouping a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups.

Fig 6.1: Clustering Algorithms
2. K-MEANS CLUSTERING ALGORITHM

K-Means is a popular unsupervised machine learning algorithm for cluster analysis. K-Means is a non-deterministic and iterative method. The algorithm operates on a given data set with a pre-defined number of clusters, k. The output of the K-Means algorithm is k clusters, with the input data partitioned among them.

For instance, consider K-Means clustering for Wikipedia search results. The search term "Jaguar" on Wikipedia will return all pages containing the word Jaguar, which can refer to Jaguar as a car, Jaguar as a Mac OS version or Jaguar as an animal. The K-Means clustering algorithm can be applied to group the web pages that talk about similar concepts: the algorithm will group all web pages that talk about Jaguar as an animal into one cluster, Jaguar as a car into another cluster, and so on.

Advantages of Using the K-Means Clustering Machine Learning Algorithm
 In the case of globular clusters, K-Means produces tighter clusters than hierarchical clustering.
 Given a smaller value of k, K-Means computes faster than hierarchical clustering for a large number of variables.

Applications of K-Means Clustering
The K-Means clustering algorithm is used by most search engines, such as Yahoo and Google, to cluster web pages by similarity and identify the 'relevance rate' of search results. This helps search engines reduce the computational time for the users.
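A minimal K-Means sketch follows, partitioning made-up 2-D points into k = 2 clusters; scikit-learn is assumed.

```python
# Minimal K-Means sketch: partition unlabeled points into k = 2 clusters
# (toy coordinates, not from the report).
from sklearn.cluster import KMeans

X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],     # one group of nearby points
     [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]     # another group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # the two cluster centres
```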
3. APRIORI MACHINE LEARNING ALGORITHM

The Apriori algorithm is an unsupervised machine learning algorithm that generates association rules from a given data set. An association rule implies that if an item A occurs, then item B also occurs with a certain probability. Most of the association rules generated are in an IF-THEN format, for example: IF people buy an iPad THEN they also buy an iPad case to protect it. For the algorithm to derive such conclusions, it first observes the number of people who bought an iPad case while purchasing an iPad, deriving a ratio such as "out of the 100 people who purchased an iPad, 85 also purchased an iPad case".

Basic principles on which the Apriori algorithm works:
 If an item set occurs frequently, then all subsets of the item set also occur frequently.
 If an item set occurs infrequently, then all supersets of the item set also occur infrequently.

Advantages of the Apriori Algorithm
 It is easy to implement and can be parallelized easily.
 The Apriori implementation makes use of large item set properties.

Applications of the Apriori Algorithm
 Detecting Adverse Drug Reactions: The Apriori algorithm is used for association analysis on healthcare data, such as the drugs taken by patients, characteristics of each patient, adverse effects patients experience, initial diagnosis, etc. This analysis produces association rules that help identify the combinations of patient characteristics and medications that lead to adverse side effects of the drugs.
 Market Basket Analysis: Many e-commerce giants like Amazon use Apriori to draw insights on which products are likely to be purchased together and which are most responsive to promotion. For example, a retailer might use Apriori to predict that people who buy sugar and flour are likely to buy eggs to bake a cake.
 Auto-Complete Applications: Google auto-complete is another popular application of Apriori: when the user types a word, the search engine looks for other associated words that people usually type after that specific word.
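The counting behind such a rule can be sketched directly; the snippet below computes the support and confidence of a hypothetical "iPad -> iPad case" rule over invented baskets, which is the bookkeeping a full Apriori implementation automates.

```python
# Support and confidence of the rule "ipad -> ipad case" over a tiny made-up
# list of baskets (a sketch of the counting, not a full Apriori algorithm).
baskets = [
    {"ipad", "ipad case"}, {"ipad", "ipad case", "pencil"}, {"ipad"},
    {"ipad", "ipad case"}, {"flour", "sugar", "eggs"}, {"sugar", "flour"},
]

with_ipad = [b for b in baskets if "ipad" in b]
with_both = [b for b in with_ipad if "ipad case" in b]

support = len(with_both) / len(baskets)          # how often both items occur together
confidence = len(with_both) / len(with_ipad)     # P(case | ipad)
print(round(support, 2), round(confidence, 2))   # 0.5 and 0.75 here
```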
4. PRINCIPAL COMPONENT ANALYSIS

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Let's understand it with an example. Say we have a data set of dimension 300 (n) × 50 (p), where n represents the number of observations and p represents the number of predictors. Since we have a large p = 50, there can be p(p - 1)/2 scatter plots, i.e. more than 1,000 plots, to analyze the variable relationships. Wouldn't it be a tedious job to perform exploratory analysis on this data? In this case, a cleaner approach is to derive a small set of new dimensions (far fewer than 50) that captures as much of the information as possible, and then plot the observations in the resulting low-dimensional space. The figure below shows the transformation of high-dimensional data (3 dimensions) to low-dimensional data (2 dimensions) using PCA. Note that each resulting dimension is a linear combination of the original p features.

Fig 5.4: Principal Component Analysis

Some applications of PCA include compression, simplifying data for easier learning, and visualization. Note that domain knowledge is very important when choosing whether to go forward with PCA or not. It is not suitable in cases where the data is noisy (i.e. all the components of PCA have quite a high variance).
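A hedged sketch of the 3-D to 2-D reduction described above, using scikit-learn's PCA on a small invented matrix of observations:

```python
# Reduce made-up 3-D observations to 2 principal components.
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4, 0.5], [0.5, 0.7, 0.1], [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.3], [3.1, 3.0, 0.6], [2.3, 2.7, 0.5]])

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                 # each row is now a 2-D point
print(X_2d.shape)                           # (6, 2)
print(pca.explained_variance_ratio_)        # share of variance kept by each component
```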
5. SINGULAR VALUE DECOMPOSITION

In linear algebra, SVD is a factorization of a real or complex matrix. For a given m × n matrix M, there exists a decomposition M = UΣV*, where U and V are unitary matrices, V* is the conjugate transpose of V, and Σ is a diagonal matrix of singular values. PCA is actually a simple application of SVD.

In computer vision, the first face recognition algorithms used PCA and SVD to represent faces as a linear combination of "eigenfaces", perform dimensionality reduction, and then match faces to identities via simple methods; although modern methods are much more sophisticated, many still depend on similar techniques.
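A minimal NumPy sketch of the decomposition, factoring a small made-up matrix and rebuilding it to confirm M = UΣV*:

```python
# Minimal SVD sketch with NumPy: factor M into U, the singular values and V^T,
# then rebuild M to confirm the decomposition.
import numpy as np

M = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])     # a small m x n matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(M, M_rebuilt))    # True: M = U Σ V^T (real case)
print(s)                            # singular values, largest first
```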
8. CONCLUSION

These days, machine learning techniques are widely used to solve real-world problems by storing, manipulating, extracting and retrieving data from large sources. Supervised machine learning techniques have been widely adopted; however, they prove to be very expensive when systems are implemented over a wide range of data, because a significant amount of effort and cost is involved in obtaining large labeled data sets. Active learning thus provides a way to reduce labeling costs by labeling only the most useful instances for learning.
9. REFERENCES

 https://en.wikipedia.org/wiki/Machine_learning
 https://en.wikipedia.org/wiki/Supervised_learning
 https://en.wikipedia.org/wiki/Unsupervised_learning
 https://en.wikipedia.org/wiki/Reinforcement_learning