Machine Learning_Unit 2_Full.ppt.pdf

Dr. DHANALAKSHMI SENTHILKUMAR
Professor at MALLA REDDY ENGINEERING COLLEGE (A), TELANGANA, INDIA
MACHINE LEARNING AND DATA SCIENCES
III-B.Tech.-II-Sem
Subject Code: CS-PCC-322
Unit-II: Machine Learning 10 hours
Overview of Machine learning concepts – Over fitting and train/test
splits, Types of Machine learning – Supervised, Unsupervised,
Reinforced learning, Introduction to Bayes Theorem, Linear Regression-
model assumptions, regularization (lasso, ridge, elastic net),
Classification and Regression algorithms- Naïve Bayes, K-Nearest
Neighbors, logistic regression, support vector machines (SVM), decision
trees, and random forest, Classification Errors.
Dr.S.Dhanalakshmi
Introduction to Machine Learning (Definition)
Machine learning is a subfield of artificial intelligence (AI). The goal of machine
learning generally is to understand the structure of data and fit that data into models
that can be understood and utilized by people.
•Machine Learning is the most popular technique of predicting the future or
classifying information to help people in making necessary decisions.
•Machine Learning algorithms are trained over instances or examples through which
they learn from past experiences and also analyze the historical data.
•The whole concept of machine learning is figuring out ways in which we can teach a
computer to perform a task without a need to provide explicit instructions.
•The term machine learning was coined in 1959 by Arthur Samuel, an American IBMer and pioneer in the fields of computer gaming and artificial intelligence.
•Machine learning deals with the information world: machines use data to learn, and machine learning aims to derive meaning from that data. Machine learning uses statistical methods to enable machines to improve with experience. A subset of machine learning is deep learning, which enables multi-layer neural networks.
Contd.,
Overview of Machine Learning Concepts
AI is the greater pool that contains
an amalgamation of all (AI enables machines
to think without any human intervention)
Machine Learning is a part of Artificial
Intelligence that involves implementing
algorithms that are able to learn from
the data or previous instances and are able
to perform tasks without explicit
instructions. (subset of AI that uses
statistical learning algorithms that learn
pattern in data over time)
Deep learning is a member of a broader
family of machine learning methods based
on artificial neural networks with
representation learning.
(subset of ML that filters the data through
multiple layers)
Essential Components of Machine Learning
• Representation (what the model looks like)
• Evaluation (how do we differentiate good models from bad ones)
• Optimization (what is our process for finding the good models among all the
possible models)
Contd.,
TRADITIONAL PROGRAMMING VS MACHINE LEARNING
Traditional Programming
Traditional programming is a manual process, meaning a person (programmer)
creates the program by manually formulating and coding the rules. We have the
input data, and the programmer writes a program that uses that data and runs
on a computer to produce the desired output.
Machine Learning
In Machine Learning, on the other hand, the input data and the expected output are fed
to an algorithm to create a program.
In traditional programming one has to manually formulate/code the rules, while in
Machine Learning the algorithm automatically formulates the rules from the data, which
is very powerful.
Contd.,
Terminology in Machine Learning
Model: Also known as “hypothesis”, a machine learning model is the mathematical
representation of a real-world process. A machine learning algorithm along with the training
data builds a machine learning model.
Feature: A feature is a measurable property or parameter of the data-set.
Feature Vector: It is a set of multiple numeric features. We use it as an input to the machine
learning model for training and prediction purposes.
Training: An algorithm takes a set of data known as “training data” as input. The learning
algorithm finds patterns in the input data and trains the model for expected results (target).
The output of the training process is the machine learning model.
Prediction: Once the machine learning model is ready, it can be fed with input data to
provide a predicted output.
Target (Label): The value that the machine learning model has to predict is called the target
or label.
Overfitting: When a machine learning model is trained on a massive amount of data, it
tends to learn from the noise and inaccurate data entries. The model then fails to
characterize the data correctly.
Underfitting: This is the scenario where the model fails to decipher the underlying trend
in the input data. It destroys the accuracy of the machine learning model. In simple
terms, the model or the algorithm does not fit the data well enough.
How Does Machine Learning Work?
Steps to Build for ML Model
Contd.,
1. Data collection
Machine learning requires training data, a lot of it (either labelled, meaning
supervised learning or not labelled, meaning unsupervised learning).
2. Data preparation
Raw data alone is not very useful. The data needs to be prepared: normalized,
de-duplicated, and with errors and bias removed. Visualization of the data can be used
to look for patterns and outliers, to see if the right data has been collected or if data is
missing.
Cleaning and Visualizing Data
Contd.,
3. Choosing a model
Based on the collected data with relevant to the task choose a model. Its mainly
used for various models, linear regression, logistic regression, decision trees,
K-means, principal component analysis (PCA), Support Vector Machines (SVM),
Naïve Bayes, Random Forest and Neural Networks. If your model is suited for
numerical or categorical data and choose accordingly.
Model: Application
• Logistic Regression: Price prediction
• Fully connected networks: Classification
• Convolutional Neural Networks: Image processing
• Recurrent Neural Networks: Voice recognition
• Random Forest: Fraud detection
• Reinforcement Learning: Learning by trial and error
• Generative Models: Image creation
• K-means: Segmentation
• k-Nearest Neighbors: Recommendation systems
• Bayesian Classifiers: Spam and noise filtering
Contd.,
4. Training
Training is the most important step in machine learning. In training, you pass the
prepared data to your machine learning model to find patterns and make
predictions. It results in the model learning from the data so that it can accomplish
the task set. Over time, with training, the model gets better at predicting.
5. Evaluation
After training the model comes evaluating the model. This entails testing the
machine learning model against a previously unseen control dataset to see how it
performs. This might be representative of how the model works in the real world,
but this does not have to be the case. The larger the number of variables in the real
world, the bigger the training and test data should be.
Contd.,
6. Parameter tuning
After evaluating your model, you should revisit the originally set parameters to
improve the model. Increasing the number of training cycles can lead to more accurate
results. However, you should define when a model is good enough, as otherwise
you will continue to tweak the model indefinitely. This is an experimental process.
7. Prediction
Once you have gone through the process of collecting data, preparing the data,
selecting the model, training and evaluating the model and tuning the parameters,
it is time to answer questions using predictions. These can be all kinds of
predictions, ranging from image recognition to semantics to predictive analytics.
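As a rough illustration, the whole workflow can be sketched in a few lines of Python with scikit-learn. This is a minimal sketch, not a prescription: the iris dataset, the logistic-regression model, and the max_iter value are placeholders chosen for illustration.

```python
# Minimal end-to-end sketch of the ML workflow with scikit-learn.
# The iris dataset and the chosen model stand in for steps 1-7.
from sklearn.datasets import load_iris                # 1. data collection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler      # 2. data preparation
from sklearn.linear_model import LogisticRegression  # 3. choosing a model
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler().fit(X_train)                # normalize the features
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = LogisticRegression(max_iter=200)              # 6. a tunable parameter
model.fit(X_train, y_train)                           # 4. training
print(accuracy_score(y_test, model.predict(X_test)))  # 5. evaluation
print(model.predict(X_test[:1]))                      # 7. prediction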
Machine Learning Examples
Applications of Machine Learning
Typical Machine Learning Process
• Training data. This type of data builds up the machine learning algorithm. The
data scientist feeds the algorithm input data, which corresponds to an expected
output. The model evaluates the data repeatedly to learn more about the
data’s behavior and then adjusts itself to serve its intended purpose.
• Validation data. During training, validation data infuses new data into the
model that it hasn’t evaluated before. Validation data provides the first test
against unseen data, allowing data scientists to evaluate how well the model
makes predictions based on the new data. Not all data scientists use validation
data, but it can provide some helpful information to optimize hyper
parameters, which influence how the model assesses data.
• Test data. After the model is built, testing data once again validates that it can
make accurate predictions. If training and validation data include labels to
monitor performance metrics of the model, the testing data should be
unlabeled. Test data provides a final, real-world check of an unseen dataset to
confirm that the ML algorithm was trained effectively.
Typical Machine Learning Process
Overfitting
Creating a model that matches the training data so closely that the model fails to make
correct predictions on new data.
Overfitting and Train/Test Splits
Train/Test is a method to measure the accuracy of your model.
• It is called Train/Test because you split the data set into two sets: a
training set and a testing set (for example, 80% for training and 20% for testing).
• You train the model using the training set, You test the model using the testing
set.
• Training the model means creating the model; testing the model means testing the
accuracy of the model.
• Nevertheless, common split percentages include:
Train: 80%, Test: 20%
Train: 67%, Test: 33%
Train: 50%, Test: 50%
Splitting Datasets
• To use dataset in machine learning the dataset is first split into a training
and test set.
• The training set is used to train the model
• The test set is used to test the accuracy of the model
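A minimal sketch of such a split with scikit-learn's train_test_split; the ten rows and the fixed random seed below are illustrative:

```python
# Split a dataset into an 80% training set and a 20% test set.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]          # illustrative feature rows
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]    # illustrative labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # fixed seed for reproducibility
print(len(X_train), len(X_test))           # -> 8 2
```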
Splitting of Dataset
• Method 1
• Method 2
Data Imbalance and Overfitting
• If the training data is heavily unbalanced, then the model will predict a
non-meaningful result.
• For example, if a model is a binary classifier (e.g. cat vs. dog) and nearly all
the samples have the same label (cat), then the model will simply learn
that everything belongs to that label (cat).
• This is called overfitting. To prevent it, there needs to be a fairly
equal distribution of training samples for each class, or across the range if
the label is a real value.
Types of Machine Learning
• Supervised Machine Learning
• Semi-supervised Machine Learning
• Unsupervised Machine Learning
• Reinforcement Machine Learning
Machine Learning Techniques
In machine learning, tasks are generally classified into broad categories. These
categories are based on how learning is received or how feedback on the learning is
given to the system developed.
•Two of the most widely adopted machine learning methods are supervised learning,
which trains algorithms based on example input and output data labeled by
humans, and unsupervised learning, which provides the algorithm with no labeled
data in order to allow it to find structure within its input data. Semi-supervised
models use both labeled and unlabeled data for training. Reinforcement learning has
a feedback type of algorithm (the machine learns on its own from rewards).
Contd.,
Supervised Machine Learning
1. It is a type of learning in which both input and desired output data are provided.
2. Input and output data are labeled for classification to provide a learning basis for
future data processing. (A model based on supervised learning requires both
previous data and the previous results as input. By training with this data, the model
helps in predicting results that are more accurate.)
3. This algorithm consists of a target/outcome variable (or dependent variable) which
is to be predicted from a given set of predictors (independent variables).
4. Using this set of variables, we generate a function that maps inputs to desired
outputs. The training process continues until the model achieves a desired level of
accuracy on the training data.
5. Supervised learning includes methods like classification and regression, with
algorithms such as Naïve Bayes, SVM, KNN, decision trees, etc.
Contd.,
Supervised learning is classified into two categories of algorithms:
Classification: A classification problem is when the output variable is a category,
such as “Red” or “blue” or “disease” and “no disease”.
Regression: A regression problem is when the output variable is a real value,
such as “dollars” or “weight”.
Types:-
Regression
Logistic Regression
Classification
Naive Bayes Classifiers
K-NN (k nearest neighbors)
Decision Trees
Support Vector Machine
Advantages:-
•Supervised learning allows collecting data and produces data output from
previous experiences.
•Helps to optimize performance criteria with the help of experience.
•Supervised machine learning helps to solve various types of real-world
computation problems.
Disadvantages:-
•Classifying big data can be challenging.
•Training for supervised learning needs a lot of computation time. So, it requires a
lot of time.
Contd.,
Unsupervised Machine Learning
1. Unsupervised learning needs no previous data as input. It is the method that allows the
model to learn on its own using the data you give it. Here, the data is not labelled,
but the algorithm helps the model form clusters of similar types of data. For
example, if we have data on dogs and cats, the model will process the data and train itself.
Since it has no previous experience of the data, it will form clusters based
on similarities of features.
2. It trains the model by making it learn about the data and work on it from the very start.
Also, after the data is clustered and classified, we can easily label the data into separate
categories, as it has already been grouped.
Unsupervised Machine Learning
Contd.,
Unsupervised learning is classified into two categories of algorithms:
1. Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
2. Association: An association rule learning problem is where you want to discover
rules that describe large portions of your data, such as people that buy X also
tend to buy Y.
Types of Unsupervised Learning:-
• Clustering
• Exclusive (partitioning)
• Agglomerative
• Overlapping
• Probabilistic
Clustering Types:-
• Hierarchical clustering
• K-means clustering
• Principal Component Analysis
• Singular Value Decomposition
• Independent Component Analysis
Contd.,
Advantages of Unsupervised Learning
We sometimes choose unsupervised learning in place of supervised learning. Here
are some of the advantages:
1. Labeling data demands a lot of manual work and expense. Unsupervised
learning solves the problem by learning from the data and classifying it without any
labels.
2. The labels can be added after the data has been classified, which is much easier.
3. It is very helpful in finding patterns in data that are not possible to find using
normal methods.
4. Dimensionality reduction can be easily accomplished using unsupervised learning.
5. Unsupervised learning can help to understand raw data.
Disadvantages of Unsupervised Learning
1. The result might be less accurate, as we do not have labeled input data to train
from; the model learns from raw data without any prior knowledge.
2. It is also a time-consuming process. The learning phase of the algorithm might
take a lot of time, as it analyses and calculates all possibilities.
Contd.,
Semi-supervised Machine Learning
This is a combination of supervised and unsupervised learning. This method helps to reduce
the shortcomings of both the above learning methods.
In supervised learning, labelling data is manual work and is very costly, as data is huge. In
unsupervised learning, the areas of application are very limited. To reduce these problems,
semi-supervised learning is used.
The model first trains under unsupervised learning. This ensures that most of the unlabeled
data is divided into clusters. For the remaining unlabeled data, labels are generated and
classification is carried out with ease. This technique is very useful in areas like speech
recognition and analysis, protein classification, text classification, etc. This is a type of
hybrid learning problem. (Its working lies between supervised and unsupervised techniques.
We use these techniques when we are dealing with data that is a little bit labeled and the
rest, a large portion of it, is unlabeled. We can use unsupervised techniques to predict labels
and then feed these labels to supervised techniques. This technique is mostly applicable in
the case of image data sets, where usually not all images are labeled.)
Reinforcement Machine Learning
1. The model keeps on improving its performance using reward feedback to
learn the behavior or pattern. These algorithms are specific to a particular
problem, e.g. the Google self-driving car, or AlphaGo, where a bot competes with
humans and even itself to get better and better at the game of Go.
2. Each time we feed in data, the model learns and adds the data to its knowledge,
which becomes its training data. So, the more it learns, the better trained and
hence more experienced it gets.
Reinforcement Machine Learning
1. Reinforcement Learning is a type of learning methodology in ML along with
supervised and unsupervised learning. But, when we compare these three,
reinforcement learning is a bit different than the other two. Here, we take the
concept of giving rewards for every positive result and make that the base of our
algorithm.
2. We can train a dog to perform certain actions, though of course it won't be an easy
task. You would order the dog to do certain actions, and for every proper execution you
would give it a biscuit as a reward. The dog will remember that if it does a certain
action, it gets a biscuit. This way, it will follow the instructions properly next
time.
Reinforcement Learning
(Comparison figure: types of data, learning approach, applications)
Contd.,
What are the most common and popular machine learning algorithms?
1. Naïve Bayes Classifier Algorithm (Supervised Learning - Classification)
2. K Means Clustering Algorithm (Unsupervised Learning - Clustering)
3. Support Vector Machine Algorithm (Supervised Learning - Classification)
4. Linear Regression (Supervised Learning/Regression)
5. Logistic Regression (Supervised learning – Classification)
6. Artificial Neural Networks (Reinforcement Learning)
7. Decision Trees (Supervised Learning – Classification/Regression)
8. Random Forests (Supervised Learning – Classification/Regression)
9. Nearest Neighbours (Supervised Learning)
Classification and Regression Algorithms
• Definition of Classification
• Definition of Regression
• Differentiate between Classification and Regression
• Types of Classification Algorithms
– Naïve Bayes Algorithms
– K-Nearest Neighbors Algorithms
– Logistic Regression
– support vector machines (SVM)
– Decision Trees
– Random Forest
Classification Errors
Contd.,
Regression and Classification algorithms are Supervised Learning algorithms. Both
the algorithms are used for prediction in Machine learning and work with the
labeled datasets.
The main difference between Regression and Classification algorithms is that
Regression algorithms are used to predict continuous values such as price,
salary, age, etc., while Classification algorithms are used to predict/classify
discrete values such as Male or Female, True or False, Spam or Not Spam, etc.
Definition of Classification
Classification:
Classification is a process of finding a function which helps in dividing the dataset into
classes based on different parameters. In Classification, a computer program is
trained on the training dataset and based on that training, it categorizes the data
into different classes. The task of the classification algorithm is to find the mapping
function to map the input(x) to the discrete output(y).
Example: The best example to understand the Classification problem is Email Spam
Detection. The model is trained on the basis of millions of emails on different
parameters, and whenever it receives a new email, it identifies whether the email is
spam or not. If the email is spam, then it is moved to the Spam folder.
Contd.,
Types of ML Classification Algorithms:
Classification Algorithms can be further divided into the following types:
– Logistic Regression
– K-Nearest Neighbours
– Support Vector Machines
– Naïve Bayes
– Decision Tree Classification
– Random Forest Classification
Definition of Regression
Regression is a process of finding the correlations between
dependent and independent variables. It helps in predicting the
continuous variables such as prediction of Market Trends, prediction
of House prices, etc.
The task of the Regression algorithm is to find the mapping function
to map the input variable(x) to the continuous output variable(y).
Example:
Suppose we want to do weather forecasting, so for this, we will use
the Regression algorithm. In weather prediction, the model is trained
on the past data, and once the training is completed, it can easily
predict the weather for future days.
Contd.,
Types of Regression Algorithm
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
Regression Algorithm vs Classification Algorithm
1. In Regression, the output variable must be of continuous nature or real value. In Classification, the output variable must be a discrete value.
2. The task of the regression algorithm is to map the input value (x) to a continuous output variable (y). The task of the classification algorithm is to map the input value (x) to a discrete output variable (y).
3. Regression algorithms are used with continuous data. Classification algorithms are used with discrete data.
4. In Regression, we try to find the best-fit line, which can predict the output more accurately. In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
5. Regression algorithms can be used to solve problems such as weather prediction, house price prediction, etc. Classification algorithms can be used to solve problems such as identification of spam emails, speech recognition, identification of cancer cells, etc.
6. Regression algorithms can be further divided into Linear and Non-linear Regression. Classification algorithms can be divided into binary classifiers and multi-class classifiers.
1. Naïve Bayes Algorithm
• The Naive Bayes classifier works on the principle of conditional probability, as
given by the Bayes theorem. When calculating probabilities, we usually denote
probability as P.
• The Bayes theorem gives us the conditional probability of event A, given that
event B has occurred. In the coin-toss example, the first coin toss is B and the
second coin toss is A. This can be confusing, because we have reversed the order
and go from B to A instead of from A to B.
• Bayes theorem calculates the conditional probability of the occurrence of an
event based on prior knowledge of conditions that might be related to the
event.
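For reference, this is the theorem in standard notation, with A and B as above:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$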
Contd.,
• The Naïve Bayes Classifier is one of the simplest and most effective
classification algorithms; it helps in building fast machine learning models
that can make quick predictions.
• Naive Bayes is one of the powerful machine learning algorithms used for
classification.
• It is an extension of the Bayes theorem wherein each feature is assumed to be
independent of the others. It is used for a variety of tasks such as spam
filtering and other areas of text classification.
Understanding Naive Bayes and Machine Learning
Machine learning falls into two categories:
• Supervised learning and Unsupervised learning
Supervised learning falls into two categories:
• Classification and Regression
Naive Bayes algorithm falls under classification.
Contd.,
Naïve Bayes is used for:
• Face Recognition - As a classifier, it is used to identify faces or facial
features, like the nose, mouth, eyes, etc.
• Weather Prediction - It can be used to predict whether the weather will be good or
bad.
• Medical Diagnosis - Doctors can diagnose patients by using the information
that the classifier provides. Healthcare professionals can use Naive Bayes to
indicate whether a patient is at high risk for certain diseases and conditions, such as
heart disease, cancer, and other ailments.
• News Classification
Compared with many other classifiers, it requires less training data.
Contd.,
Example
For the day attribute, consider values like weekday, weekend, and holiday. For any given
day, check whether there is a discount and whether there is free delivery. Based on this
information, we can predict whether a customer will buy the product or not.
• Consider a small sample data set of 30 rows; 15 of them are shown below:
Contd.,
Based on the dataset containing the three input attributes (day, discount, and free
delivery), a frequency table for each attribute is populated.
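A minimal sketch of this kind of categorical Naïve Bayes with scikit-learn; the five rows below are invented placeholders, not the slide's 30-row table:

```python
# Categorical Naive Bayes on the day / discount / free-delivery example.
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

X_raw = [["Weekday", "Yes", "Yes"],
         ["Weekend", "No",  "No"],
         ["Holiday", "Yes", "No"],
         ["Weekday", "No",  "Yes"],
         ["Weekend", "Yes", "Yes"]]
y = ["Buy", "No Buy", "No Buy", "Buy", "Buy"]

enc = OrdinalEncoder()
X = enc.fit_transform(X_raw).astype(int)   # encode categories as integers

clf = CategoricalNB().fit(X, y)
query = enc.transform([["Holiday", "Yes", "Yes"]]).astype(int)
print(clf.predict(query))                  # predicted class for the new day
```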
Contd.,
2. K-Nearest Neighbors
The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised
machine learning algorithm that can be used to solve both classification and
regression problems. A supervised machine learning algorithm (as opposed to an
unsupervised machine learning algorithm) is one that relies on labeled input data
to learn a function that produces an appropriate output when given new unlabeled
data.
The KNN algorithm assumes that similar things exist in close proximity. In other
words, similar things are near to each other.
•K-NN algorithm can be used for Regression as well as for Classification, but mostly it
is used for Classification problems.
•K-NN is a non-parametric algorithm, which means it does not make any
assumption about the underlying data.
•It is also called a lazy learner algorithm because it does not learn from the training
set immediately; instead, it stores the dataset and, at the time of classification,
performs an action on the dataset.
Contd.,
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a
new data point x1. In which of these categories will this data point lie?
To solve this type of problem, we need a K-NN algorithm. With the help of K-NN,
we can easily identify the category or class of a particular data point.
K-Nearest Neighbor is a classification and prediction algorithm that is used to
divide data into classes based on the distance between the data points. K-Nearest
Neighbor assumes that data points which are close to one another must be similar
and hence, the data point to be classified will be grouped with the closest cluster.
Contd.,
(Worked example figure: a person weighing 57 kg at 170 cm is classified as Normal.)
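A minimal sketch of this height/weight idea with scikit-learn's KNeighborsClassifier; the training rows below are invented placeholders, not the slide's actual table:

```python
# Classify a new (height, weight) point by its 3 nearest neighbours.
from sklearn.neighbors import KNeighborsClassifier

X = [[167, 51], [182, 62], [176, 69], [173, 64],
     [172, 65], [174, 56], [169, 58], [173, 57], [170, 55]]
y = ["Underweight", "Normal", "Normal", "Normal",
     "Normal", "Underweight", "Normal", "Normal", "Normal"]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 nearest neighbours
knn.fit(X, y)
print(knn.predict([[170, 57]]))             # e.g. -> ['Normal']
```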
3. Logistic Regression
Logistic Regression is a Machine Learning classification algorithm that is used to
predict the probability of a categorical dependent variable. In logistic regression,
the dependent variable is a binary variable (the result is in binary format) that
contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
•It is a technique to analyse a data set which has a dependent variable and one or
more independent variables, in order to predict the outcome as a binary variable,
meaning it will have only two outcomes.
•The dependent variable is categorical in nature. Dependent variable is also
referred as target variable and the independent variables are called
the predictors
Logistic regression is a supervised learning algorithm used to predict a
dependent categorical target variable, but it is used to classify samples;
therefore, it falls under classification algorithms.
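A minimal sketch of binary logistic regression with scikit-learn; the hours-studied data and the pass/fail labels are invented for illustration:

```python
# Predict a binary outcome (pass = 1 / fail = 0) from hours studied.
from sklearn.linear_model import LogisticRegression

X = [[0.5], [1.0], [1.5], [2.0], [3.0], [4.0], [4.5], [5.0]]  # hours studied
y = [0, 0, 0, 0, 1, 1, 1, 1]                                  # fail / pass

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5]]))        # predicted class for 2.5 hours
print(clf.predict_proba([[2.5]]))  # probability of each class
```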
Contd.,
Linear Regression vs Logistic Regression
• Both are supervised learning models and make use of labeled data for
making predictions.
• Linear regression is used for regression (prediction) problems, whereas
logistic regression can be used in both classification and regression problems
but is widely used as a classification algorithm.
• The main difference between them is how they are used: Linear Regression is
used for solving regression problems, whereas Logistic Regression is used for
solving classification problems. The description of both algorithms is given
below, along with a difference table.
Contd.,
Type of Logistic Regression:
On the basis of the Dependent variable, Logistic Regression can be classified into three
types:
• Binomial: There can be only two possible types of the dependent variables, such as 0
or 1, Pass or Fail, Purchased or Not Purchased, Tall or Short, Fat or Slim, Rock or Mine,
etc.
• Multinomial: There can be 3 or more possible unordered types of the dependent
variable, such as apple, banana, orange or cat, dog, goat, sheep or Delhi, Mumbai,
Bangalore, Calcutta.
• Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types
of dependent variables, such as high, medium, low; ratings of a restaurant from 1
to 5; the intensity of light; or a 5-point Likert scale, etc.
Contd.,
Linear Regression vs Logistic Regression
1. Linear regression is used to predict the continuous dependent variable using a given set of independent variables. Logistic regression is used to predict the categorical dependent variable using a given set of independent variables.
2. Linear regression is used for solving regression problems. Logistic regression is used for solving classification problems.
3. In linear regression, we predict the value of continuous variables. In logistic regression, we predict the values of categorical variables.
4. In linear regression, we find the best-fit line, by which we can easily predict the output. In logistic regression, we find the S-curve, by which we can classify the samples.
5. In linear regression, the least-squares estimation method is used for estimation of accuracy. In logistic regression, the maximum likelihood estimation method is used for estimation of accuracy.
6. The output of linear regression must be a continuous value, such as price, age, etc. The output of logistic regression must be a categorical value, such as 0 or 1, Yes or No, etc.
7. In linear regression, the relationship between the dependent and independent variables must be linear. In logistic regression, a linear relationship between the dependent and independent variables is not required.
8. In linear regression, there may be collinearity between the independent variables. In logistic regression, there should not be collinearity between the independent variables.
4. Support Vector Machines (SVM)
Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used
for Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in
the correct category in the future. This best decision boundary is called a hyperplane.
•They build upon basic ML algorithms and add features that make them more efficient at
various tasks. SVMs can be used in a variety of tasks, including anomaly detection,
handwriting recognition, and text classification, and are popular because of their
flexibility, high performance, and compute efficiency.
Contd.,
Contd.,
Contd.,
Example: SVM can be understood with the example that we used for the KNN
classifier. Suppose we see a strange cat that also has some features of dogs; if
we want a model that can accurately identify whether it is a cat or a dog, such a
model can be created by using the SVM algorithm. We will first train our model
with lots of images of cats and dogs so that it can learn about the different features of
cats and dogs, and then we test it with this strange creature. The SVM creates a
decision boundary between these two classes (cat and dog) and chooses the
extreme cases (the support vectors) of cat and dog. On the basis of the support
vectors, it will classify the creature as a cat.
Contd.,
SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data, which means if a
dataset can be classified into two classes by using a single straight line, then such
data is termed as linearly separable data, and classifier is used called as Linear
SVM classifier.
• Non-linear SVM: Non-Linear SVM is used for non-linearly separated data, which
means if a dataset cannot be classified by using a straight line, then such data is
termed as non-linear data and classifier used is called as Non-linear SVM
classifier.
(Figures: linearly separable data vs. non-linearly separable data)
Contd.,
• When we can easily separate the data with a hyperplane by drawing a straight line,
we have a Linear SVM. When we cannot separate the data with a straight line, we
use a Non-Linear SVM. For this, we have kernel functions. They transform non-linear
spaces into linear spaces: a kernel transforms the data into another dimension so
that the data can be classified.
• For example, a kernel can transform two variables x and y into three variables by
adding z. The data are thereby mapped from 2-D space to 3-D space, and we can
then easily classify the data by drawing the best hyperplane between the classes.
Linear SVM vs Non-Linear SVM
1. The data can be easily separated with a straight line. The data cannot be easily separated with a straight line.
2. Data is classified with the help of a hyperplane. We use kernels to make non-separable data separable.
3. Data can be easily classified by drawing a straight line. We map the data into a high-dimensional space to classify it.
Contd.,
Contd.,
The working of the SVM algorithm (LINEAR SVM)
1. Suppose we have a dataset that has two tags (green and blue), and the dataset has two
features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as
either green or blue.
2. Since this is a 2-D space, we can separate these two classes by just using a straight line.
But there can be multiple lines that separate these classes.
3. Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane.
4. The SVM algorithm finds the closest points of the lines from both classes. These points
are called support vectors.
5. The distance between the vectors and the hyperplane is called the margin, and the goal
of SVM is to maximize this margin. The hyperplane with the maximum margin is called the
optimal hyperplane.
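A minimal sketch of steps 1-5 with scikit-learn's SVC; the two-feature points and their tags below are invented for illustration:

```python
# Linear SVM on a two-feature dataset: fit the maximum-margin hyperplane
# and inspect the support vectors that define it.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])  # (x1, x2) pairs
y = np.array([0, 0, 0, 1, 1, 1])              # two tags, e.g. green vs blue

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)   # the closest points from both classes
print(clf.predict([[4, 4]]))  # classify a new (x1, x2) point
```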
Contd.,
Non-Linear SVM:
1. If the data is linearly arranged, then we can separate it by using a straight line, but for
non-linear data, we cannot draw a single straight line.
2. So to separate these data points, we need to add one more dimension. For linear data
we used two dimensions, x and y, so for non-linear data we add a third dimension z,
calculated as z = x² + y². By adding the third dimension, the sample space becomes
three-dimensional.
Contd.,
3. SVM will now divide the datasets into classes in this higher-dimensional space.
4. Since we are in 3-D space, the decision boundary looks like a plane parallel to the
x-axis. If we convert it back to 2-D space with z = 1, it becomes a circle of radius 1
enclosing the non-linear data.
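The same separation can be obtained without constructing z explicitly: kernel SVMs perform the mapping implicitly. A minimal sketch with scikit-learn's RBF kernel on an invented ring-shaped dataset:

```python
# Non-linear SVM: a ring of class 1 around class 0 cannot be split by a
# straight line, but an RBF kernel separates it by implicitly mapping the
# data into a higher-dimensional space.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
inner = rng.normal(0, 0.3, size=(50, 2))               # class 0: near origin
angles = rng.uniform(0, 2 * np.pi, 50)
outer = np.c_[2 * np.cos(angles), 2 * np.sin(angles)]  # class 1: radius ~2
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([[0.1, 0.1], [2.0, 0.0]]))           # -> [0 1]
```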
Contd.,
Advantages of SVM
• Good for smaller cleaner datasets.
• Accurate results.
• Useful for both linearly separable data and non – linearly separable data.
• Effective in high dimensional spaces.
Disadvantages of SVM
• Not suitable for large datasets, as the training time can be very high.
• Not so effective on a dataset with overlapping classes.
• Picking the right kernel can be computationally intensive.
Applications of SVM
• Sentiment analysis.
• Spam Detection.
• Handwritten digit recognition.
• Image recognition challenges
5. Decision Trees
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
• In a Decision tree, there are two nodes, which are the Decision Node and Leaf
Node. Decision nodes are used to make any decision and have multiple branches,
whereas Leaf nodes are the output of those decisions and do not contain any
further branches.
• The decisions or the test are performed on the basis of features of the given
dataset.
• It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), it
further splits the tree into subtrees.
Contd.,
A decision tree can contain categorical data (YES/NO) as well as numeric data.
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated
further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and other nodes
are called the child nodes.
Contd.,
Attribute Selection Measures
While implementing a Decision tree, the main issue arises that how to select the
best attribute for the root node and for sub-nodes. So, to solve such problems
there is a technique which is called as Attribute selection measure or ASM. By
this measurement, we can easily select the best attribute for the nodes of the
tree. There are two popular techniques for ASM,
• Information Gain
• Gini Index
1. Information Gain:
• Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute.
• It calculates how much information a feature provides us about a class.
• According to the value of information gain, we split the node and build the
decision tree.
• A decision tree algorithm always tries to maximize the value of information
gain, and a node/attribute having the highest information gain is split first. It
can be calculated using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Contd.,
Entropy: Entropy is a metric to measure the impurity in a given attribute. It
specifies randomness in data. Entropy can be calculated as:
Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)
where
– S = total number of samples
– P(yes) = probability of yes
– P(no) = probability of no
2. Gini Index:
•Gini index is a measure of impurity or purity used while creating a decision tree in
the CART(Classification and Regression Tree) algorithm.
•An attribute with the low Gini index should be preferred as compared to the high
Gini index.
•It only creates binary splits, and the CART algorithm uses the Gini index to create
binary splits.
•Gini index can be calculated using the below formula:
Gini Index = 1 − Σj (Pj)²
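A minimal sketch that computes both measures for a node's class counts and then fits a tree that splits on information gain; the 9-yes/5-no counts and the iris dataset are illustrative:

```python
# Compute entropy and Gini index for a node's class distribution, then
# fit a decision tree that uses information gain (criterion="entropy").
import math
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(entropy([9, 5]))   # 9 "yes" and 5 "no" samples -> ~0.940
print(gini([9, 5]))      # -> ~0.459

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(tree.get_depth())  # depth of the tree built by maximizing info gain
```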
DataSet
Contd.,
Advantages of Decision Tree
1. Clear Visualization
2. Simple and easy to understand
3. Decision Tree can be used for both classification and regression problems.
4. Decision Tree can handle both continuous and categorical variables.
5. No feature scaling required
6. Handles non-linear parameters efficiently
7. Decision Tree can automatically handle missing values.
8. Decision Tree is usually robust to outliers and can handle them automatically.
9. Less Training Period
Disadvantages of Decision Tree
1. Overfitting
2. High variance
3. Unstable
4. Affected by noise
5. Not suitable for large datasets
6. Random Forest
• Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex
problem and to improve the performance of the model.
• "Random Forest is a classifier that contains a number of decision trees on various
subsets of the given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random
forest takes the prediction from each tree and based on the majority votes of
predictions, and it predicts the final output.
• The greater number of trees in the forest leads to higher accuracy and prevents
the problem of overfitting.
Random forest is a commonly-used machine learning algorithm trademarked by
Leo Breiman and Adele Cutler, which combines the output of multiple decision
trees to reach a single result. Its ease of use and flexibility have fueled its
adoption, as it handles both classification and regression problems.
Contd.,
Contd.,
• Random forest algorithms have three main hyperparameters, which need to be
set before training.
• These include node size, the number of trees, and the number of features
sampled. From there, the random forest classifier can be used to solve for
regression or classification problems.
• The random forest algorithm is made up of a collection of decision trees, and
each tree in the ensemble is comprised of a data sample drawn from a training
set with replacement, called the bootstrap sample. Of that training sample,
one-third of it is set aside as test data, known as the out-of-bag (oob) sample,
which we’ll come back to later. Another instance of randomness is then injected
through feature bagging, adding more diversity to the dataset and reducing the
correlation among decision trees. Depending on the type of problem, the
determination of the prediction will vary. For a regression task, the individual
decision trees will be averaged, and for a classification task, a majority vote—i.e.
the most frequent categorical variable—will yield the predicted class. Finally, the
oob sample is then used for cross-validation, finalizing that prediction.
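A minimal sketch with scikit-learn showing the three hyperparameters named above plus the oob score; treating min_samples_leaf as the "node size" knob is one mapping onto scikit-learn's API, and the iris dataset is a stand-in:

```python
# Random forest with the three hyperparameters named above, plus the
# out-of-bag (oob) samples used as a built-in validation estimate.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(
    n_estimators=100,      # number of trees
    max_features="sqrt",   # number of features sampled per split
    min_samples_leaf=1,    # node size
    oob_score=True,        # evaluate on out-of-bag samples
    random_state=0,
).fit(X, y)
print(rf.oob_score_)       # accuracy estimated from the oob samples
```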
Contd.,
a) Working of the Random Forest Algorithm
Before understanding the working of the random forest, we must look into the
ensemble technique.
Ensemble Learning
1. Ensemble means group.
2. In ensemble learning, individual models come together and bring forth a model that
is more accurate.
It simply means combining multiple models: a collection of models is used to
make predictions rather than an individual model.
Why use Ensemble Models:
1. Better accuracy (low error)
2. Higher consistency (avoids overfitting)
3. Reduced bias and variance errors
Contd.,
Ensemble uses two types of methods:
1. Bagging– It creates a different training subset from sample training data with
replacement & the final output is based on majority voting. For example, Random
Forest.
2. Boosting– It combines weak learners into strong learners by creating sequential
models such that the final model has the highest accuracy. For example, AdaBoost,
XGBoost.
Random Forest Algorithm working on Bagging principle
Contd.,
1. Bagging
Bagging, also known as Bootstrap Aggregation is the ensemble technique used by
random forest. Bagging chooses a random sample from the data set. Hence each
model is generated from the samples (Bootstrap Samples) provided by the Original
Data with replacement known as row sampling. This step of row sampling with
replacement is called bootstrap. Now each model is trained independently which
generates results. The final output is based on majority voting after combining the
results of all models. This step which involves combining all the results and
generating output based on majority voting is known as aggregation.
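As a sketch of this bagging pipeline, here is scikit-learn's BaggingClassifier wrapping decision trees; it assumes a recent scikit-learn (the base-model argument is named estimator in versions ≥ 1.2) and uses the iris dataset as a stand-in:

```python
# Bagging: bootstrap samples -> independently trained models -> majority vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base model
    n_estimators=10,                     # 10 bootstrap-trained models
    bootstrap=True,                      # row sampling with replacement
    random_state=0,
).fit(X, y)
print(bag.predict(X[:3]))                # aggregated (majority-vote) output
```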
Contd.,
Bagging – various models are built in parallel on various samples, and then the various
models vote to give the final model and hence the prediction.
Contd.,
2. Boosting – (a process that uses a set of machine learning algorithms to combine
weak learners into strong learners in order to increase the accuracy of the model)
Boosting is an ensemble modeling technique that attempts to build a strong classifier
from a number of weak classifiers. It is done by building a model from weak models in
series:
1. Firstly, a model is built from the training data.
2. Then a second model is built which tries to correct the errors present in the first
model. This procedure is continued, and models are added, until either the complete
training data set is predicted correctly or the maximum number of models is added.
Compared with bagging, there is little variation: boosting focuses on selecting the
points which were given wrong predictions.
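A minimal sketch of boosting with scikit-learn's AdaBoostClassifier built on decision stumps; the iris dataset is a stand-in, and the estimator argument name assumes scikit-learn ≥ 1.2:

```python
# Boosting: weak learners (decision stumps) are added in sequence, each
# one re-weighting the points the previous models got wrong.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a decision stump
    n_estimators=50,                                # models added in series
    random_state=0,
).fit(X, y)
print(boost.score(X, y))   # training accuracy of the combined strong model
```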
Contd.,
How Does the Boosting Algorithm Work?
The basic principle behind the working of the boosting algorithm is to generate
multiple weak learners and combine their predictions to form one strong rule.
A decision stump is nothing but a single-level decision tree that tries to
classify the data points; initially, equal weight is given to all data points.
Contd.,
Key Benefits
•Reduced risk of overfitting
• Provides flexibility
•Easy to determine feature importance
Key Challenges
•Time-consuming process
•Requires more resources
•More complex
Advantages and Disadvantages of Random Forest
•It reduces overfitting in decision trees and helps to improve the accuracy
•It is flexible to both classification and regression problems
•It works well with both categorical and continuous values
•It handles missing values present in the data automatically
•Normalising of data is not required as it uses a rule-based approach.
However, despite these advantages, a random forest algorithm also has some drawbacks.
•It requires much computational power as well as resources as it builds numerous trees to
combine their outputs.
•It also requires much time for training as it combines a lot of decision trees to determine the
class.
•Due to the ensemble of decision trees, it also suffers from poor interpretability and
fails to determine the significance of each variable.
Contd.,
Bagging vs Boosting
1. Bagging is the simplest way of combining predictions that belong to the same type. Boosting is a way of combining predictions that belong to different types.
2. Bagging aims to decrease variance, not bias. Boosting aims to decrease bias, not variance.
3. In bagging, each model receives equal weight. In boosting, models are weighted according to their performance.
4. In bagging, each model is built independently. In boosting, new models are influenced by the performance of previously built models.
5. In bagging, different training data subsets are randomly drawn with replacement from the entire training dataset. In boosting, every new subset contains the elements that were misclassified by previous models.
6. Bagging tries to solve the overfitting problem. Boosting tries to reduce bias.
7. If the classifier is unstable (high variance), apply bagging. If the classifier is stable and simple (high bias), apply boosting.
8. Example: the random forest model uses bagging. Example: AdaBoost uses boosting techniques.
Contd.,
Decision Trees vs Random Forest
1. Decision trees normally suffer from the problem of overfitting if allowed to grow without any control. Random forests are created from subsets of data, and the final output is based on average or majority ranking; hence the problem of overfitting is taken care of.
2. A single decision tree is faster in computation. A random forest is comparatively slower.
3. When a data set with features is taken as input by a decision tree, it will formulate some set of rules to do the prediction. A random forest randomly selects observations, builds decision trees, and takes the average result; it doesn't use any set of formulas.
1 de 135

Recomendados

ML_Module_1.pdf por
ML_Module_1.pdfML_Module_1.pdf
ML_Module_1.pdfJafarHussain48
2 visualizações25 slides
Machine Learning by Rj por
Machine Learning by RjMachine Learning by Rj
Machine Learning by RjShree M.L.Kakadiya MCA mahila college, Amreli
387 visualizações161 slides
Machine Learning Contents.pptx por
Machine Learning Contents.pptxMachine Learning Contents.pptx
Machine Learning Contents.pptxNaveenkushwaha18
13 visualizações83 slides
introduction to machine learning por
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
181 visualizações30 slides
Machine learning por
Machine learningMachine learning
Machine learningTushar Nikam
215 visualizações10 slides
Supervised learning techniques and applications por
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applicationsBenjaminlapid1
8 visualizações31 slides

Mais conteúdo relacionado

Similar a Machine Learning_Unit 2_Full.ppt.pdf

Machine Learning.pptx por
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptxNitinSharma134320
21 visualizações18 slides
machine learning.docx por
machine learning.docxmachine learning.docx
machine learning.docxJadhavArjun2
43 visualizações11 slides
An Introduction to Machine Learning por
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine LearningVedaj Padman
214 visualizações14 slides
Machine Learning Basics por
Machine Learning BasicsMachine Learning Basics
Machine Learning BasicsSuresh Arora
736 visualizações31 slides
detailed Presentation on supervised learning por
 detailed Presentation on supervised learning detailed Presentation on supervised learning
detailed Presentation on supervised learningZAMANCHBWN
176 visualizações10 slides
Machine Learning Ch 1.ppt por
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.pptARVIND SARDAR
19 visualizações78 slides

Similar a Machine Learning_Unit 2_Full.ppt.pdf(20)

Machine Learning.pptx por NitinSharma134320
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
NitinSharma13432021 visualizações
machine learning.docx por JadhavArjun2
machine learning.docxmachine learning.docx
machine learning.docx
JadhavArjun243 visualizações
An Introduction to Machine Learning por Vedaj Padman
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine Learning
Vedaj Padman214 visualizações
Machine Learning Basics por Suresh Arora
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
Suresh Arora736 visualizações
detailed Presentation on supervised learning por ZAMANCHBWN
 detailed Presentation on supervised learning detailed Presentation on supervised learning
detailed Presentation on supervised learning
ZAMANCHBWN176 visualizações
Machine Learning Ch 1.ppt por ARVIND SARDAR
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.ppt
ARVIND SARDAR19 visualizações
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env... por Intel® Software
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Intel® Software568 visualizações
machine learning edited.docx por levisNjoroge
machine learning edited.docxmachine learning edited.docx
machine learning edited.docx
levisNjoroge14 visualizações
machine learning edited.docx por LevisMnjoro
machine learning edited.docxmachine learning edited.docx
machine learning edited.docx
LevisMnjoro14 visualizações
Eckovation Machine Learning por Shikhar Srivastava
Eckovation Machine LearningEckovation Machine Learning
Eckovation Machine Learning
Shikhar Srivastava1.9K visualizações
Machine Learning Tutorial for Beginners por grinu
Machine Learning Tutorial for BeginnersMachine Learning Tutorial for Beginners
Machine Learning Tutorial for Beginners
grinu134 visualizações
Intro/Overview on Machine Learning Presentation -2 por Ankit Gupta
  • 9. TRADITIONAL PROGRAMMING VS MACHINE LEARNING Traditional Programming Traditional programming is a manual process: a person (the programmer) creates the program and must formulate and code the rules by hand. We have the input data, and the programmer writes a program that uses that data and runs on a computer to produce the desired output. Machine Learning In machine learning, on the other hand, the input data and the corresponding outputs are fed to an algorithm to create a program. In traditional programming one has to formulate and code the rules manually, while in machine learning the algorithms formulate the rules from the data automatically, which is very powerful.
  • 11. Terminology in Machine Learning
Model: Also known as a "hypothesis", a machine learning model is the mathematical representation of a real-world process. A machine learning algorithm together with the training data builds a machine learning model.
Feature: A feature is a measurable property or parameter of the dataset.
Feature Vector: A set of multiple numeric features, used as the input to the machine learning model for training and prediction.
Training: An algorithm takes a set of data known as "training data" as input. The learning algorithm finds patterns in the input data and trains the model to produce the expected results (targets). The output of the training process is the machine learning model.
Prediction: Once the machine learning model is ready, it can be fed input data and will provide a predicted output.
Target (Label): The value that the machine learning model has to predict is called the target or label.
Overfitting: When a model is trained on noisy or inaccurate data entries, it tends to learn the noise as well and matches the training data too closely; the model then fails to generalize to new data.
Underfitting: The scenario in which the model fails to capture the underlying trend in the input data, which hurts the accuracy of the model. In simple terms, the model or the algorithm does not fit the data well enough.
  • 12. How Does Machine Learning Work?
  • 13. Steps to Build an ML Model
  • 14. Contd., 1. Data collection Machine learning requires a lot of training data (either labelled, meaning supervised learning, or unlabelled, meaning unsupervised learning). 2. Data preparation Raw data alone is not very useful. The data needs to be prepared: normalized, de-duplicated, and cleaned of errors and bias. Visualization of the data can be used to look for patterns and outliers, and to check whether the right data has been collected or whether data is missing. Cleaning and Visualizing Data
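As a rough illustration of the data-preparation step, here is a minimal sketch in Python with pandas; the file name data.csv and the column age are hypothetical stand-ins, not from the slides.

import pandas as pd

# Load raw data (hypothetical file; replace with your own dataset)
df = pd.read_csv("data.csv")

# De-duplicate and drop rows with missing values
df = df.drop_duplicates()
df = df.dropna()

# Min-max normalize a numeric column into the range [0, 1]
df["age"] = (df["age"] - df["age"].min()) / (df["age"].max() - df["age"].min())

# Quick visual check for patterns and outliers (requires matplotlib)
df.hist(figsize=(8, 6))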
  • 15. Contd., 3. Choosing a model Choose a model based on the collected data and its relevance to the task. Commonly used models include linear regression, logistic regression, decision trees, K-means, principal component analysis (PCA), support vector machines (SVM), Naïve Bayes, random forest, and neural networks. Check whether your data is numerical or categorical and choose accordingly. Typical pairings of model and application:
Model | Application
Linear regression | Price prediction
Fully connected networks | Classification
Convolutional neural networks | Image processing
Recurrent neural networks | Voice recognition
Random forest | Fraud detection
Reinforcement learning | Learning by trial and error
Generative models | Image creation
K-means | Segmentation
k-nearest neighbors | Recommendation systems
Bayesian classifiers | Spam and noise filtering
  • 16. Contd., 4. Training Training is the most important step in machine learning. In training, you pass the prepared data to your machine learning model so that it finds patterns and makes predictions. The model learns from the data so that it can accomplish the task it was set; over time, with training, the model gets better at predicting. 5. Evaluation After training the model comes evaluating it. This entails testing the model against a held-out dataset that was not used in training to see how it performs. This can be representative of how the model will work in the real world, but that is not guaranteed. The larger the number of variables in the real world, the bigger the training and test sets should be.
  • 17. Contd., 6. Parameter tuning After evaluating your model, you should revisit the originally set parameters and try to improve the model. Increasing the number of training cycles can lead to more accurate results. However, you should define when a model is good enough, as otherwise you will keep tweaking the model indefinitely. This is an experimental process. 7. Prediction Once you have gone through collecting the data, preparing the data, selecting the model, training and evaluating the model, and tuning the parameters, it is time to answer questions using predictions. These can be all kinds of predictions, ranging from image recognition to semantics to predictive analytics.
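One common way to carry out parameter tuning is a grid search with cross-validation; the following is a minimal sketch with scikit-learn, where the candidate values for n_neighbors are purely illustrative.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Try several values of k and keep the one with the best cross-validated accuracy
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)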
  • 20. Typical Machine Learning Process • Training data. This type of data builds up the machine learning algorithm. The data scientist feeds the algorithm input data, which corresponds to an expected output. The model evaluates the data repeatedly to learn more about the data's behavior and then adjusts itself to serve its intended purpose. • Validation data. During training, validation data infuses new data into the model that it hasn't evaluated before. Validation data provides the first test against unseen data, allowing data scientists to evaluate how well the model makes predictions on new data. Not all data scientists use validation data, but it can provide helpful information for optimizing hyperparameters, which influence how the model assesses data. • Test data. After the model is built, test data validates once more that it can make accurate predictions. The test set is kept apart from training and validation and is treated as unseen data. Test data provides a final, real-world check on an unseen dataset, confirming that the ML algorithm was trained effectively.
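A minimal sketch of carving one dataset into these three sets with scikit-learn; the 60/20/20 proportions are an illustrative choice, not prescribed by the slides.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First hold out 20% of the data as the final test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Then split the remainder: 25% of the remaining 80% = 20% of the total for validation
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200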
  • 21. Typical Machine Learning Process Overfitting: creating a model that matches the training data so closely that it fails to make correct predictions on new data.
  • 22. Overfitting and train/test splits
  • 23. Overfitting and train/test splits Train/Test is a method to measure the accuracy of your model. • It is called Train/Test because you split the dataset into two sets, a training set and a testing set: for example, 80% for training and 20% for testing. • You train the model using the training set and test it using the testing set. • Training the model means creating the model; testing the model means checking its accuracy. • Common split percentages include: Train: 80%, Test: 20% Train: 67%, Test: 33% Train: 50%, Test: 50%
  • 24. Splitting Datasets • To use a dataset in machine learning, it is first split into a training set and a test set. • The training set is used to train the model. • The test set is used to test the accuracy of the model.
  • 27. Data Imbalance and Overfitting • If the training data is heavily imbalanced, the model will predict a non-meaningful result. • For example, if the model is a binary classifier (e.g. cat vs. dog) and nearly all the samples carry the same label (cat), then the model will simply learn that everything has that label (cat). • This is a form of overfitting. To prevent it, the training samples should be fairly evenly distributed across the classes, or across the value range if the label is a real value.
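A quick way to inspect the class distribution, and to preserve class proportions when splitting, sketched with NumPy and scikit-learn; the 90/10 cat/dog data below is synthetic and purely illustrative.

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 90% "cat", 10% "dog"
y = np.array(["cat"] * 90 + ["dog"] * 10)
X = np.arange(100).reshape(-1, 1)

# Inspect the class distribution before training
labels, counts = np.unique(y, return_counts=True)
print(dict(zip(labels, counts)))  # {'cat': 90, 'dog': 10}

# stratify=y keeps the same class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)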
  • 31. Types of Machine Learning (Machine Learning Techniques) • Supervised Machine Learning • Semi-supervised Machine Learning • Unsupervised Machine Learning • Reinforcement Machine Learning
  • 32. Machine Learning Techniques In machine learning, tasks are generally classified into broad categories. These categories are based on how learning is received or how feedback on the learning is given to the system. •Two of the most widely adopted machine learning methods are supervised learning, which trains algorithms on example input and output data labelled by humans, and unsupervised learning, which gives the algorithm no labelled data, letting it find structure within its input data on its own. Semi-supervised models use both labelled and unlabelled data for training. Reinforcement learning uses a feedback-based algorithm (the machine learns on its own from rewards).
  • 35. Supervised Machine Learning 1. It is a type of learning in which both the input and the desired output data are provided. 2. Input and output data are labelled for classification to provide a learning basis for future data processing. (A model based on supervised learning requires both previous data and the previous results as input. By training with this data, the model helps in predicting results that are more accurate.) 3. This approach consists of a target/outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). 4. Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves the desired level of accuracy on the training data. 5. Supervised learning includes methods like classification, regression, Naïve Bayes, SVM, KNN, decision trees, etc.
  • 38. Contd., Supervised learning is classified into two categories of algorithms: Classification: a classification problem is when the output variable is a category, such as "red" or "blue", or "disease" and "no disease". Regression: a regression problem is when the output variable is a real value, such as "dollars" or "weight". Types: Regression, Logistic Regression, Classification, Naïve Bayes Classifiers, K-NN (k-nearest neighbors), Decision Trees, Support Vector Machines. Advantages: •Supervised learning allows collecting data and producing outputs from previous experience. •It helps to optimize performance criteria with the help of experience. •It helps to solve various types of real-world computation problems. Disadvantages: •Classifying big data can be challenging. •Training for supervised learning needs a lot of computation time.
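To make the two supervised flavours concrete, here is a compact scikit-learn sketch with one classifier (discrete labels) and one regressor (continuous targets); the toy datasets are generated on the fly.

from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classification: predict a category
Xc, yc = make_classification(n_samples=200, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, test_size=0.2, random_state=0)
clf = LogisticRegression().fit(Xc_tr, yc_tr)
print("classification accuracy:", clf.score(Xc_te, yc_te))

# Regression: predict a real value
Xr, yr = make_regression(n_samples=200, noise=10, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, test_size=0.2, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
print("regression R^2:", reg.score(Xr_te, yr_te))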
  • 40. Unsupervised Machine Learning 1. Unsupervised learning needs no previously labelled results as input. It is the method that allows the model to learn on its own from the data you give it. Here the data is not labelled, but the algorithm helps the model form clusters of similar types of data. For example, if we have data about dogs and cats, the model will process and train itself on the data. Since it has no previous experience of the data, it forms clusters based on similarities of features. 2. It trains the model by making it learn about the data and work on it from the very start. Also, after the data has been clustered, we can easily label the clusters as separate categories, since the grouping has already been worked out.
  • 43. Contd., Unsupervised learning is classified into two categories of algorithms: 1. Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior. 2. Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y. Types of clustering: exclusive (partitioning), agglomerative, overlapping, probabilistic. Common unsupervised algorithms: hierarchical clustering, K-means clustering, principal component analysis, singular value decomposition, independent component analysis.
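A minimal clustering sketch with scikit-learn's KMeans on unlabeled toy data; the three-cluster setup is an illustrative assumption.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: only X is used, never the labels
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])      # cluster index assigned to each point
print(kmeans.cluster_centers_)  # coordinates of the learned centroids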
  • 44. Contd., Advantages of Unsupervised Learning We sometimes choose unsupervised learning in place of supervised learning. Here are some of the advantages: 1. Labelling data demands a lot of manual work and expense; unsupervised learning solves this by learning the data and classifying it without any labels. 2. The labels can be added after the data has been classified, which is much easier. 3. It is very helpful for finding patterns in data that cannot be found using normal methods. 4. Dimensionality reduction can be easily accomplished using unsupervised learning. 5. Unsupervised learning can help in understanding raw data. Disadvantages of Unsupervised Learning 1. The result might be less accurate, as we do not have labelled input data to train from; the model learns from raw data without any prior knowledge. 2. It is also a time-consuming process: the learning phase might take a long time, as the algorithm analyses and calculates all possibilities.
  • 47. Semi-supervised Machine Learning This is a combination of supervised and unsupervised learning, and it helps to reduce the shortcomings of both methods. In supervised learning, labelling data is manual work and very costly, as data volumes are huge. In unsupervised learning, the areas of application are very limited. To reduce these problems, semi-supervised learning is used: the model first trains under unsupervised learning, which ensures that most of the unlabelled data is divided into clusters. For the remaining unlabelled data, labels are generated and classification is then carried out with ease. This technique is very useful in areas like speech recognition and analysis, protein classification, text classification, etc. It is a type of hybrid learning problem. (Its working lies between supervised and unsupervised techniques. We use it when dealing with data that is only partly labelled, with a large unlabelled remainder. We can use unsupervised techniques to predict labels and then feed these labels to supervised techniques. This is mostly applicable to image datasets, where usually not all images are labelled.)
  • 48. Reinforcement Machine Learning 1. The model keeps improving its performance using reward feedback to learn the desired behavior or pattern. These algorithms are tuned to a particular problem, e.g. the Google self-driving car, or AlphaGo, where a bot competes with humans and even with itself to get better and better at the game of Go. 2. Each time we feed it data, it learns and adds the data to its knowledge as training data; the more it learns, the better trained, and hence the more experienced, it becomes.
  • 50. Reinforcement Machine Learning 1. Reinforcement learning is a type of learning methodology in ML, alongside supervised and unsupervised learning. When we compare the three, reinforcement learning is a bit different from the other two: here we take the concept of giving a reward for every positive result and make that the basis of our algorithm. 2. Consider training a dog to perform certain actions; of course, it won't be an easy task. You order the dog to do certain actions, and for every proper execution you give it a biscuit as a reward. The dog will remember that if it does a certain action, it gets biscuits, and so it will follow the instructions properly next time.
  • 56. What are the most common and popular machine learning algorithms? 1. Naïve Bayes Classifier Algorithm (Supervised Learning - Classification) 2. K Means Clustering Algorithm (Unsupervised Learning - Clustering) 3. Support Vector Machine Algorithm (Supervised Learning - Classification) 4. Linear Regression (Supervised Learning/Regression) 5. Logistic Regression (Supervised learning – Classification) 6. Artificial Neural Networks (Reinforcement Learning) 7. Decision Trees (Supervised Learning – Classification/Regression) 8. Random Forests (Supervised Learning – Classification/Regression) 9. Nearest Neighbours (Supervised Learning)
  • 57. Classification and Regression Algorithms • Definition of Classification • Definition of Regression • Differentiate between Classification and Regression • Types of Classification Algorithms – Naïve Bayes Algorithms – K-Nearest Neighbors Algorithms – Logistic Regression – Support Vector Machines (SVM) – Decision Trees – Random Forest • Classification Errors
  • 58. Contd., Regression and Classification algorithms are supervised learning algorithms. Both are used for prediction in machine learning and work with labelled datasets. The main difference between them is that Regression algorithms are used to predict continuous values such as price, salary, age, etc., while Classification algorithms are used to predict/classify discrete values such as Male or Female, True or False, Spam or Not Spam, etc.
  • 59. Definition of Classification Classification is a process of finding a function which helps in dividing the dataset into classes based on different parameters. In classification, a computer program is trained on the training dataset and, based on that training, categorizes new data into different classes. The task of the classification algorithm is to find the mapping function that maps the input (x) to the discrete output (y). Example: the best example to understand the classification problem is email spam detection. The model is trained on millions of emails with different parameters, and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, it is moved to the Spam folder.
  • 60. Contd., Types of ML Classification Algorithms: Classification Algorithms can be further divided into the following types: – Logistic Regression – K-Nearest Neighbours – Support Vector Machines – Naïve Bayes – Decision Tree Classification – Random Forest Classification
  • 61. Definition of Regression Regression is a process of finding the correlations between dependent and independent variables. It helps in predicting continuous variables, such as market trends or house prices. The task of the regression algorithm is to find the mapping function that maps the input variable (x) to the continuous output variable (y). Example: suppose we want to do weather forecasting; for this we use a regression algorithm. In weather prediction, the model is trained on past data, and once training is completed, it can easily predict the weather for future days.
  • 62. Contd., Types of Regression Algorithm • Simple Linear Regression • Multiple Linear Regression • Polynomial Regression • Support Vector Regression • Decision Tree Regression • Random Forest Regression
  • 63. Regression Algorithm | Classification Algorithm
In Regression, the output variable must be continuous or a real value. | In Classification, the output variable must be a discrete value.
The task of the regression algorithm is to map the input value (x) to the continuous output variable (y). | The task of the classification algorithm is to map the input value (x) to the discrete output variable (y).
Regression algorithms are used with continuous data. | Classification algorithms are used with discrete data.
In Regression, we try to find the best-fit line, which can predict the output more accurately. | In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
Regression algorithms can be used to solve problems such as weather prediction, house price prediction, etc. | Classification algorithms can be used to solve problems such as identifying spam emails, speech recognition, identifying cancer cells, etc.
Regression can be further divided into linear and non-linear regression. | Classification can be divided into binary classifiers and multi-class classifiers.
  • 64. 1. Naïve Bayes Algorithm • The Naive Bayes classifier works on the principle of conditional probability, as given by Bayes' theorem. When calculating probabilities, we usually denote probability as P. • Bayes' theorem gives us the conditional probability of event A, given that event B has occurred. In the coin-toss example, the first toss is B and the second toss is A. This can be confusing, because we have reversed their order and go from B to A instead of from A to B. • Bayes' theorem calculates the conditional probability of the occurrence of an event based on prior knowledge of conditions that might be related to the event.
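Written out as a formula (a standard statement of the theorem, added here for reference):

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]

Here P(A|B) is the posterior probability of A given B, P(B|A) is the likelihood, P(A) is the prior probability of A, and P(B) is the probability of the evidence B.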
  • 65. Contd., • The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, helping to build fast machine learning models that can make quick predictions. • Naive Bayes is one of the powerful machine learning algorithms used for classification. • It applies Bayes' theorem with the assumption that every feature is independent of the others. It is used for a variety of tasks, such as spam filtering and other areas of text classification. Understanding Naive Bayes and Machine Learning Machine learning falls into two categories: • supervised learning and unsupervised learning. Supervised learning falls into two categories: • classification and regression. The Naive Bayes algorithm falls under classification.
  • 66. Contd., Naïve Bayes is used for: • Face recognition - as a classifier, it is used to identify faces or facial features like the nose, mouth, eyes, etc. • Weather prediction - it can be used to predict whether the weather will be good or bad. • Medical diagnosis - doctors can diagnose patients using the information the classifier provides; healthcare professionals can use Naive Bayes to indicate whether a patient is at high risk for certain diseases and conditions, such as heart disease, cancer, and other ailments. • News classification. It also requires relatively little training data.
  • 67. Contd., Example Under Day, the variable takes values like weekday, weekend, and holiday. For any given day, we check whether there is a discount and whether there is free delivery. Based on this information, we can predict whether a customer will buy the product or not. • Consider a small sample dataset of 30 rows, of which 15 are shown below:
  • 69. Contd., Based on the dataset containing the three input attributes (day, discount, and free delivery), a frequency table is populated for each attribute.
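A sketch of this buy/no-buy example using scikit-learn's CategoricalNB; the eight encoded rows below are invented for illustration and are not the slide's actual 30-row dataset.

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Encoded attributes: day (0=weekday, 1=weekend, 2=holiday),
# discount (0=no, 1=yes), free delivery (0=no, 1=yes)
X = np.array([[0, 0, 0], [1, 1, 1], [2, 1, 1], [0, 1, 0],
              [1, 0, 1], [2, 0, 0], [0, 1, 1], [1, 1, 0]])
y = np.array([0, 1, 1, 0, 1, 0, 1, 1])  # 1 = buy, 0 = no buy

model = CategoricalNB().fit(X, y)

# Predict for a holiday with a discount and free delivery
print(model.predict([[2, 1, 1]]))        # predicted class
print(model.predict_proba([[2, 1, 1]]))  # class probabilities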
  • 79. 2. K-Nearest Neighbors The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems. A supervised machine learning algorithm (as opposed to an unsupervised one) relies on labelled input data to learn a function that produces an appropriate output when given new unlabelled data. The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near each other. •K-NN can be used for regression as well as classification, but it is mostly used for classification problems. •K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data. It is also called a lazy learner algorithm, because it does not learn from the training set immediately; instead it stores the dataset, and at classification time it performs the computation on the stored data.
  • 80. Contd., Why do we need a K-NN algorithm? Suppose there are two categories, Category A and Category B, and we have a new data point x1: in which of these categories will the data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. K-Nearest Neighbor is a classification and prediction algorithm that divides data into classes based on the distance between data points. It assumes that data points which are close to one another must be similar; hence, the data point to be classified is grouped with its nearest neighbors.
  • 83. Contd., Worked example result: a person of 57 kg and 170 cm is classified as Normal.
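A sketch of KNN on a tiny height/weight dataset with scikit-learn; the sample points and the choice of k = 3 are invented for illustration.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Features: [weight in kg, height in cm]; labels: weight class
X = np.array([[45, 160], [50, 165], [55, 168], [58, 172],
              [70, 169], [80, 175], [90, 172], [60, 170]])
y = np.array(["Underweight", "Normal", "Normal", "Normal",
              "Overweight", "Overweight", "Overweight", "Normal"])

# Classify a new person by majority vote among the 3 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[57, 170]]))  # -> ['Normal']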
  • 84. 3. Logistic Regression Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable (a result in binary format) containing data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). •It is a technique to analyse a dataset which has a dependent variable and one or more independent variables, in order to predict the outcome as a binary variable, meaning it has only two outcomes. •The dependent variable is categorical in nature. The dependent variable is also referred to as the target variable, and the independent variables are called the predictors. Logistic regression is a supervised learning algorithm used to predict a categorical dependent target variable, i.e. to classify samples; therefore, it falls under classification algorithms.
  • 85. Contd., Linear Regression vs Logistic Regression • Both are supervised learning models and make use of labelled data for making predictions. • Linear regression is used for regression (prediction) problems, whereas logistic regression, although adaptable to other settings, is widely used as a classification algorithm. • The main difference between them is how they are used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems. A description of both algorithms is given below, along with a comparison table.
  • 86. Contd., Types of Logistic Regression: On the basis of the dependent variable, logistic regression can be classified into three types: • Binomial: there can be only two possible types of dependent variable, such as 0 or 1, Pass or Fail, Purchased or Not Purchased, Tall or Short, Fat or Slim, Rock or Mine, etc. • Multinomial: there can be 3 or more possible unordered types of dependent variable, such as apple, banana, orange; or cat, dog, goat, sheep; or Delhi, Mumbai, Bangalore, Calcutta. • Ordinal: there can be 3 or more possible ordered types of dependent variable, such as high, medium, low; restaurant ratings from 1 to 5; light intensity; or a 5-point Likert scale.
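A minimal binomial logistic regression sketch with scikit-learn; the hours-studied vs pass/fail data is invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hours studied (single feature) vs pass (1) / fail (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)
print(model.predict([[2.2]]))        # predicted class, 0 or 1
print(model.predict_proba([[2.2]]))  # probability of each class (points on the S-curve)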
  • 87. Contd., Linear Regression | Logistic Regression
Linear regression is used to predict a continuous dependent variable from a given set of independent variables. | Logistic regression is used to predict a categorical dependent variable from a given set of independent variables.
Linear regression is used for solving regression problems. | Logistic regression is used for solving classification problems.
In linear regression, we predict the value of continuous variables. | In logistic regression, we predict the values of categorical variables.
In linear regression, we find the best-fit line, from which we can easily predict the output. | In logistic regression, we find the S-curve, by which we can classify the samples.
The least-squares method is used for parameter estimation. | The maximum likelihood method is used for parameter estimation.
The output of linear regression must be a continuous value, such as price, age, etc. | The output of logistic regression must be a categorical value such as 0 or 1, Yes or No, etc.
Linear regression requires the relationship between the dependent and independent variables to be linear. | Logistic regression does not require a linear relationship between the dependent and independent variables.
In linear regression, there may be collinearity between the independent variables. | In logistic regression, there should not be collinearity between the independent variables.
  • 88. 4. Support Vector Machines (SVM) The Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms, used for classification as well as regression problems. Primarily, however, it is used for classification problems in machine learning. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane. •SVMs build upon basic ML algorithms and add features that make them more efficient at various tasks. They can be used for a variety of tasks, including anomaly detection, handwriting recognition, and text classification, and are widely adopted because of their flexibility, high performance, and compute efficiency.
  • 91. Contd., Example: SVM can be understood with the example we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. Because the support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (the support vectors), it will consider the extreme cases of cats and dogs. On the basis of the support vectors, it will classify the creature as a cat.
  • 92. Contd., SVMs can be of two types: • Linear SVM: used for linearly separable data. If a dataset can be classified into two classes with a single straight line, it is termed linearly separable data, and the classifier used is called a linear SVM classifier. • Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified using a straight line, it is termed non-linear data, and the classifier used is called a non-linear SVM classifier. (Figures: linearly separable data vs non-linearly separable data.)
  • 93. Contd., • When we can easily separate the data with a hyperplane by drawing a straight line, we have a linear SVM. When we cannot separate the data with a straight line, we use a non-linear SVM with kernel functions, which transform non-linear spaces into linear spaces. A kernel transforms the data into another dimension so that it can be classified: for example, it can transform two variables x and y into three variables by adding z, so the data is mapped from 2-D space into 3-D space, where we can easily classify it by drawing the best hyperplane between the classes. Linear SVM vs Non-Linear SVM:
Linear SVM | Non-Linear SVM
The data can be easily separated with a straight line. | The data cannot be easily separated with a straight line.
Data is classified with the help of a hyperplane. | Kernels are used to make non-separable data separable.
Data can be classified by drawing a straight line. | Data is mapped into a high-dimensional space to classify it.
  • 95. Contd., The working of the SVM algorithm (linear SVM): 1. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue. 2. Since this is 2-D space, the two classes can be separated just by a straight line, but there can be multiple lines that separate them. 3. Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. 4. The SVM algorithm finds the closest points of the lines from both classes; these points are called support vectors. 5. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
  • 96. Contd., Non-Linear SVM: 1. If the data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line. 2. So, to separate these data points, we need to add one more dimension. For linear data we used the two dimensions x and y, so for non-linear data we add a third dimension z, calculated as z = x² + y². By adding the third dimension, the sample space becomes three-dimensional.
  • 97. Contd., 3. SVM then divides the dataset into classes in this lifted space. 4. Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-axis. If we convert it back into 2-D space with z = 1, it becomes a circle: we get a circumference of radius 1 in the case of this non-linear data.
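A sketch of the same idea with scikit-learn: a linear SVM struggles on circular (ring-shaped) data, while an RBF-kernel SVM, which implicitly performs this kind of dimension lifting, separates it; the dataset is synthetic.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # close to chance
print("rbf kernel accuracy:", rbf_svm.score(X, y))        # close to 1.0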
  • 98. Contd., Advantages of SVM • Good for smaller, cleaner datasets. • Accurate results. • Useful for both linearly separable and non-linearly separable data. • Effective in high-dimensional spaces. Disadvantages of SVM • Not suitable for large datasets, as the training time can be too long. • Not so effective on datasets with overlapping classes. • Picking the right kernel can be computationally intensive. Applications of SVM • Sentiment analysis. • Spam detection. • Handwritten digit recognition. • Image recognition challenges.
  • 99. 5. Decision Trees • A Decision Tree is a supervised learning technique that can be used for both classification and regression problems, though it is mostly preferred for classification. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent decision rules, and each leaf node represents an outcome. • In a decision tree there are two kinds of node: decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches. • The decisions or tests are performed on the basis of the features of the given dataset. • It is a graphical representation for getting all possible solutions to a problem/decision based on given conditions. • It is called a decision tree because, similar to a tree, it starts at a root node and expands into further branches, constructing a tree-like structure. • To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree. • A decision tree simply asks a question and, based on the answer (Yes/No), splits further into subtrees.
  • 100. Contd., A decision tree can contain categorical data (YES/NO) as well as numeric data. Decision Tree Terminologies • Root Node: the node where the decision tree starts. It represents the entire dataset, which is then divided into two or more homogeneous sets. • Leaf Node: leaf nodes are the final output nodes; the tree cannot be split further after a leaf node. • Splitting: the process of dividing the decision node/root node into sub-nodes according to the given conditions. • Branch/Sub Tree: a tree formed by splitting the tree. • Pruning: the process of removing unwanted branches from the tree. • Parent/Child node: a node that splits into sub-nodes is called a parent node, and its sub-nodes are called child nodes.
  • 101. Contd., Attribute Selection Measures When implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the attribute selection measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM: • Information Gain • Gini Index 1. Information Gain: • Information gain is the measurement of the change in entropy after segmenting a dataset based on an attribute. • It calculates how much information a feature provides us about a class. • According to the value of information gain, we split the node and build the decision tree. • A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the formula: Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
  • 102. Contd., Entropy: entropy is a metric to measure the impurity of a given attribute; it specifies the randomness in the data. For a binary target it can be calculated as: Entropy(S) = −P(yes)·log₂ P(yes) − P(no)·log₂ P(no), where S is the total set of samples, P(yes) is the probability of yes, and P(no) is the probability of no. 2. Gini Index: •The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm. •An attribute with a low Gini index should be preferred over one with a high Gini index. •It only creates binary splits, and the CART algorithm uses the Gini index to create them. •The Gini index is calculated as: Gini Index = 1 − Σj (Pj)²
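A small sketch computing both impurity measures for one node in pure Python; the 9-yes/5-no class counts are an illustrative example.

import math

def entropy(probs):
    # Entropy(S) = -sum(p * log2(p)), treating 0 * log2(0) as 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    # Gini Index = 1 - sum(p^2)
    return 1 - sum(p ** 2 for p in probs)

# A node with 9 "yes" and 5 "no" samples
p = [9 / 14, 5 / 14]
print(round(entropy(p), 3))  # ~0.940
print(round(gini(p), 3))     # ~0.459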
  • 123. Contd., Advantages of Decision Tree 1. Clear Visualization 2. Simple and easy to understand 3. Decision Tree can be used for both classification and regression problems. 4. Decision Tree can handle both continuous and categorical variables. 5. No feature scaling required 6. Handles non-linear parameters efficiently 7. Decision Tree can automatically handle missing values. 8. Decision Tree is usually robust to outliers and can handle them automatically. 9. Less Training Period Disadvantages of Decision Tree 1. Overfitting 2. High variance 3. Unstable 4. Affected by noise 5. Not suitable for large datasets
  • 124. 6. Random Forest • Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, the process of combining multiple classifiers to solve a complex problem and improve the performance of the model. • "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output. • A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting. Random forest is a commonly used machine learning algorithm, trademarked by Leo Breiman and Adele Cutler, which combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.
  • 126. Contd., • Random forest algorithms have three main hyperparameters, which need to be set before training: node size, the number of trees, and the number of features sampled. From there, the random forest classifier can be used to solve regression or classification problems. • The random forest algorithm is made up of a collection of decision trees, and each tree in the ensemble is built from a data sample drawn from the training set with replacement, called the bootstrap sample. About one-third of that training sample is set aside as test data, known as the out-of-bag (OOB) sample, which we come back to later. Another instance of randomness is then injected through feature bagging, adding more diversity to the ensemble and reducing the correlation among the decision trees. The determination of the prediction depends on the type of problem: for a regression task the individual decision trees are averaged, while for a classification task a majority vote, i.e. the most frequent categorical variable, yields the predicted class. Finally, the OOB sample is used for cross-validation, finalizing that prediction.
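A minimal random forest sketch with scikit-learn, including the out-of-bag score described above; the toy dataset and the choice of 100 trees are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each tree sees a bootstrap sample; oob_score evaluates each tree
# on the rows it never saw during training
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)

print("out-of-bag accuracy:", forest.oob_score_)
print("feature importances:", forest.feature_importances_[:5])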
  • 127. Contd., a) Working of the Random Forest Algorithm Before understanding the working of the random forest, we must look at the ensemble technique. Ensemble learning simply means combining multiple models: individual models come together to bring forth a model that is more accurate. A collection of models is thus used to make predictions, rather than an individual model. Why use ensemble models: 1. Better accuracy (lower error) 2. Higher consistency (avoids overfitting) 3. Reduced bias and variance errors
  • 128. Contd., Ensembles use two types of methods: 1. Bagging - it creates different training subsets from the sample training data with replacement, and the final output is based on majority voting. For example, Random Forest. 2. Boosting - it combines weak learners into strong learners by creating sequential models, such that the final model has the highest accuracy. For example, AdaBoost and XGBoost. The random forest algorithm works on the bagging principle.
  • 129. Contd., 1. Bagging Bagging, also known as bootstrap aggregation, is the ensemble technique used by random forest. Bagging chooses random samples from the dataset: each model is generated from samples (bootstrap samples) drawn from the original data with replacement, a step known as row sampling. This step of row sampling with replacement is called the bootstrap. Each model is then trained independently and generates its own result. The final output is based on majority voting after combining the results of all models; this step of combining all the results and generating an output based on majority voting is known as aggregation.
  • 130. Contd., Bagging: various models are built in parallel on various samples, and then the models vote to give the final model, and hence the final prediction.
  • 131. Contd., 2. Boosting Boosting is a process that uses a set of machine learning algorithms to combine weak learners into strong learners, in order to increase the accuracy of the model. It is an ensemble modelling technique that attempts to build a strong classifier from a number of weak classifiers, by building a series of models: first a model is built from the training data, then a second model is built which tries to correct the errors present in the first. This procedure continues, and models are added, until either the complete training dataset is predicted correctly or the maximum number of models has been added. Boosting is a small variation on bagging that focuses on the points which were given wrong predictions.
  • 132. Contd., How Does the Boosting Algorithm Work? The basic principle behind the boosting algorithm is to generate multiple weak learners and combine their predictions to form one strong rule. A decision stump is nothing but a single-level decision tree that tries to classify the data points; initially, equal weight is given to all data points.
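A sketch of boosting with scikit-learn's AdaBoostClassifier, whose default weak learner is exactly such a decision stump (a depth-1 tree); the toy data and the choice of 50 stumps are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Stumps are trained in series; each new stump concentrates on
# the samples the previous ones misclassified
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print("training accuracy:", boost.score(X, y))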
  • 133. Contd., Key Benefits • Reduced risk of overfitting • Provides flexibility • Easy to determine feature importance Key Challenges • Time-consuming process • Requires more resources • More complex Advantages and Disadvantages of Random Forest • It reduces overfitting in decision trees and helps to improve accuracy. • It is flexible to both classification and regression problems. • It works well with both categorical and continuous values. • It automatically handles missing values present in the data. • Normalizing the data is not required, as it uses a rule-based approach. However, despite these advantages, the random forest algorithm also has some drawbacks: • It requires much computational power and many resources, as it builds numerous trees and combines their outputs. • It also requires much training time, as it combines a lot of decision trees to determine the class. • Due to the ensemble of decision trees, it also suffers in interpretability and fails to determine the significance of each variable.
  • 134. Contd., Bagging vs Boosting:
S.No | Bagging | Boosting
1. | The simplest way of combining predictions that belong to the same type. | A way of combining predictions that belong to different types.
2. | Aims to decrease variance, not bias. | Aims to decrease bias, not variance.
3. | Each model receives equal weight. | Models are weighted according to their performance.
4. | Each model is built independently. | New models are influenced by the performance of previously built models.
5. | Different training data subsets are randomly drawn with replacement from the entire training dataset. | Every new subset contains the elements that were misclassified by previous models.
6. | Bagging tries to solve the overfitting problem. | Boosting tries to reduce bias.
7. | If the classifier is unstable (high variance), apply bagging. | If the classifier is stable and simple (high bias), apply boosting.
8. | Example: the Random Forest model uses bagging. | Example: AdaBoost uses boosting.
  • 135. Contd., Decision Trees vs Random Forest:
Decision Trees | Random Forest
1. Decision trees normally suffer from the problem of overfitting if allowed to grow without any control. | 1. Random forests are created from subsets of the data, and the final output is based on average or majority ranking; hence the problem of overfitting is taken care of.
2. A single decision tree is faster in computation. | 2. It is comparatively slower.
3. When a dataset with features is taken as input by a decision tree, it formulates a set of rules to make predictions. | 3. A random forest randomly selects observations, builds decision trees, and takes the average result; it does not use any single set of rules.