Overview of machine learning concepts: overfitting and train/test splits. Types of machine learning: supervised, unsupervised, and reinforcement learning. Introduction to Bayes' theorem. Linear regression: model assumptions, regularization (lasso, ridge, elastic net). Classification and regression algorithms: Naïve Bayes, K-Nearest Neighbors, logistic regression, support vector machines (SVM), decision trees, and random forest. Classification errors.

- 1. MACHINE LEARNING AND DATA SCIENCES III-B.Tech.-II-Sem Subject Code: CS-PCC-322 Unit-II: Machine Learning (10 hours) Overview of machine learning concepts: overfitting and train/test splits. Types of machine learning: supervised, unsupervised, and reinforcement learning. Introduction to Bayes' theorem. Linear regression: model assumptions, regularization (lasso, ridge, elastic net). Classification and regression algorithms: Naïve Bayes, K-Nearest Neighbors, logistic regression, support vector machines (SVM), decision trees, and random forest. Classification errors. Dr. S. Dhanalakshmi
- 2. Introduction to Machine Learning (Definition) Machine learning is a subfield of artificial intelligence (AI). The goal of machine learning is generally to understand the structure of data and fit that data into models that people can understand and use. •Machine learning is a widely used technique for predicting future outcomes or classifying information to help people make decisions. •Machine learning algorithms are trained on instances or examples, through which they learn from past experience and analyze historical data. •The core idea of machine learning is to find ways to teach a computer to perform a task without providing explicit instructions. •The term machine learning was coined in 1959 by Arthur Samuel, an American IBMer and pioneer in the field of computer gaming and artificial intelligence. •Machine learning deals with the information world: machines use data to learn, and machine learning aims to derive meaning from that data. It uses statistical methods to enable machines to improve with experience. Deep learning is a subset of machine learning based on multi-layer neural networks.
- 3. Contd.,
- 4. Contd.,
- 5. Overview of Machine Learning Concepts Artificial Intelligence (AI) is the broadest category and contains the other two (AI enables machines to act intelligently without human intervention). Machine Learning is a part of AI that involves implementing algorithms that learn from data or previous instances and perform tasks without explicit instructions (a subset of AI that uses statistical learning algorithms to learn patterns in data over time). Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning (a subset of ML that passes the data through multiple layers).
- 6. Essential Components of Machine Learning • Representation (what the model looks like) • Evaluation (how do we differentiate good models from bad ones) • Optimization ( what is our process for finding the good models among all the possible models)
- 7. Contd.,
- 8. Contd.,
- 9. TRADITIONAL PROGRAMMING VS MACHINE LEARNING Traditional Programming: Traditional programming is a manual process, meaning a person (the programmer) creates the program and has to formulate and code the rules by hand. We have the input data, and the programmer writes a program that uses that data and runs on a computer to produce the desired output. Machine Learning: In machine learning, on the other hand, the input data and the desired output are fed to an algorithm, which creates the program. In traditional programming one has to manually formulate and code the rules, while in machine learning the algorithm automatically derives the rules from the data, which is very powerful.
- 10. Contd.,
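The contrast between hand-written rules and learned rules can be sketched in a few lines. This is only an illustration under assumed data: the temperature readings, the "hot" labels, and the function names `is_hot_rule` and `learn_threshold` are hypothetical and do not come from the slides.

```python
# Traditional programming: a person writes the rule by hand.
def is_hot_rule(temp_c):
    return temp_c >= 30  # threshold picked by the programmer (hypothetical)


# Machine learning: the rule (here, a single threshold) is derived from
# labelled examples instead of being written by hand.
def learn_threshold(temps, labels):
    """Return the threshold that misclassifies the fewest training examples."""
    best_t, best_errors = None, len(temps) + 1
    for t in sorted(temps):
        errors = sum((x >= t) != y for x, y in zip(temps, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical training data: inputs together with the desired outputs.
temps = [12, 18, 22, 25, 28, 31, 35, 38]
labels = [False, False, False, False, True, True, True, True]

threshold = learn_threshold(temps, labels)
print(threshold)  # 28
```

The learned threshold (28) differs from the hand-written one (30) because it comes from the examples rather than from the programmer's guess.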
- 11. Terminology in Machine Learning
  - Model: Also known as a “hypothesis”, a machine learning model is the mathematical representation of a real-world process. A machine learning algorithm together with the training data builds a machine learning model.
  - Feature: A measurable property or parameter of the dataset.
  - Feature Vector: A set of multiple numeric features, used as input to the machine learning model for training and prediction.
  - Training: An algorithm takes a set of data known as “training data” as input. The learning algorithm finds patterns in the input data and trains the model to produce the expected results (targets). The output of the training process is the machine learning model.
  - Prediction: Once the machine learning model is ready, it can be fed input data to produce a predicted output.
  - Target (Label): The value that the machine learning model has to predict.
  - Overfitting: When a model fits its training data too closely, it also learns the noise and inaccurate entries in that data. The model then fails to characterize new data correctly.
  - Underfitting: The model fails to capture the underlying trend in the input data, which hurts its accuracy. In simple terms, the model or algorithm does not fit the data well enough.
- 12. How Does Machine Learning Work
- 13. Steps to Build an ML Model
- 14. Contd., 1. Data collection Machine learning requires training data, a lot of it (either labelled, meaning supervised learning or not labelled, meaning unsupervised learning). 2. Data preparation Raw data alone is not very useful. The data needs to be prepared, normalized, de-duplicated and errors and bias need to be removed. Visualisation of the data can be used to look for patterns and outliers to see if the right data has been collected or if data is missing. Cleaning and Visualizing Data
- 15. Contd., 3. Choosing a model Choose a model based on the collected data and its relevance to the task. Common choices include linear regression, logistic regression, decision trees, K-means, principal component analysis (PCA), Support Vector Machines (SVM), Naïve Bayes, Random Forest and Neural Networks. Check whether a model is suited to numerical or categorical data and choose accordingly. Models and their applications:
  - Logistic Regression: Price prediction
  - Fully connected networks: Classification
  - Convolutional Neural Networks: Image processing
  - Recurrent Neural Networks: Voice recognition
  - Random Forest: Fraud detection
  - Reinforcement Learning: Learning by trial and error
  - Generative Models: Image creation
  - K-means: Segmentation
  - k-Nearest Neighbors: Recommendation systems
  - Bayesian Classifiers: Spam and noise filtering
- 16. Contd., 4. Training Training is the most important step in machine learning. In training, you pass the prepared data to your machine learning model so that it can find patterns and make predictions. The model learns from the data so that it can accomplish the task set. Over time, with training, the model gets better at predicting. 5. Evaluation After training the model comes evaluating it. This entails testing the model against an unused control dataset to see how it performs. This may be representative of how the model works in the real world, but it does not have to be. The larger the number of variables in the real world, the bigger the training and test data should be.
- 17. Contd., 6. Parameter tuning After evaluating your model, you should test the originally set parameters to improve the AI. Increasing the number of training cycles can lead to more accurate results. However, you should define when a model is good enough as otherwise, you will continue to tweak the model. This is an experimental process. 7. Prediction Once you have gone through the process of collecting data, preparing the data, selecting the model, training and evaluating the model and tuning the parameters, it is time to answer questions using predictions. These can be all kinds of predictions, ranging from image recognition to semantics to predictive analytics.
- 19. Applications of Machine Learning
- 20. Typical Machine Learning Process • Training data. This data is used to fit the machine learning model. The data scientist feeds the algorithm input data that corresponds to an expected output. The model evaluates the data repeatedly to learn more about the data’s behavior and then adjusts itself to serve its intended purpose. • Validation data. During training, validation data introduces data that the model has not seen before. It provides the first test against unseen data, allowing data scientists to evaluate how well the model predicts on new data. Not all data scientists use validation data, but it helps to tune hyperparameters, which influence how the model learns from data. • Test data. After the model is built, test data checks once more that it can make accurate predictions. Training and validation data include labels to monitor the model's performance metrics; the labels of the test data are withheld from the model and used only to score its predictions. Test data provides a final, real-world check on an unseen dataset to confirm that the ML algorithm was trained effectively.
- 21. Typical Machine Learning Process Overfitting: creating a model that matches the training data so closely that it fails to make correct predictions on new data.
- 22. Overfitting and train/test splits
- 23. Overfitting and train/test splits Train/Test is a method to measure the accuracy of your model. • It is called Train/Test because you split the data set into two sets: a training set and a testing set, for example 80% for training and 20% for testing. • You train the model using the training set and test it using the testing set. • Training the model means creating the model; testing the model means measuring its accuracy. • Common split percentages include: Train: 80%, Test: 20%; Train: 67%, Test: 33%; Train: 50%, Test: 50%.
- 24. Splitting DataSets • To use a dataset in machine learning, it is first split into a training set and a test set. • The training set is used to train the model. • The test set is used to test the accuracy of the model.
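A minimal sketch of an 80/20 split, using only the Python standard library. The function name `train_test_split`, its parameters, and the toy data are assumptions made for illustration; libraries such as scikit-learn provide their own, more complete helpers.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))  # hypothetical dataset of 100 samples
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting matters when the data is ordered (for example by date or by class), because otherwise the test set may not resemble the training set.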
- 26. •Second Method
- 27. Data Imbalance and Overfitting • If the training data is heavily unbalanced, the model will predict a result that is not meaningful. • For example, if a model is a binary classifier (e.g. Cat vs Dog) and nearly all the samples have the same label (Cat), the model will simply learn that everything has that label (Cat). • This is often grouped with overfitting, although it is more precisely a bias caused by class imbalance. To prevent it, there needs to be a fairly equal distribution of training samples for each class, or across the range if the label is a real value.
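The effect of imbalance can be seen with a toy calculation. The label counts below are hypothetical; the point is that a predictor which always answers "cat" looks accurate while never finding a single dog.

```python
# Hypothetical, heavily imbalanced labels: 95 cats and 5 dogs.
labels = ["cat"] * 95 + ["dog"] * 5

# A "model" that ignores its input and always predicts the majority class.
predictions = ["cat"] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall for the minority class: how many of the real dogs were found.
dogs_found = sum(p == "dog" and y == "dog" for p, y in zip(predictions, labels))
dog_recall = dogs_found / labels.count("dog")

print(accuracy)    # 0.95
print(dog_recall)  # 0.0
```

This is why accuracy alone is a poor measure on imbalanced data, and why per-class measures are worth checking as well.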
- 31. Types of Machine Learning Machine Learning Techniques • Supervised Machine Learning • Semi-supervised Machine Learning • Unsupervised Machine Learning • Reinforcement Machine Learning
- 32. Machine Learning Techniques Machine learning tasks are generally classified into broad categories, based on how learning is received or how feedback on the learning is given to the system. •Two of the most widely adopted machine learning methods are supervised learning, which trains algorithms on example input and output data labeled by humans, and unsupervised learning, which gives the algorithm no labeled data so that it can find structure within its input data. Semi-supervised models use both labeled and unlabeled data for training. Reinforcement learning is a feedback-based type of algorithm (the machine learns on its own).
- 33. Contd.,
- 34. Contd.,
- 35. Supervised Machine Learning 1. It is a type of learning in which both input and desired output data are provided. 2. Input and output data are labeled to provide a learning basis for future data processing. (A model based on supervised learning requires both previous data and the previous results as input. By training with this data, the model can predict results more accurately.) 3. The algorithm has a target or outcome variable (the dependent variable) which is to be predicted from a given set of predictors (independent variables). 4. Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. 5. Supervised learning covers classification and regression, with methods such as Naïve Bayes, SVM, KNN, and decision trees.
- 36. Contd.,
- 37. Contd.,
- 38. Contd., Supervised learning is classified into two categories of algorithms: Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”. Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”. Types: Regression, Logistic Regression, Classification, Naive Bayes Classifiers, K-NN (k nearest neighbors), Decision Trees, Support Vector Machine. Advantages: •Supervised learning allows collecting data and producing outputs from previous experience. •It helps to optimize performance criteria with the help of experience. •It helps to solve various types of real-world computation problems. Disadvantages: •Classifying big data can be challenging. •Training needs a lot of computation time.
- 39. Contd.,
- 40. Unsupervised Machine Learning 1. Unsupervised learning needs no labelled data as input. It allows the model to learn on its own from the data you give it. The data is not labelled, but the algorithm helps the model form clusters of similar types of data. For example, if we have data on dogs and cats, the model will process the data and train itself on it. Since it has no previous experience of the data, it forms clusters based on similarities between features. 2. It trains the model by making it learn about the data and work on it from the very start. After the data has been clustered, we can easily label it in separate categories, because the groups have already been found.
- 43. Contd., Unsupervised learning is classified into two categories of algorithms: 1. Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior. 2. Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y. Types of clustering: • Exclusive (partitioning) • Agglomerative • Overlapping • Probabilistic. Clustering and related techniques: • Hierarchical clustering • K-means clustering • Principal Component Analysis • Singular Value Decomposition • Independent Component Analysis
- 44. Contd., Advantages of Unsupervised Learning We sometimes choose unsupervised learning in place of supervised learning. Here are some of the advantages: 1. Labeling data demands a lot of manual work and expense. Unsupervised learning addresses this by learning from the data and grouping it without any labels. 2. Labels can be added after the data has been grouped, which is much easier. 3. It is very helpful for finding patterns in data that are hard to find using conventional methods. 4. Dimensionality reduction can be accomplished using unsupervised learning. 5. Unsupervised learning can help to understand raw data. Disadvantages of Unsupervised Learning 1. The result might be less accurate, as there is no labelled data to train from; the model learns from raw data without any prior knowledge. 2. It is also a time-consuming process. The learning phase of the algorithm might take a lot of time, as it analyses and calculates all possibilities.
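Grouping customers by purchasing behavior can be sketched with K-means on a single feature. This is a minimal illustration, assuming made-up spend values and hand-picked starting centers; the function name `kmeans_1d` is hypothetical, and real implementations handle many dimensions and choose starting centers more carefully.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer; no labels are given.
spend = [10, 12, 11, 13, 90, 95, 92, 88]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers)   # [11.5, 91.25]
print(clusters)  # [[10, 12, 11, 13], [90, 95, 92, 88]]
```

The algorithm is never told which customers are low or high spenders; the two groups emerge from the distances alone.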
- 45. Contd.,
- 46. Contd.,
- 47. Semi-supervised Machine Learning This is a combination of supervised and unsupervised learning that helps to reduce the shortcomings of both. In supervised learning, labelling data is manual work and is very costly when the data is huge. In unsupervised learning, the areas of application are very limited. Semi-supervised learning is used to reduce these problems. The model first trains under unsupervised learning, which divides most of the unlabeled data into clusters. Labels are then generated for the remaining unlabeled data, and classification can proceed with ease. This technique is very useful in areas like speech recognition and analysis, protein classification, and text classification. It is a type of hybrid learning problem: its working lies between supervised and unsupervised techniques. We use it when a small part of the data is labeled and the large remainder is unlabeled. We can use unsupervised techniques to predict labels and then feed these labels to supervised techniques. This technique is mostly applicable to image data sets, where usually not all images are labeled.
- 48. Reinforcement Machine Learning 1. The model keeps improving its performance, using reward feedback to learn the behavior or pattern. These algorithms are specific to a particular problem, e.g. the Google self-driving car, or AlphaGo, where a bot competes with humans and even with itself to become a better and better player of the game of Go. 2. Each time we feed in data, the model learns and adds the data to its knowledge, that is, its training data. So the more it learns, the better trained and more experienced it becomes.
- 50. Reinforcement Machine Learning 1. Reinforcement Learning is a type of learning methodology in ML along with supervised and unsupervised learning. But, when we compare these three, reinforcement learning is a bit different than the other two. Here, we take the concept of giving rewards for every positive result and make that the base of our algorithm. 2. We can train our dog to perform certain actions, of course, it won’t be an easy task. You would order the dog to do certain actions and for every proper execution, you would give a biscuit as a reward. The dog will remember that if it does a certain action, it would get biscuits. This way it will follow the instructions properly next time.
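The dog-and-biscuit idea can be sketched as a very small reward-driven learner. This is an epsilon-greedy value estimate, swapped in here as one simple illustration and not a method named in the slides; the action names, the reward function, and `train_dog` itself are hypothetical.

```python
import random


def train_dog(reward_for, actions, steps=50, epsilon=0.1, seed=0):
    """Estimate the average reward of each action from trial and error."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # estimated reward per action
    count = {a: 0 for a in actions}    # how often each action was tried
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # explore occasionally
        else:
            action = max(actions, key=value.get)  # otherwise exploit the best so far
        reward = reward_for(action)
        count[action] += 1
        # Incremental mean: nudge the estimate towards the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value


actions = ["sit", "bark", "run"]
# Hypothetical reward: a biscuit (1.0) only when the dog sits.
value = train_dog(lambda a: 1.0 if a == "sit" else 0.0, actions)
best = max(actions, key=value.get)
print(best)  # sit
```

No one tells the learner that "sit" is correct; it only sees which actions were rewarded and shifts its behavior accordingly.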
- 52. Types of Data
- 54. Applications
- 55. Contd.,
- 56. What are the most common and popular machine learning algorithms? 1. Naïve Bayes Classifier Algorithm (Supervised Learning - Classification) 2. K Means Clustering Algorithm (Unsupervised Learning - Clustering) 3. Support Vector Machine Algorithm (Supervised Learning - Classification) 4. Linear Regression (Supervised Learning - Regression) 5. Logistic Regression (Supervised Learning - Classification) 6. Artificial Neural Networks (Supervised, Unsupervised, or Reinforcement Learning) 7. Decision Trees (Supervised Learning - Classification/Regression) 8. Random Forests (Supervised Learning - Classification/Regression) 9. Nearest Neighbours (Supervised Learning)
- 57. Classification and Regression Algorithms • Definition of Classification • Definition of Regression • Differences between Classification and Regression • Types of Classification Algorithms – Naïve Bayes Algorithms – K-Nearest Neighbors Algorithms – Logistic Regression – Support Vector Machines (SVM) – Decision Trees – Random Forest • Classification Errors
- 58. Contd., Regression and Classification algorithms are Supervised Learning algorithms. Both are used for prediction in machine learning and work with labeled datasets. The main difference is that Regression algorithms are used to predict continuous values such as price, salary, or age, while Classification algorithms are used to predict or classify discrete values such as Male or Female, True or False, or Spam or Not Spam.
- 59. Definition of Classification Classification: Classification is a process of finding a function which helps in dividing the dataset into classes based on different parameters. In Classification, a computer program is trained on the training dataset and based on that training, it categorizes the data into different classes. The task of the classification algorithm is to find the mapping function to map the input(x) to the discrete output(y). Example: The best example to understand the Classification problem is Email Spam Detection. The model is trained on the basis of millions of emails on different parameters, and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, then it is moved to the Spam folder.
- 60. Contd., Types of ML Classification Algorithms: Classification Algorithms can be further divided into the following types: – Logistic Regression – K-Nearest Neighbours – Support Vector Machines – Naïve Bayes – Decision Tree Classification – Random Forest Classification
- 61. Definition of Regression Regression is a process of finding the correlations between dependent and independent variables. It helps in predicting the continuous variables such as prediction of Market Trends, prediction of House prices, etc. The task of the Regression algorithm is to find the mapping function to map the input variable(x) to the continuous output variable(y). Example: Suppose we want to do weather forecasting, so for this, we will use the Regression algorithm. In weather prediction, the model is trained on the past data, and once the training is completed, it can easily predict the weather for future days.
- 62. Contd., Types of Regression Algorithm • Simple Linear Regression • Multiple Linear Regression • Polynomial Regression • Support Vector Regression • Decision Tree Regression • Random Forest Regression
- 63. Regression Algorithm vs Classification Algorithm
  - Output: In Regression, the output variable must be continuous (a real value). In Classification, the output variable must be a discrete value.
  - Task: The regression algorithm maps the input value (x) to a continuous output variable (y). The classification algorithm maps the input value (x) to a discrete output variable (y).
  - Data: Regression algorithms are used with continuous data. Classification algorithms are used with discrete data.
  - Method: In Regression, we try to find the best-fit line, which can predict the output more accurately. In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
  - Use cases: Regression algorithms can be used for problems such as weather prediction and house price prediction. Classification algorithms can be used for problems such as identification of spam emails, speech recognition, and identification of cancer cells.
  - Subtypes: Regression algorithms can be further divided into Linear and Non-linear Regression. Classification algorithms can be divided into Binary Classifiers and Multi-class Classifiers.
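Finding the best-fit line can be sketched with ordinary least squares for a single input variable. The data points are made up so that they lie exactly on y = 2x + 1, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # hypothetical data lying exactly on y = 2x + 1

slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # 13.0, the continuous prediction for x = 6
```

The output is a real number, which is what distinguishes this from a classifier that would return one of a fixed set of labels.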
- 64. 1. Naïve Bayes Algorithm • The Naive Bayes classifier works on the principle of conditional probability, as given by Bayes' theorem. Probability is usually denoted by P. • Bayes' theorem gives the conditional probability of an event A, given that another event B has occurred. With two coin tosses, for example, the first toss can be B and the second toss A. This can be confusing because the order is reversed: we go from B to A instead of from A to B. • Bayes' theorem calculates the conditional probability of an event based on prior knowledge of conditions that might be related to the event.
- 65. Contd., • The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that make quick predictions. • Naive Bayes is a powerful machine learning algorithm used for classification. • It is an extension of Bayes' theorem that assumes each feature is independent of the others given the class. It is used for a variety of tasks such as spam filtering and other areas of text classification. Understanding Naive Bayes and Machine Learning Machine learning falls into two main categories: • Supervised learning and Unsupervised learning. Supervised learning falls into two categories: • Classification and Regression. The Naive Bayes algorithm falls under classification.
- 66. Contd., Naïve Bayes is used for: • Face Recognition - As a classifier, it is used to identify faces or facial features such as the nose, mouth, and eyes. • Weather Prediction - It can be used to predict whether the weather will be good or bad. • Medical Diagnosis - Doctors can diagnose patients using the information that the classifier provides. Healthcare professionals can use Naive Bayes to indicate whether a patient is at high risk for certain diseases and conditions, such as heart disease, cancer, and other ailments. • News Classification. It also requires relatively little training data.
- 67. Contd., Example For the day, consider values such as weekday, weekend, and holiday. For any given day, check whether there is a discount and free delivery. Based on this information, we can predict whether a customer will buy the product or not. • The sample data set has 30 rows, 15 of which are shown below:
- 69. Contd., Based on the dataset containing the three input types—day, discount, and free delivery— the frequency table for each attribute is populated.
- 70. Contd.,
- 71. Contd.,
- 73. Contd.,
- 74. Contd.,
- 75. Contd.,
- 76. Contd.,
- 77. Contd.,
- 78. Contd.,
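The frequency-table calculation can be sketched in code. Bayes' theorem states P(A|B) = P(B|A) · P(A) / P(B); Naive Bayes applies it per class and multiplies the per-feature likelihoods. The slide's 30-row table is not reproduced in this transcript, so the eight rows below are hypothetical stand-ins with the same three inputs (day, discount, free delivery), and `naive_bayes_scores` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical rows: (day, discount, free_delivery, buy)
rows = [
    ("weekday", "yes", "yes", "yes"),
    ("weekday", "yes", "yes", "yes"),
    ("weekday", "no",  "no",  "no"),
    ("holiday", "yes", "yes", "yes"),
    ("weekend", "yes", "yes", "yes"),
    ("holiday", "no",  "no",  "no"),
    ("weekend", "yes", "no",  "yes"),
    ("weekday", "no",  "yes", "no"),
]


def naive_bayes_scores(rows, query):
    """Return P(class) * product of P(feature_i | class) for each class."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, len(rows))     # prior P(class)
        for i, value in enumerate(query):
            matches = sum(1 for row in rows
                          if row[-1] == label and row[i] == value)
            score *= Fraction(matches, n_label)  # likelihood P(feature | class)
        scores[label] = score
    return scores


# Will a customer buy on a holiday with a discount and free delivery?
scores = naive_bayes_scores(rows, ("holiday", "yes", "yes"))
prediction = max(scores, key=scores.get)
print(scores["yes"], scores["no"])  # 1/10 0
print(prediction)                   # yes
```

The score for "no" is exactly zero because no non-buyer in this toy data had a discount. Practical implementations usually add a small count to every cell (Laplace smoothing) so that a single unseen combination does not rule a class out entirely.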
- 79. 2. K-Nearest Neighbors The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems. A supervised machine learning algorithm (as opposed to an unsupervised machine learning algorithm) is one that relies on labeled input data to learn a function that produces an appropriate output when given new unlabeled data. The KNN algorithm assumes that similar things exist in close proximity. In other words, similar things are near to each other. •K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the Classification problems. •K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately instead it stores the dataset and at the time of classification, it performs an action on the dataset.
- 80. Contd., Why do we need a K-NN Algorithm? Suppose there are two categories, Category A and Category B, and we have a new data point x1. Which of these categories does the data point belong to? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. K-Nearest Neighbor is a classification and prediction algorithm that assigns data to classes based on the distance between data points. It assumes that data points which are close to one another are similar, so the data point to be classified is assigned to the class of its nearest neighbors.
- 81. Contd.,
- 82. Contd.,
- 83. Contd., 57 KG, 170 CM IS NORMAL
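A minimal K-NN sketch for a weight and height query like the one above. The training table from the slides is not reproduced in this transcript, so the points and labels below are hypothetical, as is the function name `knn_predict`; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (weight in kg, height in cm) -> label
train = [
    ((51, 167), "Underweight"),
    ((62, 182), "Normal"),
    ((69, 176), "Normal"),
    ((64, 173), "Normal"),
    ((65, 172), "Normal"),
    ((56, 174), "Underweight"),
    ((58, 169), "Normal"),
    ((57, 173), "Normal"),
    ((55, 170), "Normal"),
]

print(knn_predict(train, (57, 170), k=3))  # Normal
```

K-NN does no work at training time; every prediction sorts the stored points by distance, which is why it is called a lazy learner. In practice the features are usually scaled first so that one unit of weight and one unit of height count comparably.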
- 84. 3. Logistic Regression Logistic Regression is a machine learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). •It is a technique for analysing a dataset which has a dependent variable and one or more independent variables, to predict a binary outcome, meaning it has only two possible outcomes. •The dependent variable is categorical in nature. It is also referred to as the target variable, and the independent variables are called the predictors. Logistic regression is a supervised learning algorithm that predicts a categorical target variable and is used to classify samples; therefore, it is a classification algorithm.
- 85. Contd., Linear Regression vs Logistic Regression • Both are supervised learning models and use labeled data to make predictions. • The main difference between them is how they are used: Linear Regression is used for solving regression problems (predicting continuous values), whereas Logistic Regression, although it outputs a probability, is used for solving classification problems. A comparison of the two algorithms is given below.
- 86. Contd., Type of Logistic Regression: On the basis of the Dependent variable, Logistic Regression can be classified into three types: • Binomial: There can be only two possible types of the dependent variables, such as 0 or 1, Pass or Fail, Purchased or Not Purchased, Tall or Short, Fat or Slim, Rock or Mine, etc. • Multinomial: There can be 3 or more possible unordered types of the dependent variable, such as apple, banana, orange or cat, dog, goat, sheep or Delhi, Mumbai, Bangalore, Calcutta. • Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of dependent variables, such as High, medium, low, or ratings of a restaurant from 1 to 5 or the intensity of the light, or a 5 points Likert scale, etc
- 87. Contd., Linear Regression vs Logistic Regression
  - Purpose: Linear regression predicts a continuous dependent variable from a given set of independent variables. Logistic regression predicts a categorical dependent variable from a given set of independent variables.
  - Problem type: Linear regression is used for solving regression problems. Logistic regression is used for solving classification problems.
  - Fitted shape: In linear regression, we find the best-fit line, from which we can predict the output. In logistic regression, we find the S-curve, by which we can classify the samples.
  - Estimation: Linear regression estimates its parameters by least squares. Logistic regression estimates its parameters by maximum likelihood.
  - Output: The output of linear regression must be a continuous value, such as price or age. The output of logistic regression must be a categorical value, such as 0 or 1, or Yes or No.
  - Relationship: Linear regression requires a linear relationship between the dependent and independent variables. Logistic regression does not require a linear relationship between the dependent and independent variables.
  - Collinearity: Strong collinearity between the independent variables should be avoided in logistic regression. Linear regression can still be fitted when some collinearity is present, although its coefficients become harder to interpret.
- 88. 4. Support Vector Machines (SVM) Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms and is used for classification as well as regression problems. It is, however, primarily used for classification. The goal of the SVM algorithm is to create the best line or decision boundary that segregates n-dimensional space into classes, so that new data points can easily be put in the correct category in the future. This best decision boundary is called a hyperplane. •SVMs build upon basic ML algorithms and add features that make them more efficient at various tasks. Because of their flexibility, high performance, and compute efficiency, they are used in a variety of tasks, including anomaly detection, handwriting recognition, and text classification.
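The S-curve and the fitting procedure can be sketched as follows. This is a minimal illustration that fits one weight and one bias by batch gradient descent on the log loss, which is one common way to carry out maximum likelihood estimation; the data, the learning rate, and the names `sigmoid` and `fit_logistic` are assumptions, not taken from the slides.

```python
import math


def sigmoid(z):
    """The S-curve: maps any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical standardised feature with a binary target (0 = fail, 1 = pass).
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
probabilities = [sigmoid(w * x + b) for x in xs]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]
print(predicted)  # [0, 0, 0, 1, 1, 1]
```

The model outputs a probability; the class label comes from comparing that probability with a cut-off, here 0.5.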
- 91. Contd., Example: SVM can be understood with the example used for the KNN classifier. Suppose we see a strange cat that also has some features of a dog, and we want a model that can accurately identify whether it is a cat or a dog. Such a model can be created with the SVM algorithm. We first train the model with many images of cats and dogs so that it learns the different features of each, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) using the extreme cases of each class, which are called support vectors. On the basis of the support vectors, it classifies the creature as a cat.
- 92. Contd., SVM can be of two types: • Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by a single straight line, the data is called linearly separable, and the classifier used is called a Linear SVM classifier. • Non-linear SVM: Non-linear SVM is used for data that is not linearly separable. If a dataset cannot be classified by a straight line, the data is called non-linear, and the classifier used is called a Non-linear SVM classifier. (Figures: linearly separable data; non-linearly separable data.)
- 93. Contd., • When the data can be separated by a hyperplane drawn as a straight line, we use Linear SVM. When the data cannot be separated by a straight line, we use Non-linear SVM, which relies on kernel functions. A kernel maps the data into a higher-dimensional space in which it becomes linearly separable and can be classified. • For example, a kernel can transform two variables x and y into three variables by adding z, so the data is mapped from 2-D space to 3-D space. We can then classify the data by drawing the best hyperplane between the classes. Linear SVM vs Non-Linear SVM: (1) Linear SVM: the data can be separated with a straight line; Non-linear SVM: the data cannot be separated with a straight line. (2) Linear SVM: data is classified with the help of a hyperplane; Non-linear SVM: kernels are used to make non-separable data separable. (3) Linear SVM: data is classified by drawing a straight line; Non-linear SVM: data is mapped into a high-dimensional space to be classified.
- 95. Contd., The working of the SVM algorithm (Linear SVM) 1. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify each pair (x1, x2) of coordinates as either green or blue. 2. Since this is a 2-D space, we can separate the two classes with a straight line. However, there can be multiple lines that separate these classes. 3. The SVM algorithm finds the best line or decision boundary; this best boundary is called a hyperplane. 4. The SVM algorithm finds the points of each class that lie closest to the separating line. These points are called support vectors. 5. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
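The margin idea in the steps above can be sketched with plain Python. This is only an illustration under stated assumptions: the toy points and the two candidate lines are hypothetical, and the code merely compares the margins of given lines. A real SVM solves an optimization problem to find the maximum-margin line rather than choosing from a short list.

```python
import math

def signed_distance(w, b, point):
    """Signed distance from `point` to the line w[0]*x1 + w[1]*x2 + b = 0."""
    norm = math.hypot(w[0], w[1])
    return (w[0] * point[0] + w[1] * point[1] + b) / norm

def margin(w, b, points, labels):
    """Smallest distance from any point to the line (labels are +1 or -1).

    Returns None if the line puts some point on the wrong side.
    """
    distances = []
    for point, label in zip(points, labels):
        d = label * signed_distance(w, b, point)
        if d <= 0:
            return None
        distances.append(d)
    return min(distances)

# Hypothetical toy data: green = +1, blue = -1.
points = [(3.0, 3.0), (4.0, 2.0), (4.0, 4.0), (1.0, 1.0), (0.0, 2.0), (1.0, 0.0)]
labels = [1, 1, 1, -1, -1, -1]

# Two hypothetical candidate lines that both separate the classes.
candidates = {
    "x1 + x2 - 4 = 0": ((1.0, 1.0), -4.0),
    "x1 - 2.5 = 0": ((1.0, 0.0), -2.5),
}
margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
separating = {name: m for name, m in margins.items() if m is not None}
best = max(separating, key=separating.get)  # candidate with the widest margin
```

Both lines classify every point correctly, but the first leaves more room between the line and the nearest points, so it is the better boundary of the two.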
- 96. Contd., Non-Linear SVM: 1. If the data is linearly separable, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line. 2. To separate such data points, we need to add one more dimension. For linear data we used two dimensions, x and y, so for non-linear data we add a third dimension z. It can be calculated as $z = x^2 + y^2$. Adding the third dimension changes the sample space as shown in the figure.
- 97. Contd., 3. SVM now divides the dataset into classes in the 3-D space. 4. Since we are in 3-D space, the boundary looks like a plane parallel to the x-y plane. If we convert it back to 2-D space with z = 1, the boundary becomes a circle of radius 1 for the non-linear data.
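The mapping z = x^2 + y^2 can be tried out directly. The sketch below is an assumption-laden toy: the data (an inner cluster and an outer ring of radius 2) and the class names `inner` and `outer` are made up for illustration, and the plane z = 1 is taken from the slide rather than learned.

```python
import math

def lift(point):
    """Map a 2-D point (x, y) to 3-D by adding z = x**2 + y**2."""
    x, y = point
    return (x, y, x * x + y * y)

def classify(point, threshold=1.0):
    """Label a point by which side of the plane z = threshold it falls on."""
    return "inner" if lift(point)[2] < threshold else "outer"

# Hypothetical data: a cluster near the origin and a ring of radius 2.
# No straight line in the x-y plane separates these two groups.
inner = [(0.1, 0.2), (-0.3, 0.1), (0.0, -0.4), (0.5, 0.5)]
outer = [(2 * math.cos(t), 2 * math.sin(t)) for t in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]

predictions_inner = [classify(p) for p in inner]
predictions_outer = [classify(p) for p in outer]
```

In the lifted space a flat plane separates the groups; projected back to 2-D, that plane is the circle x^2 + y^2 = 1.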
- 98. Contd., Advantages of SVM • Good for smaller, cleaner datasets. • Accurate results. • Useful for both linearly separable and non-linearly separable data. • Effective in high-dimensional spaces. Disadvantages of SVM • Not suitable for large datasets, as the training time can be too long. • Not so effective on a dataset with overlapping classes. • Picking the right kernel can be computationally intensive. Applications of SVM • Sentiment analysis. • Spam detection. • Handwritten digit recognition. • Image recognition challenges.
- 99. 5. Decision Trees • Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for classification. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome. • A decision tree has two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches. • The decisions or tests are performed on the basis of the features of the given dataset. • It is a graphical representation of all the possible solutions to a problem or decision under given conditions. • It is called a decision tree because, like a tree, it starts with a root node, which expands into further branches and forms a tree-like structure. • To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm. • A decision tree simply asks a question and, based on the answer (Yes/No), splits further into subtrees.
- 100. Contd., A decision tree can handle categorical data (YES/NO) as well as numeric data. Decision Tree Terminologies • Root Node: The root node is where the decision tree starts. It represents the entire dataset, which is then divided into two or more homogeneous sets. • Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after a leaf node. • Splitting: Splitting is the process of dividing a decision node or the root node into sub-nodes according to the given conditions. • Branch/Sub Tree: A tree formed by splitting the tree. • Pruning: Pruning is the process of removing unwanted branches from the tree. • Parent/Child node: A node that is split into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes. The root node is the parent of the first-level nodes.
- 101. Contd., Attribute Selection Measures When implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. This is solved with a technique called Attribute Selection Measure (ASM), which lets us select the best attribute for each node of the tree. There are two popular techniques for ASM: • Information Gain • Gini Index 1. Information Gain: • Information gain measures the change in entropy after a dataset is segmented on an attribute. • It calculates how much information a feature provides about a class. • According to the value of information gain, we split the node and build the decision tree. • A decision tree algorithm always tries to maximize information gain, and the node/attribute with the highest information gain is split first. It can be calculated using the formula: $\text{Information Gain} = \text{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$, where $S_v$ is the subset of $S$ in which the attribute takes the value $v$ (the sum is the weighted average entropy after the split).
- 102. Contd., Entropy: Entropy is a metric that measures the impurity in a given attribute. It specifies the randomness in the data. Entropy can be calculated as: $\text{Entropy}(S) = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no})$, where $S$ is the set of samples, $P(\text{yes})$ is the probability of yes, and $P(\text{no})$ is the probability of no. 2. Gini Index: • The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm. • An attribute with a low Gini index should be preferred over one with a high Gini index. • The CART algorithm creates only binary splits and uses the Gini index to choose them. • The Gini index can be calculated using the formula: $\text{Gini Index} = 1 - \sum_j P_j^2$, where $P_j$ is the proportion of samples belonging to class $j$.
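The entropy, information gain, and Gini index formulas can be checked on a small example. The sketch below uses only the standard library; the `labels`, `outlook`, and `windy` lists are hypothetical data made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy before the split minus the weighted average entropy after it."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical dataset: 4 "yes" and 4 "no" samples with two candidate attributes.
labels = ["yes"] * 4 + ["no"] * 4
outlook = ["sunny"] * 4 + ["rainy"] * 4   # separates the classes perfectly
windy = ["t", "f"] * 4                    # tells us nothing about the class

gain_outlook = information_gain(labels, outlook)
gain_windy = information_gain(labels, windy)
```

Here `outlook` removes all uncertainty, so its gain equals the full entropy of the dataset, while `windy` leaves each group as mixed as before and gains nothing. A tree builder would therefore split on `outlook` first.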
- 103. DataSet
- 123. Contd., Advantages of Decision Tree 1. Clear visualization 2. Simple and easy to understand 3. Can be used for both classification and regression problems 4. Can handle both continuous and categorical variables 5. No feature scaling required 6. Handles non-linear relationships efficiently 7. Some implementations can handle missing values automatically 8. Usually robust to outliers 9. Short training period Disadvantages of Decision Tree 1. Overfitting 2. High variance 3. Unstable 4. Affected by noise 5. Not suitable for large datasets
- 124. 6. Random Forest • Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and improve the performance of the model. • "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of those predictions. • A larger number of trees in the forest generally leads to more stable accuracy and reduces the risk of overfitting. Random forest is a commonly used machine learning algorithm, trademarked by Leo Breiman and Adele Cutler, which combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.
- 126. Contd., • Random forest algorithms have three main hyperparameters, which need to be set before training: node size, the number of trees, and the number of features sampled. From there, the random forest can be used to solve regression or classification problems. • The random forest algorithm is made up of a collection of decision trees, and each tree in the ensemble is built from a data sample drawn from the training set with replacement, called the bootstrap sample. About one-third of the training data is left out of each bootstrap sample; this is known as the out-of-bag (oob) sample and is set aside as test data. Another instance of randomness is then injected through feature bagging, which adds more diversity and reduces the correlation among the decision trees. The way the prediction is determined depends on the type of problem. For a regression task, the predictions of the individual decision trees are averaged; for a classification task, a majority vote (the most frequent class) yields the predicted class. Finally, the oob sample is used for validation of that prediction.
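The bootstrap and out-of-bag ideas can be simulated with the standard library. In this sketch the dataset size, the random seed, and the helper names are assumptions for illustration. The "about one-third" figure comes from the fact that the chance a given row is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches 1/e (roughly 0.37) as n grows.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def aggregate_classification(tree_predictions):
    """Majority vote: the most frequent class among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Average of the trees' numeric predictions."""
    return mean(tree_predictions)

rng = random.Random(0)      # fixed seed so the run is repeatable
n_samples = 10_000          # hypothetical dataset size
in_bag = bootstrap_indices(n_samples, rng)
out_of_bag = set(range(n_samples)) - set(in_bag)   # rows never drawn
oob_fraction = len(out_of_bag) / n_samples

# Hypothetical per-tree predictions for a single new sample.
predicted_class = aggregate_classification(["cat", "dog", "cat"])
predicted_value = aggregate_regression([1.0, 2.0, 3.0])
```

Each tree gets its own bootstrap sample, so each tree also has its own out-of-bag rows that it never saw during training.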
- 127. Contd., a) Working of the Random Forest Algorithm Before looking at how random forest works, we must look at the ensemble technique. Ensemble Learning 1. Ensemble means group. 2. In ensemble learning, individual models come together to produce a model that is more accurate; it simply means combining multiple models. Thus a collection of models is used to make predictions rather than an individual model. Why use Ensemble Models: 1. Better accuracy (low error) 2. Higher consistency (avoids overfitting) 3. Reduced bias and variance errors
- 128. Contd., Ensemble learning uses two types of methods: 1. Bagging: It creates different training subsets from the sample training data with replacement, and the final output is based on majority voting. For example, Random Forest. 2. Boosting: It combines weak learners into strong learners by creating sequential models such that the final model has the highest accuracy. For example, AdaBoost and XGBoost. The Random Forest algorithm works on the bagging principle.
- 129. Contd., 1. Bagging Bagging, also known as Bootstrap Aggregation, is the ensemble technique used by random forest. Bagging chooses a random sample from the data set, so each model is built from a sample (bootstrap sample) drawn from the original data with replacement, which is known as row sampling. This step of row sampling with replacement is called bootstrap. Each model is then trained independently and generates its own results. The final output is based on majority voting after combining the results of all models. This step of combining all the results and generating the output by majority voting is known as aggregation.
- 130. Contd., Bagging: various models are built in parallel on various samples, and the models then vote to give the final prediction.
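Bootstrap plus aggregation can be put together in a small end-to-end sketch. The assumptions here are loud ones: the weak model is a one-feature threshold rule (a stump) rather than a full decision tree, and the data, seed, and function names are hypothetical. A random forest additionally samples features, which this toy omits.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Pick the threshold that misclassifies the fewest points.

    The stump predicts 1 when x > threshold, else 0.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted(set(xs)):
        errors = sum((1 if x > threshold else 0) != y for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(xs, ys, n_models, rng):
    """Train each stump independently on its own bootstrap sample."""
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]  # row sampling with replacement
        thresholds.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds

def bagging_predict(thresholds, x):
    """Aggregation: every stump votes and the majority class wins."""
    votes = Counter(1 if x > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D training data: small values are class 0, large values class 1.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging_fit(xs, ys, n_models=25, rng=random.Random(42))

low_prediction = bagging_predict(model, 0.5)
high_prediction = bagging_predict(model, 9.5)
```

Using an odd number of models avoids tied votes in a two-class problem. Because the stumps are trained independently, the loop in `bagging_fit` could run in parallel.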
- 131. Contd., 2. Boosting Boosting is a process that uses a set of machine learning algorithms to combine weak learners into a strong learner in order to increase the accuracy of the model. It is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers by building models in series. 1. First, a model is built from the training data. 2. Then a second model is built which tries to correct the errors of the first model. This procedure continues, and models are added, until either the complete training data set is predicted correctly or the maximum number of models has been added. Notes: • Boosting is a small variation on bagging. • Each new model focuses on the points that earlier models predicted wrongly.
- 132. Contd., How Does the Boosting Algorithm Work? The basic principle behind boosting is to generate multiple weak learners and combine their predictions to form one strong rule. A common weak learner is the decision stump, a single-level decision tree that tries to classify the data points. Initially, all data points are given equal weight; in algorithms such as AdaBoost, points that are misclassified receive more weight in later rounds.
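One round of reweighting can be sketched as follows. This uses the standard AdaBoost update as an assumed concrete instance of boosting (the slides name AdaBoost only as an example); the labels and the stump's predictions are hypothetical.

```python
import math

def adaboost_round(weights, predictions, labels):
    """One AdaBoost reweighting step for labels and predictions in {-1, +1}.

    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error: total weight of the misclassified points.
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    # Say of this weak learner: larger when its error is smaller.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Shrink weights of correct points, grow weights of misclassified ones.
    updated = [w * math.exp(-alpha * p * y) for w, p, y in zip(weights, predictions, labels)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

labels = [1, 1, -1, -1, 1]
stump_predictions = [1, 1, -1, -1, -1]            # hypothetical stump: last point is wrong
weights = [1.0 / len(labels)] * len(labels)       # start with equal weight on every point

alpha, new_weights = adaboost_round(weights, stump_predictions, labels)
```

After this round the single misclassified point carries half of the total weight, so the next stump is pushed to get it right.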
- 133. Contd., Key Benefits • Reduced risk of overfitting • Provides flexibility • Easy to determine feature importance Key Challenges • Time-consuming process • Requires more resources • More complex Advantages and Disadvantages of Random Forest • It reduces overfitting in decision trees and helps to improve accuracy. • It is flexible enough for both classification and regression problems. • It works well with both categorical and continuous values. • It can handle missing values present in the data. • Normalising the data is not required, as it uses a rule-based approach. However, despite these advantages, a random forest algorithm also has some drawbacks. • It requires much computational power and resources, as it builds numerous trees and combines their outputs. • It also requires much time for training, as it combines many decision trees to determine the class. • Because it is an ensemble of many decision trees, it is also harder to interpret than a single decision tree.
- 134. Contd., Bagging vs Boosting: (1) Bagging: the simplest way of combining predictions that belong to the same type. Boosting: a way of combining predictions that belong to different types. (2) Bagging: aims to decrease variance, not bias. Boosting: aims to decrease bias, not variance. (3) Bagging: each model receives equal weight. Boosting: models are weighted according to their performance. (4) Bagging: each model is built independently. Boosting: new models are influenced by the performance of previously built models. (5) Bagging: different training data subsets are randomly drawn with replacement from the entire training dataset. Boosting: every new subset contains the elements that were misclassified by previous models. (6) Bagging: tries to solve the over-fitting problem. Boosting: tries to reduce bias. (7) Bagging: apply it if the classifier is unstable (high variance). Boosting: apply it if the classifier is stable and simple (high bias). (8) Bagging example: the Random Forest model. Boosting example: AdaBoost.
- 135. Contd., Decision Trees vs Random Forest: (1) Decision trees: normally suffer from overfitting if allowed to grow without any control. Random forest: built from subsets of the data, with the final output based on the average or majority vote, so the problem of overfitting is reduced. (2) Decision trees: a single decision tree is faster to compute. Random forest: comparatively slower. (3) Decision trees: given a data set with features as input, a decision tree formulates a set of rules to make predictions. Random forest: randomly selects observations, builds many decision trees, and takes the average or majority result; it does not rely on any single set of rules.