
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science Training | Edureka

  1. www.edureka.co/data-science · Edureka’s Data Science Certification Training: Random Forest
  2. What Will You Learn Today? Introduction; Why Random Forest?; What Is Random Forest?; Random Forest: Example; How Does Random Forest Work?; Demo in R: Diabetes Prevention Use Case
  3. Introduction
  4. Introduction to Classification • Classification is the problem of identifying which of a set of categories a new observation belongs to. • It is a supervised learning task: the classifier is given a set of already labeled examples and learns from them to assign categories to new, unseen examples. • Example: assigning a given email to the "spam" or "non-spam" category. Is this A or B?
  5. Types of Classifiers. Decision Tree: • A decision tree builds classification models in the form of a tree structure. • It breaks a dataset down into smaller and smaller subsets. Random Forest: • Random Forest is an ensemble classifier built from many decision tree models. • Ensemble models combine the results of different models. Naïve Bayes: • A classification technique based on Bayes' Theorem, with an assumption of independence among attributes.
  6. Why Random Forest?
  7. Use Case: Credit Risk Detection • To minimize losses, the bank needs a decision rule that predicts which applicants to approve for a loan. • The applicant's demographic and socio-economic profile (income, debts, credit history) is considered. • Data science can help banks recognize behavior patterns and build a complete view of individual customers.
  8. Use Case: Credit Risk Detection [Figure: decision tree for credit risk with splits on Student, Credit history, Bank balance and Age, and Risk / No Risk leaves giving the final outcome]
  9. What Is Random Forest?
  10. What Is Random Forest? • Random Forest is a versatile algorithm capable of performing both (i) regression and (ii) classification. • It is a type of ensemble learning method. • It is a commonly used predictive modelling and machine learning technique.
  11. Random Forest: Example. Let's say you want to decide whether or not to watch “Edge of Tomorrow”. You can decide in one of two ways: (i) ask your best friend, or (ii) ask a bunch of friends.
  12. Random Forest: Example. To figure out whether you will like “Edge of Tomorrow”, your friend checks a few things: (i) whether you like adventure and action, and (ii) whether you like Emily Blunt. In this way, your best friend builds a decision tree. [Figure: "Ask best friend" decision tree with the questions Genre: Adventure?, Cast: Emily Blunt? and Is Emily Blunt the main lead?, ending in Like / Don't Like leaves]
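A decision tree is just a chain of if/else questions. The friend's diagram did not survive extraction, so the exact tree is uncertain; the sketch below assumes one plausible reading (adventure fans get "Like"; otherwise the answer depends on whether you like Emily Blunt). The function name `friend_tree` and its arguments are hypothetical.

```r
# Hypothetical encoding of the best friend's decision tree (structure assumed).
friend_tree <- function(likes_adventure, likes_emily_blunt) {
  if (likes_adventure) {
    "Like"                 # first split: genre preference
  } else if (likes_emily_blunt) {
    "Like"                 # second split: the cast (Emily Blunt in the lead)
  } else {
    "Don't Like"           # neither condition holds
  }
}

friend_tree(likes_adventure = FALSE, likes_emily_blunt = TRUE)
```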
  13. Random Forest: Example. To get a more accurate recommendation, you ask a bunch of friends, say Friend 1, Friend 2 and Friend 3, and consider their votes. Each of them may look at movies of a different genre before deciding. The majority vote decides the final outcome. In this way, you build a random forest out of your group of friends.
  14. Random Forest: Example [Figure: decision trees of Friend 1, Friend 2 and Friend 3, with split nodes such as Action movies, Top Gun, Godzilla, Tom Cruise, Far and Away and Oblivion, and Like / Don't Like leaves]
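The "bunch of friends" step is a majority vote over the individual trees' answers. A minimal sketch in base R; the votes below are made up for illustration, and the helper name `majority_vote` is hypothetical.

```r
# Each friend (tree) casts one vote; the most frequent label wins.
majority_vote <- function(votes) {
  counts <- table(votes)                 # count votes per label
  names(counts)[which.max(counts)]       # label with the most votes
}

# Hypothetical votes from three friends.
friend_votes <- c(friend1 = "Like", friend2 = "Don't Like", friend3 = "Like")
majority_vote(friend_votes)              # "Like"
```

On a tie, `which.max` returns the first maximum, so the alphabetically first label wins; an odd number of voters avoids ties in a two-class problem.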
  15. Random Forest Use Cases • Banking: identifying loan applicants at risk by their probability of defaulting on payments. • Medicine: identifying at-risk patients and disease trends. • Land use (remote sensing): identifying areas of similar land use. • Marketing: identifying customer churn.
  16. How Does Random Forest Work?
  17. Random Forest Algorithm: i. Randomly select m features from the T features, where m ≪ T. ii. For node d, calculate the best split point among the m features. iii. Split the node into two daughter nodes using the best split. iv. Repeat the first three steps until n nodes have been reached. v. Build the forest by repeating steps i to iv D times. Here, T is the number of features, D is the number of trees to be constructed, and the output V is the class with the highest vote.
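The steps above can be sketched in base R. This is a deliberately simplified forest, not the `randomForest` package: each tree is a single split (a decision stump), so step iv collapses to one split, and each tree is grown on a bootstrap sample of the rows (the overlapping subsets described on the following slides). It assumes numeric features and a factor target; all function names are hypothetical.

```r
# Simplified random forest of decision stumps (one split per tree).
gini <- function(y) {
  p <- table(y) / length(y)        # class proportions at a node
  1 - sum(p^2)                     # Gini impurity
}

majority <- function(y) names(which.max(table(y)))

# Step ii: best threshold on one feature, scored by weighted child Gini impurity.
best_split <- function(x, y) {
  best <- list(score = Inf, threshold = NA)
  for (t in unique(x)) {
    left <- y[x <= t]
    right <- y[x > t]
    if (length(left) == 0 || length(right) == 0) next
    score <- (length(left) * gini(left) + length(right) * gini(right)) / length(y)
    if (score < best$score) best <- list(score = score, threshold = t)
  }
  best
}

fit_stump <- function(X, y, m) {
  features <- sample(names(X), m)  # step i: random subset of m features
  best <- list(score = Inf, feature = NULL, threshold = NA)
  for (f in features) {
    s <- best_split(X[[f]], y)
    if (s$score < best$score) {
      best <- list(score = s$score, feature = f, threshold = s$threshold)
    }
  }
  if (is.null(best$feature)) {     # no valid split: act as a single leaf
    leaf <- majority(y)
    return(list(feature = NULL, threshold = NA, left = leaf, right = leaf))
  }
  go_left <- X[[best$feature]] <= best$threshold
  # Step iii: two daughter nodes, each predicting its majority class.
  list(feature = best$feature, threshold = best$threshold,
       left = majority(y[go_left]), right = majority(y[!go_left]))
}

predict_stump <- function(stump, newdata) {
  if (is.null(stump$feature)) return(rep(stump$left, nrow(newdata)))
  ifelse(newdata[[stump$feature]] <= stump$threshold, stump$left, stump$right)
}

# Step v: grow D trees, each on a bootstrap sample of the rows.
fit_forest <- function(X, y, n_trees = 25, m = floor(sqrt(ncol(X)))) {
  lapply(seq_len(n_trees), function(i) {
    rows <- sample(nrow(X), replace = TRUE)
    fit_stump(X[rows, , drop = FALSE], y[rows], m)
  })
}

# Output V: the class with the highest vote across all trees.
predict_forest <- function(forest, X) {
  votes <- matrix(sapply(forest, predict_stump, newdata = X), nrow = nrow(X))
  apply(votes, 1, majority)
}

# Usage on a two-class subset of the built-in iris data.
set.seed(1)
two_class <- droplevels(subset(iris, Species != "virginica"))
forest <- fit_forest(two_class[, 1:4], two_class$Species, n_trees = 50)
pred <- predict_forest(forest, two_class[, 1:4])
mean(pred == two_class$Species)    # training accuracy
```

Real implementations grow each tree much deeper and re-draw the m candidate features at every node; the stump version only keeps the three ingredients visible: random feature subsets, bootstrap rows, and majority voting.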
  18. How Does Random Forest Work? Let's take an example. We have a dataset consisting of: • weather information for the last 14 days • whether a match was played on each of those days. Using a random forest, we need to predict whether the game will happen when the weather conditions are Outlook = Rain, Humidity = High, Wind = Weak. Play = ?
  19. How Does Random Forest Work? • The first step in a random forest is to divide the data into smaller subsets. • The subsets need not be distinct; some subsets may overlap.
  20. How Does Random Forest Work? [Figure: one decision tree per data subset (D1, D2, D3; D3, D4, D5, D6; D7, D8, D9), splitting on Outlook = Overcast, Humidity and Wind, with Play / No Play leaves]
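In Breiman's random forest, the overlapping subsets are usually bootstrap samples: each tree gets as many rows as the original data, drawn with replacement, so some days repeat and others are left out ("out-of-bag"). A minimal sketch over the 14 day indices only; the weather values themselves are not reproduced here, and the seed is arbitrary.

```r
# Bootstrap sampling: each of 3 trees gets 14 day indices drawn with replacement.
set.seed(7)
days <- 1:14
subsets <- lapply(1:3, function(i) sort(sample(days, replace = TRUE)))
subsets                               # days may repeat within a subset

# Days shared by the first two subsets, and days never drawn for subset 1.
intersect(subsets[[1]], subsets[[2]])
setdiff(days, subsets[[1]])           # out-of-bag days for tree 1
```

The out-of-bag days give each tree a free validation set, which is how the `randomForest` package reports its OOB error estimate.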
  21. Features of Random Forest • One of the most accurate general-purpose learning algorithms • Works well for both classification and regression problems • Runs efficiently on large databases • Requires almost no input preparation • Performs implicit feature selection • Can easily be grown in parallel • Has methods for balancing error in unbalanced data sets
  22. Demo
  23. What if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it? Let me take you through the steps to predict which patients are vulnerable.
  24. Demo: Data Acquisition. The doctor gets the following data from the patient's medical history.
  25. Demo: Divide Dataset. We divide the entire dataset into two subsets: • Training dataset: used to train the model • Testing dataset: used to validate the model and make predictions
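A random train/test split can be done with base R's `sample()`. The slides do not state the split ratio, so the 70/30 split, the seed and the helper name `split_data` are assumptions; the built-in `iris` data stands in for the diabetes dataset, which is not included here. Applied to the doctor's data, the two parts would play the role of the training set and `diabet_test`.

```r
# Hypothetical helper: random 70/30 split of a data frame's rows.
split_data <- function(df, train_frac = 0.7, seed = 123) {
  set.seed(seed)                                     # reproducible split
  idx <- sample(nrow(df), size = round(train_frac * nrow(df)))
  list(train = df[idx, , drop = FALSE],              # rows used for training
       test  = df[-idx, , drop = FALSE])             # held-out rows for validation
}

parts <- split_data(iris)
nrow(parts$train)    # 105
nrow(parts$test)     # 45
```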
  26. Demo: Implement Model. Before we create the random forest, let's find the best mtry value (the number of features tried at each split) using the following commands.
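The `randomForest` package documents a default `mtry` of `floor(sqrt(p))` for classification and `max(floor(p / 3), 1)` for regression, where p is the number of predictors; `tuneRF()` starts from the default and tries larger and smaller values, keeping the one with lower out-of-bag error. A base R sketch of the defaults only; the helper name is hypothetical and the predictor count of 8 is an assumed example, since the slides do not list the diabetes columns.

```r
# Default mtry as documented for randomForest (p = number of predictors).
default_mtry <- function(p, classification = TRUE) {
  if (classification) floor(sqrt(p)) else max(floor(p / 3), 1)
}

default_mtry(8)                           # 2 (classification, assumed p = 8)
default_mtry(8, classification = FALSE)   # 2 (regression)
```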
  27. Demo: Implement Model. Here, we implement the random forest in R using the following commands.
  28. Demo: Implement Model. We get the following output.
  29. Demo: Visualize. Let's see which variables are most important for our model. We can plot them with the following commands. According to the MeanDecreaseGini value, glucose_conc is the most important variable in the model.
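MeanDecreaseGini is, per the `randomForest` documentation, the total decrease in Gini node impurity from splits on a variable, averaged over all trees. The decrease for one split is the parent's impurity minus the size-weighted impurities of the two children. A base R sketch for a single split; the labels and glucose-like values are made up, not the slide's data.

```r
gini <- function(y) {
  p <- table(y) / length(y)        # class proportions
  1 - sum(p^2)                     # Gini impurity
}

# Impurity decrease from splitting labels y on feature x at a threshold.
gini_decrease <- function(x, y, threshold) {
  left <- y[x <= threshold]
  right <- y[x > threshold]
  n <- length(y)
  gini(y) - length(left) / n * gini(left) - length(right) / n * gini(right)
}

# Toy data (hypothetical values for illustration).
y <- factor(c("NO", "NO", "NO", "YES", "YES", "YES"))
glucose <- c(85, 90, 100, 150, 160, 170)
gini_decrease(glucose, y, threshold = 120)   # 0.5: both children are pure
gini_decrease(glucose, y, threshold = 95)    # 0.25: right child is still mixed
```

A variable that repeatedly produces large decreases like the first split ends up at the top of the importance plot.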
  30. Demo: Model Validation. Now we can use our model to predict the outcome for the testing dataset with the following code:
      pred1_diabet <- predict(diabet_forest, newdata = diabet_test, type = "class")  # predicted class for each test patient
      pred1_diabet
  31. Demo: Model Validation. We get the following output for our testing dataset, where “YES” means the model predicts that the patient is vulnerable to diabetes and “NO” means the model predicts that the patient is not vulnerable.
  32. Demo: Model Validation. We can create a confusion matrix for the model with the caret library to see how good our model is:
      library(caret)
      # rows: predicted class, columns: actual class
      confusionMatrix(table(pred1_diabet, diabet_test$is_diabetic))
  33. Demo: Model Validation. Accuracy = 79.66%. Accuracy (the overall success rate) is the proportion of records that the model classifies correctly. A good model should have a high accuracy score.
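Accuracy is the sum of the confusion matrix's diagonal (correct predictions, TP + TN) divided by the total number of records. A base R sketch with made-up predictions, not the slide's data, so it does not reproduce the 79.66% figure.

```r
# Hypothetical predicted and actual labels for six patients.
predicted <- factor(c("YES", "NO", "NO", "YES", "NO", "NO"), levels = c("NO", "YES"))
actual    <- factor(c("YES", "NO", "YES", "YES", "NO", "NO"), levels = c("NO", "YES"))

cm <- table(predicted, actual)          # rows: predicted, columns: actual
accuracy <- sum(diag(cm)) / sum(cm)     # (TP + TN) / all records
accuracy                                # 5/6, about 0.833
```

This is the same quantity that caret's `confusionMatrix()` reports as Accuracy.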
  34. Course Details: Go to www.edureka.co/data-science. Get Edureka Certified in Data Science Today! What our learners have to say about us: Shravan Reddy says- “I would like to recommend any one who wants to be a Data Scientist just one place: Edureka. Explanations are clean, clear, easy to understand. Their support team works very well.. I took the Data Science course and I'm going to take Machine Learning with Mahout and then Big Data and Hadoop”. Gnana Sekhar says - “Edureka Data science course provided me a very good mixture of theoretical and practical training. LMS pre recorded sessions and assignments were very good as there is a lot of information in them that will help me in my job. Edureka is my teaching GURU now...Thanks EDUREKA.” Balu Samaga says - “It was a great experience to undergo and get certified in the Data Science course from Edureka. Quality of the training materials, assignments, project, support and other infrastructures are a top notch.”
