An Introduction to RandomForests™
Salford Systems
http://www.salford-systems.com
golomi@salford-systems.com
Dan Steinberg, Mikhail Golovnya, N. Scott Cardell
 New approach for many data analytical tasks developed by
Leo Breiman of University of California, Berkeley
◦ Co-author of CART® with Friedman, Olshen, and Stone
◦ Author of Bagging and Arcing approaches to combining trees
 Good for classification and regression problems
◦ Also for clustering, density estimation
◦ Outlier and anomaly detection
◦ Explicit missing value imputation
 Builds on the notions of committees of experts but is
substantially different in key implementation details
 The term usually refers to pattern discovery in large databases
 Initially appeared in the late twentieth century and directly
associated with the PC boom
◦ Spread of data collection devices
◦ Dramatically increased data storage capacity
◦ Exponential growth in computational power of CPUs
 The necessity to go way beyond standard statistical techniques
in data analysis
◦ Dealing with extremely large numbers of variables
◦ Dealing with highly non-linear dependency structures
◦ Dealing with missing values and dirty data
 The following major classes of problems are
usually considered:
◦ Supervised Learning (interested in predicting some
outcome variable based on observed predictors)
 Regression (quantitative outcome)
 Classification (nominal or categorical outcome)
◦ Unsupervised Learning (no single target variable
available- interested in partitioning data into clusters,
finding association rules, etc.)
 Relating gene expressions to the presence of a
certain disease based upon microarray data
 Identifying potential fraud cases in credit card
transactions (binary target)
 Predicting level of user satisfaction as poor, average,
good, excellent (4-level target)
 Optical Digit Recognition (10-level target)
 Predicting consumer preferences towards different
kinds of vehicles (the target could have as many as
several hundred levels)
 Predicting efficacy of a drug based upon demographic factors
 Predicting the amount of sales (target) based on current
observed conditions
 Predicting user energy consumption (target) depending on
the season, business type, location, etc.
 Predicting median house value (target) based on the crime
rate, pollution level, proximity, age, industrialization level,
etc.
 DNA Microarray Data- which samples cluster together? Which
genes cluster together?
 Market Basket Analysis- which products do customers tend to
buy together?
 Clustering For Classification- Handwritten zip code problem:
can we find prototype digits for 1,2, etc. to use for
classification?
 The answer usually has two sides:
◦ Understanding the relationship
◦ Predictive accuracy
 Some algorithms dominate one side (understanding)
◦ Classical methods
◦ Single trees
◦ Nearest neighbor
◦ MARS
 Others dominate the other side (predicting)
◦ Neural nets
◦ TreeNet
◦ Random Forests
 Leo Breiman says:
◦ Framing the question as the choice between accuracy
and interpretability is an incorrect interpretation of what
the goal of a statistical analysis is
 The goal is NOT interpretability, but accurate information
 Nature’s mechanisms are generally complex and cannot be
summarized by a relatively simple stochastic model, even as
a first approximation
 The better the model fits the data, the more sound the
inferences about the phenomenon are
 The only way to attain the best predictive accuracy on
real-life data is to build a complex model
 Analyzing this model will also provide the most
accurate insight!
 At the same time, model complexity makes the model far
more difficult to analyze
◦ A random forest may contain 3,000 trees jointly
contributing to the overall prediction
◦ There could be 5,000 association rules found in a typical
unsupervised learning algorithm
 (Insert table)
 Example of a classification tree for the UCSD
heart disease study
 Relatively fast
 Requires minimal supervision by analyst
 Produces easy to understand models
 Conducts automatic variable selection
 Handles missing values via surrogate splits
 Invariant to monotonic transformations of predictors
 Impervious to outliers
 Piece-wise constant models
 “Sharp” decision boundaries
 Exponential data exhaustion (the data are fragmented exponentially with tree depth)
 Difficulties capturing global linear patterns
 Models tend to evolve around the strongest effects
 Not the best predictive accuracy
 A random forest is a collection of single trees grown in a
special way
 The overall prediction is determined by voting (in
classification) or averaging (in regression)
 The Law of Large Numbers ensures convergence
 The key to accuracy is low correlation and bias
 To keep bias low, trees are grown to maximum depth
 Each tree is grown on a bootstrap sample from the learning
set
 A number R is specified (the square root of the number of
predictors by default), noticeably smaller than the total
number of available predictors
 During the tree-growing phase, only R randomly selected
predictors are tried at each node (a sketch of the growing
procedure follows)
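A minimal from-scratch sketch of this growing procedure, assuming numpy and scikit-learn are available; the function name and defaults are hypothetical. Each tree sees a bootstrap sample of the rows and tries only R randomly chosen predictors at each node, and is grown to maximum depth.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=500, R=None, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    R = R or int(np.sqrt(m))                 # default: square root of the predictor count
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)    # by row: bootstrap sample of the learning set
        tree = DecisionTreeClassifier(       # by column: R candidate predictors per node,
            max_features=R,                  # grown to maximum depth, left unpruned
            random_state=int(rng.integers(1 << 30)))
        tree.fit(X[rows], y[rows])
        forest.append(tree)
    return forest
```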
 All major advantages of a single tree are automatically
preserved
 Since each tree is grown on a bootstrap sample, one can
◦ Use out of bag samples to compute an unbiased estimate of
the accuracy
◦ Use out of bag samples to determine variable importances
 There is no overfitting as the number of trees increases
 It is possible to compute generalized proximity between any pair
of cases
 Based on proximities one can
◦ Proceed with a well-defined clustering solution
◦ Detect outliers
◦ Generate informative data views/projections using scaling
coordinates
◦ Do missing value imputation
 Easy expansion into the unsupervised learning domain
 High levels of predictive accuracy delivered automatically
◦ Only a few control parameters to experiment with
◦ Strong for both regression and classification
 Resistant to overtraining (overfitting)- generalizes well to new data
 Trains rapidly even with thousands of potential predictors
◦ No need for prior feature (variable) selection
 Diagnostics pinpoint multivariate outliers
 Offers a revolutionary new approach to clustering using tree-based
between-record distance measures
 Built on CART® inspired trees and thus
◦ Results invariant to monotone transformations of variables
 Method intended to generate a large number of substantially
different models
◦ Randomness introduced in two simultaneous ways
◦ By row: records selected for training at random with replacement (as in
bootstrap resampling of the bagger)
◦ By column: candidate predictors at any node are chosen at random and
best splitter selected from the random subset
 Each tree is grown out to maximal size and left unpruned
◦ Trees are deliberately overfit, becoming a form of nearest neighbor
predictor
◦ Experiments convincingly show that pruning these trees hurts performance
◦ Overfit individual trees combine to yield properly fit ensembles
 Self-testing possible even if all data is used for training
◦ Only 63% of available training data will be used to grow any one
tree
◦ A 37% portion of training data always unused
 The unused portion of the training data is known as Out-Of-Bag (OOB)
data and can be used to provide an ongoing dynamic assessment of
model performance
◦ Allows fitting to small data sets without explicitly holding back
any data for testing
◦ All training data is used cumulatively in training, but only a 63%
portion used at any one time
 Similar to cross-validation but unstructured
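A minimal sketch of this OOB self-test, assuming scikit-learn is available; synthetic data generated with make_classification stand in for a real learning set. With oob_score=True the forest scores every case using only the trees that never saw it, so no separate test partition is needed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=111, n_informative=20,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB error:", round(1 - rf.oob_score_, 3))   # ongoing assessment without a test set
```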
 Intensive post processing of data to extract more
insight into data
◦ Most important is introduction of distance metric
between any two data records
◦ The more similar two records are, the more often they
will land in the same terminal node of a tree
◦ With a large number of different trees, simply count the
number of times they co-locate in the same leaf node
◦ Distance metric can be used to construct dissimilarity
matrix input into hierarchical clustering
 Ultimately in modeling our goal is to produce a single
score, prediction, forecast, or class assignment
 The motivation for generating multiple models is the
hope that combining them will produce better results
than relying on a single model
 When multiple models are generated they are
normally combined by
◦ Voting in classification problems, perhaps weighted
◦ Averaging in regression problems, perhaps weighted
 Combining trees via averaging or voting will only be
beneficial if the trees are different from each other
 In original bootstrap aggregation paper Breiman noted
bagging worked best for high variance (unstable)
techniques
◦ If the results of each model are nearly identical, little is to be
gained by averaging
 The bagger's resampling of the training data is intended
to induce differences in the trees
◦ Accomplished essentially by varying the weight on any
given data record
 A bootstrap sample is fairly similar to taking a 63% sample from
the original training data
 If you grow many trees, each based on a different 63% random
sample of your data, you expect some variation in the trees
produced
 A bootstrap sample goes a bit further, ensuring that the new
sample is the same size as the original by allowing some
records to be selected multiple times
 In practice the different samples induce different trees, but
the trees are not that different
 The bagger was limited by the fact that even with resampling
trees are likely to be somewhat similar to each other,
particularly with strong data structure
 Random Forests induces vastly more between tree differences
by forcing splits to be based on different predictors
◦ Accomplished by introducing randomness into split
selection
 Breiman points out a tradeoff:
◦ As R increases, the strength of each individual tree should increase
◦ However, the correlation between trees also increases, reducing the advantage of
combining
 Want to select R to optimally balance the two effects
◦ Can only be determined via experimentation
 Breiman has suggested three values to test:
◦ R = ½ √M
◦ R = √M
◦ R = 2 √M
◦ For M = 100, test R = 5, 10, 20
◦ For M = 400, test R = 10, 20, 40 (a quick check follows)
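A quick computation of the three suggested values of R for a given number of available predictors M, rounded to the nearest integer; M = 111 anticipates the prostate example discussed later.

```python
import math

for M in (100, 400, 111):
    r = math.sqrt(M)
    print(M, [max(1, round(v)) for v in (0.5 * r, r, 2 * r)])
# 100 -> [5, 10, 20]; 400 -> [10, 20, 40]; 111 -> [5, 11, 21]
```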
 The Random Forests machinery is unlike CART in that
◦ Only one splitting rule: Gini
◦ A class weight concept but no explicit priors or costs
◦ No surrogates: missing values are imputed automatically first
 The default fast imputation just uses means
 A compute-intensive method uses tree-based nearest neighbors as the basis for
imputation (discussed later)
◦ None of the display and reporting machinery or tree refinement
services of CART are available
 Does follow CART in that all splits are binary
 Trees combined via voting (classification) or averaging
(regression)
 Classification trees “vote”
◦ Recall that classification trees classify
 Assign each case to ONE class only
◦ With 50 trees, 50 class assignments for each case
◦ Winner is the class with the most votes
◦ Votes could be weighted- say by accuracy of individual trees
 Regression trees assign a real predicted value for each case
◦ Predictions are combined via averaging
◦ Results will be much smoother than from a single tree
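A toy illustration of how the tree outputs are combined: majority vote for classification, a simple average for regression. The arrays below are hypothetical predictions from three trees for a handful of cases.

```python
import numpy as np

class_preds = np.array([[0, 1, 2],        # tree 1
                        [0, 1, 1],        # tree 2
                        [0, 2, 2]])       # tree 3 (columns = cases)
reg_preds = np.array([[1.0, 2.0],
                      [1.2, 2.4],
                      [0.8, 2.6]])

vote = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, class_preds)
print(vote)                    # [0 1 2] -- class with the most votes per case
print(reg_preds.mean(axis=0))  # [1.0 2.33...] -- averaged regression predictions
```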
 Probability of being omitted in a single draw is (1 − 1/n)
 Probability of being omitted in all n draws is (1 − 1/n)^n
 The limit of this expression as n increases is 1/e ≈ 0.368
◦ Approximately 36.8% of the sample is excluded, contributing 0% of the resample
◦ 36.8% of the sample is included once, contributing 36.8% of the resample
◦ 18.4% is included twice, contributing 36.8% of the resample
◦ 6.1% is included three times, contributing 18.4% of the resample
◦ 1.9% is included four or more times, contributing roughly 8% of the resample (100% in total)
◦ Example: distribution of weights in a 2,000 record resample:
◦ (insert table)
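A worked check of the omission probability and the resulting weight distribution for an n = 2,000 record resample, using the Poisson(1) approximation to the number of times a record is drawn.

```python
import math

n = 2000
print(round((1 - 1 / n) ** n, 4), round(1 / math.e, 4))   # 0.3678 vs 0.3679

for k in range(5):
    share = math.exp(-1) / math.factorial(k)              # fraction of records drawn k times
    print(k, round(share, 3), round(share * k, 3))        # and its share of the resample
# 0: 0.368 / 0.0   1: 0.368 / 0.368   2: 0.184 / 0.368
# 3: 0.061 / 0.184  4: 0.015 / 0.061 ...
```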
 Want to use mass spectrometer data to classify
different types of prostate cancer
◦ 772 observations available
 398- healthy samples
 178- 1st type of cancer samples
 196- 2nd type of cancer samples
◦ 111 mass spectra measurements are recorded for each
sample
 (insert table)
 The above table shows cross-validated prediction success
results of a single CART tree for the prostate data
 The run was conducted under PRIORS DATA to facilitate
comparisons with the subsequent RF run
◦ The relative error corresponds to an absolute error of
30.4%
 Topic discussed by several Machine Learning researchers
 Possibilities:
◦ Select splitter, split point, or both at random
◦ Choose splitter at random from the top K splitters
 Random Forests: suppose we have M available predictors
◦ Select R eligible splitters at random and let the best of them split the node
◦ If R = 1, this is just random splitter selection
◦ If R = M, this becomes Breiman's bagger
◦ If R << M, then we get Breiman's Random Forests
 Breiman suggests R = sqrt(M) as a good rule of thumb
 The performance of a single tree is largely driven by the
number of candidate predictors allowed at each node
 Consider R = 1: the splitter is always chosen at random, so
performance could be quite weak
 As relevant splitters get into the tree and the tree is allowed to grow
massively, a single tree can be predictive even if R = 1
 As R is allowed to increase, the quality of splits can improve as
there will be better (and more relevant) splitters
 (insert graph)
 In this experiment, we ran RF with 100 trees on the
prostate data using different values for the number
of variables Nvars searched at each split
 RF clearly outperforms a single tree for any value of Nvars
◦ We saw above that a properly pruned tree gives a cross-validated absolute
error of 30.4% (the far right end of the red curve)
 The performance of a single tree varies substantially
with the number of predictors allowed to be searched (a single
tree is a high-variance object)
 The RF reaches a nearly stable error rate of about 20% when
only 10 variables are searched at each node (marked in blue)
 Discounting minor fluctuations, the error rate also remains
stable for Nvars above 10
◦ This generally agrees with Breiman's suggestion to use the square root of the
number of predictors (√111 ≈ 10.5) as a rough estimate of the optimal Nvars
 The performance for small Nvars can usually be further improved
by increasing the number of trees grown (a sketch of this experiment follows)
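A minimal sketch of this experiment, assuming scikit-learn is available; synthetic data generated with make_classification stand in for the prostate spectra. The number of predictors searched at each node is the max_features argument, and the OOB error provides the comparison.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=772, n_features=111, n_informative=30,
                           n_classes=3, random_state=0)
for nvars in (1, 3, 5, 10, 15, 30, 60, 111):
    rf = RandomForestClassifier(n_estimators=100, max_features=nvars,
                                oob_score=True, random_state=0)
    rf.fit(X, y)
    print(nvars, round(1 - rf.oob_score_, 3))   # OOB error rate for each Nvars
```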
 (insert graph)
 (insert table)
 The above results correspond to a standard RF run
with 500 trees, Nvars=15, and unit class weights
 Note that the overall error rate is 19.4%, roughly
two-thirds of the baseline CART error of 30.4%
 RF does not use a test dataset to report accuracy
 For every tree grown, about 37% of the data are left out-of-bag
(OOB)
 This means that these cases can be safely used in place of the
test data to evaluate the performance of the current tree
 For any tree in RF, its own OOB sample is used- hence no bias is
ever introduced into the estimates
 The final OOB estimate for the entire RF can be simply obtained
by averaging individual OOB estimates
 Consequently, this estimate is unbiased and behaves as if we had
an independent test sample of the same size as the learn sample
 (insert table)
 The prostate dataset is somewhat unbalanced- class 1
contains fewer records than the remaining classes
 Under the default RF settings, the minority classes will have
higher misclassification rates than the dominant classes
 Imbalance in the individual class error rates may also be caused
by other data-specific issues
 Class weights are used in RF to boost the accuracy of the
specified classes
 General Rule of Thumb: to increase accuracy in the given class,
one should increase the corresponding class weight
 In many ways this is similar to the PRIORS control used in CART
for the same purpose
 Our next run sets the weight for class 1 to 2 (a class-weight
sketch follows)
 As a result, class 1 is classified with much
better accuracy at the cost of slightly reduced
accuracy in the remaining classes
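A sketch of a class-weight run in this spirit, assuming scikit-learn's class_weight argument; the data are a synthetic stand-in with class proportions roughly matching the prostate example, and per-class OOB errors are read off the OOB decision function.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=772, n_features=111, n_informative=30,
                           n_classes=3, weights=[0.52, 0.23, 0.25],
                           random_state=0)
rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                            class_weight={0: 1, 1: 2, 2: 1}, random_state=0)
rf.fit(X, y)

oob_pred = rf.oob_decision_function_.argmax(axis=1)      # OOB class assignments
for c in range(3):
    mask = y == c
    print("class", c, "OOB error:", round((oob_pred[mask] != c).mean(), 3))
```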
 At the end of an RF run, the proportion of votes for
each class is recorded
 We can define Margin of a case simply as the
proportion of votes for the true class minus the
maximum proportion of votes for the other classes
 The larger the margin, the higher the confidence of
classification
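A sketch of the margin computation. The vote proportions could be taken from RandomForestClassifier.predict_proba (for fully grown trees these are essentially vote proportions); here a small hypothetical array is used instead.

```python
import numpy as np

def margins(proba, y_true):
    n = len(y_true)
    true_p = proba[np.arange(n), y_true]          # proportion of votes for the true class
    masked = proba.copy()
    masked[np.arange(n), y_true] = -np.inf        # drop the true class
    return true_p - masked.max(axis=1)            # minus the largest competing proportion

proba = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.1, 0.2, 0.7]])
print(margins(proba, np.array([0, 2, 1])))        # [ 0.5 -0.1 -0.5]
```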
 (insert table)
 This extract shows percent votes for the top 30
records in the dataset along with the
corresponding margins
 The green lines have high margins and therefore
high confidence of predictions
 The pink lines have negative margins, which means
that these observations are not classified correctly
 The concept of margin allows a new "unbiased" definition of variable
importance
 To estimate the importance of the mth variable:
◦ Take the OOB cases for the kth tree; assume we already know the margin M for
those cases
◦ Randomly permute all values of variable m
◦ Apply the kth tree to the OOB cases with the permuted values
◦ Compute the new margin M′
◦ Compute the difference M − M′
 The variable importance is defined as the average lowering of the margin
across all OOB cases and all trees in the RF (a permutation sketch follows)
 This procedure is fundamentally different from the intrinsic variable
importance scores computed by CART- the latter are always based on
the LEARN data and are subject to overfitting
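A sketch of permutation-based importance in this spirit: shuffle one variable, re-score, and record the average drop in margin. For brevity this version scores on a held-out sample rather than on each tree's own OOB cases, and the data are a synthetic stand-in generated with make_classification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mean_margin(proba, y):
    n = len(y)
    true_p = proba[np.arange(n), y]
    masked = proba.copy()
    masked[np.arange(n), y] = -np.inf
    return (true_p - masked.max(axis=1)).mean()

X, y = make_classification(n_samples=772, n_features=111, n_informative=30,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

rng = np.random.default_rng(0)
base = mean_margin(rf.predict_proba(X_te), y_te)
importance = []
for m in range(X_te.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, m] = rng.permutation(X_perm[:, m])        # randomly permute variable m
    importance.append(base - mean_margin(rf.predict_proba(X_perm), y_te))
print(sorted(range(111), key=lambda m: -importance[m])[:10])   # ten most important variables
```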
 The top portion of the variable importance list for the
data is shown here
 Analysis of the complete list reveals that all 111
variables contribute nearly equally strongly to
the model predictions
 This is in striking contrast with the single CART tree,
which by its construction has no choice but to use a
limited subset of variables
 The above explains why the RF model has a
significantly lower error rate (20%) when compared to
a single CART tree (30%)
 RF introduces a novel way to define proximity between two
observations
◦ Initialize proximities to zero
◦ For any given tree, apply the tree to all cases
◦ If cases i and j both end up in the same terminal node, increase the proximity
prox(i,j) between i and j by one
◦ Accumulate over all trees in the RF and normalize by twice the number of trees
in the RF
 The resulting matrix of size N×N provides an intrinsic measure of
proximity
◦ The measure is invariant to monotone transformations
◦ The measure is clearly defined for any type of independent variable,
including categorical (a proximity sketch follows)
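A sketch of the proximity computation, assuming scikit-learn is available: run every case down every fitted tree via the forest's .apply method (which returns leaf indices), count co-locations, and normalize. Here the matrix is divided by the number of trees so the diagonal is exactly one; the slide's normalization differs only by a constant factor. The data are synthetic stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

leaves = rf.apply(X)                        # shape (n_cases, n_trees): leaf index per tree
n_cases, n_trees = leaves.shape
prox = np.zeros((n_cases, n_cases))
for k in range(n_trees):
    prox += leaves[:, k][:, None] == leaves[:, k][None, :]   # co-location in tree k
prox /= n_trees
print(prox[:5, :5].round(2))                # ones on the main diagonal
```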
 (insert graph)
 The above extract shows the proximity matrix for the
top 10 records of the prostate dataset
◦ Note the ones on the main diagonal- any case has
"perfect" proximity to itself
◦ Observations that are "alike" will have proximities
close to one
 these cells have a green background
◦ The closer the proximity is to 0, the more dissimilar cases i
and j are
 these cells have a pink background
 Having the full intrinsic proximity matrix opens new horizons
◦ Informative data views using metric scaling
◦ Missing value imputation
◦ Outlier detection
 Unfortunately, things get out of control when the dataset size
exceeds 5,000 observations (25,000,000+ cells are needed)
 RF switches to a "compressed" form of the proximity matrix to
handle large datasets- for any case, only the M closest cases are
recorded, where M is usually less than 100
 The values 1 − prox(i,j) can be treated as Euclidean distances
in a high dimensional space
 The theory of metric scaling solves the problem of finding
the most representative projections of the underlying data
“cloud” onto low dimensional space using the data
proximities
◦ The theory is similar in spirit to the principal components analysis
and discriminant analysis
 The solution is given in the form of ordered “scaling
coordinates”
 Looking at the scatter plots of the top scaling coordinates
provides informative views of the data
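A minimal sketch of metric scaling on the RF proximities, assuming scikit-learn's MDS with a precomputed dissimilarity matrix; `prox` and `y` are assumed to come from the proximity sketch above. Treat 1 − prox as dissimilarities and project the cases onto a few scaling coordinates.

```python
import numpy as np
from sklearn.manifold import MDS

coords = MDS(n_components=3, dissimilarity="precomputed",
             random_state=0).fit_transform(1.0 - prox)
for c in np.unique(y):                        # scatter-plot these, colored by class
    print("class", c, coords[y == c][:2].round(2))
```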
 (insert graph)
 This extract shows five initial scaling coordinates for
the top 30 records of the prostate data
 We will look at the scatter plots among the first,
second, and third scaling coordinates
 The following color codes will be used for the target
classes:
◦ Green- class 0
◦ Red- class 1
◦ Blue- class 2
 (insert graphs)
 A nearly perfect separation of all three classes is clearly seen
 From this we conclude that the outcome variable admits clear
prediction using the RF model, which utilizes all 111 original
predictors
 The residual error is mostly due to the presence of the “focal”
point where all the three rays meet
 (insert graph)
 (insert graphs)
 Again, three distinct target classes show up as
separate clusters
 The “focal” point represents a cluster of records
that can’t be distinguished from each other
 Outliers are defined as cases having small proximities to
all other cases belonging to the same target class
 The following algorithm is used:
◦ For a case n, compute the sum of the squares of prox(n,k) for all k
in the same class as n
◦ Take the inverse- it will be large if the case is "far away" from the
rest
◦ Standardize using the median and standard deviation
◦ Look at the cases with the largest values- those are potential
outliers
 Generally, a value above 10 is reason to suspect the case
of being an outlier (a sketch of the measure follows)
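A sketch of this outlier measure; `prox` and `y` are assumed to be the proximity matrix and class labels from the earlier proximity sketch, and the standardization follows the median-and-standard-deviation rule described above.

```python
import numpy as np

def outlier_measure(prox, y):
    n = len(y)
    raw = np.empty(n)
    for i in range(n):
        same = (y == y[i]) & (np.arange(n) != i)                # same-class cases, excluding i
        raw[i] = 1.0 / max(np.sum(prox[i, same] ** 2), 1e-12)   # large if far from its class
    out = np.empty(n)
    for c in np.unique(y):
        cls = (y == c)
        out[cls] = (raw[cls] - np.median(raw[cls])) / (raw[cls].std() + 1e-12)
    return out          # large values flag potential outliers

print(np.argsort(-outlier_measure(prox, y))[:6])   # the six most suspicious cases
```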
 This extract shows top 30 records of the prostate
dataset sorted descending by the outlier measure
 Clearly the top 6 cases (class 2 with IDs: 771, 683,
539, and class 0 with IDs 127, 281, 282) are
suspicious
 All of these seem to be located at the “focal point”
on the corresponding scaling coordinate plots
 (insert graph)
 RF offers two ways of missing value imputation
 The Cheap Way- conventional median imputation for continuous
variables and mode imputation for categorical variables
 The Right Way (a sketch follows):
◦ Suppose case n has the value of x missing
◦ Do the Cheap Way imputation for starters
◦ Grow a full-size RF
◦ Re-estimate the missing value as a weighted average over all cases k
with non-missing x, using weights prox(n,k)
◦ Repeat the last two steps several times to ensure convergence
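A sketch of this proximity-weighted imputation loop, assuming numpy and scikit-learn; the function name is hypothetical, X_missing is a numeric array with np.nan marking missing entries, and y is the (fully observed) target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_impute(X_missing, y, n_iter=5):
    X = X_missing.copy()
    miss = np.isnan(X)
    col_med = np.nanmedian(X_missing, axis=0)            # the Cheap Way for starters
    X[miss] = np.take(col_med, np.where(miss)[1])
    for _ in range(n_iter):
        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
        leaves = rf.apply(X)                              # leaf indices, one column per tree
        for i, j in zip(*np.where(miss)):
            prox_i = (leaves == leaves[i]).mean(axis=1)   # proximity of case i to all cases
            ok = ~np.isnan(X_missing[:, j])
            X[i, j] = np.average(X_missing[ok, j], weights=prox_i[ok] + 1e-12)
    return X
```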
 An alternative display to view how the target classes differ
with respect to the individual predictors
◦ Recall that at the end of an RF run every case in the dataset obtains K
separate vote counts for class membership (assuming K target
classes)
◦ Take any target class and sort all observations by the count of
votes for this class, descending
◦ Take the top 50 observations and the bottom 50 observations;
these are, correspondingly, the most likely and the least likely
members of the given target class
◦ Parallel coordinate plots report uniformly (0,1)-scaled values of all
predictors for the top 50 and bottom 50 sorted records, along
with the 25th, 50th, and 75th percentiles within each predictor (a
construction sketch follows)
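A sketch of how such a display could be assembled, assuming `rf` and `X` are the fitted three-class forest and data from one of the earlier sketches: sort by the vote share for one class, take the 50 most and least likely members, rescale each predictor to (0, 1), and summarize with quartiles.

```python
import numpy as np

votes = rf.predict_proba(X)[:, 2]                 # vote share for class 2
order = np.argsort(-votes)
X01 = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
for name, idx in [("most likely", order[:50]), ("least likely", order[-50:])]:
    q25, q50, q75 = np.percentile(X01[idx], [25, 50, 75], axis=0)
    print(name, q50[:5].round(2))                 # plot q25/q50/q75 across the predictors
```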
 (insert graph)
 This is a detailed display of the normalized values
of the initial 20 predictors for the top-voted 50
records in each target class (this gives 50×3 = 150
graphs)
 Class 0 generally has normalized values of the
initial 20 predictors close to 0 (the left side of the
plot), except perhaps M9X11
 (insert graph)
 It is easier to see this when looking at the quartile
plots only
 Note that class 2 tends to have the largest values
of the corresponding predictors
 The graph can be scrolled forward to view all of the
111 predictors
 (insert graph)
 The least-likely plots lead to roughly similar
conclusions: small predictor values are the least
likely for class 2, etc.
 RF admits an interesting possibility to solve unsupervised learning
problems, in particular clustering problems and missing value
imputation in the general sense
 Recall that in unsupervised learning the concept of a target is not
defined
 RF generates a synthetic target variable in order to proceed with a
regular run (see the sketch below):
◦ Give class label 1 to the original data
◦ Create a copy of the data in which each variable is sampled independently from the
values available in the original dataset
◦ Give class label 2 to the copy of the data
◦ Note that the second copy has marginal distributions identical to the first copy,
whereas any possible dependency among predictors is completely destroyed
◦ A necessary drawback is that the resulting dataset is twice as large as the original
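A sketch of this synthetic two-class construction; the function name is hypothetical and X is a numeric data matrix. The copy permutes every column independently, preserving the marginals while destroying any dependency among predictors.

```python
import numpy as np

def make_unsupervised_problem(X, seed=0):
    rng = np.random.default_rng(seed)
    X_copy = np.column_stack([rng.permutation(col) for col in X.T])  # independent columns
    X_all = np.vstack([X, X_copy])
    y_all = np.r_[np.ones(len(X), dtype=int), np.full(len(X), 2)]    # labels 1 and 2
    return X_all, y_all        # twice as many records as the original
```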
 We now have a clear binary supervised learning problem
 Running an RF on this dataset may provide the following
insights:
◦ When the resulting misclassification error is high (above 50%), the
variables are basically independent- no interesting structure exists
◦ Otherwise, the dependency structure can be further studied by looking at
the scaling coordinates and exploiting the proximity matrix in other ways
◦ For instance, the resulting proximity matrix can be used as an important
starting point for the subsequent hierarchical clustering analysis
 Recall that the proximity measures are invariant to monotone
transformations and naturally support categorical variables
 The same missing value imputation procedure as before can now
be employed
 These techniques work extremely well for small datasets
 We generated a synthetic dataset based on the
prostate data
 The resulting dataset still has 111 predictors but
twice the number of records- the first half being
the exact replica of the original data
 The final error is only 0.2% which is an indication of
a very strong dependency among the predictors
 (insert graph)
 The resulting plots resemble what we had before
 However, this distance is in terms of how
dependent the predictors are, whereas previously it
was in terms of having the same target class
 In view of this, the non-cancerous tissue (green)
appears to stand apart from the cancerous tissue
 Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
 Breiman, L. (1996). Arcing classifiers (Technical Report). Berkeley: Statistics
Department, University of California.
 Buntine, W. (1991). Learning classification trees. In D.J. Hand, ed., Artificial
Intelligence Frontiers in Statistics, Chapman and Hall: London, 182-201.
 Dietterich, T. (1998). An experimental comparison of three methods for
constructing ensembles of decision trees: Bagging, boosting, and randomization.
Machine Learning, 40, 139-158.
 Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm.
In L. Saitta, ed., Machine Learning: Proceedings of the Thirteenth National
Conference, Morgan Kaufmann, pp. 148-156.
 Friedman, J.H. (1999). RandomForests. Stanford: Statistics Department, Stanford
University.
 Friedman, J.H. (1999). Greedy function approximation: a gradient boosting
machine. Stanford: Statistics Department, Stanford University.
 Heath, D., Kasif, S., and Salzberg, S. (1993). k-dt: A multi-tree learning method.
Proceedings of the Second International Workshop on Multistrategy Learning,
1002-1007, Morgan Kaufmann: Chambery, France.
 Kwok, S., and Carter, C. (1990). Multiple decision trees. In Shachter, R., Levitt,
T., Kanal, L., and Lemmer, J., eds., Uncertainty in Artificial Intelligence 4,
North-Holland, 327-335.
Mais conteúdo relacionado

Mais procurados

Research Method EMBA chapter 11
Research Method EMBA chapter 11Research Method EMBA chapter 11
Research Method EMBA chapter 11Mazhar Poohlah
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018digitalzombie
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityJulián Urbano
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...Sunil Nair
 
Deep Semi-Supervised Anomaly Detection
Deep Semi-Supervised Anomaly DetectionDeep Semi-Supervised Anomaly Detection
Deep Semi-Supervised Anomaly DetectionManmeet Singh
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Edureka!
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12Mazhar Poohlah
 
Machine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkMachine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkGabriel Hughes PhD
 
Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...Mumbai Academisc
 
[Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees [Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees Nikolaos Vergos
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...ahmedbohy
 
Handout11
Handout11Handout11
Handout11butest
 
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...IRJET Journal
 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hivSalford Systems
 
Performance evaluation of hepatitis diagnosis using single and multi classifi...
Performance evaluation of hepatitis diagnosis using single and multi classifi...Performance evaluation of hepatitis diagnosis using single and multi classifi...
Performance evaluation of hepatitis diagnosis using single and multi classifi...ahmedbohy
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis Peter Reimann
 

Mais procurados (20)

Research Method EMBA chapter 11
Research Method EMBA chapter 11Research Method EMBA chapter 11
Research Method EMBA chapter 11
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
Data Mining
Data MiningData Mining
Data Mining
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
 
Deep Semi-Supervised Anomaly Detection
Deep Semi-Supervised Anomaly DetectionDeep Semi-Supervised Anomaly Detection
Deep Semi-Supervised Anomaly Detection
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12
 
Machine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talkMachine learning meets user analytics - Metageni tech talk
Machine learning meets user analytics - Metageni tech talk
 
Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...
 
[Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees [Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...
 
Handout11
Handout11Handout11
Handout11
 
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hiv
 
Performance evaluation of hepatitis diagnosis using single and multi classifi...
Performance evaluation of hepatitis diagnosis using single and multi classifi...Performance evaluation of hepatitis diagnosis using single and multi classifi...
Performance evaluation of hepatitis diagnosis using single and multi classifi...
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 

Semelhante a Introduction to RandomForests 2004

Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptxRaflyRizky2
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new pptSalford Systems
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lectureShreyas S K
 
DataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptxDataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptxAkash527744
 
decision_trees_forests_2.pptx
decision_trees_forests_2.pptxdecision_trees_forests_2.pptx
decision_trees_forests_2.pptxstalkthemhaha
 
Machine Learning by Analogy
Machine Learning by AnalogyMachine Learning by Analogy
Machine Learning by AnalogyColleen Farrelly
 
Machine Learning by Analogy II
Machine Learning by Analogy IIMachine Learning by Analogy II
Machine Learning by Analogy IIColleen Farrelly
 
An Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsAn Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsShouvic Banik0139
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.DrezzingGaming
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Maninda Edirisooriya
 
Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Shalin Hai-Jew
 
Diabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine LearningDiabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine Learningjagan477830
 

Semelhante a Introduction to RandomForests 2004 (20)

Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new ppt
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lecture
 
Primer on major data mining algorithms
Primer on major data mining algorithmsPrimer on major data mining algorithms
Primer on major data mining algorithms
 
DataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptxDataMiningOverview_Galambos_2015_06_04.pptx
DataMiningOverview_Galambos_2015_06_04.pptx
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
decision_trees_forests_2.pptx
decision_trees_forests_2.pptxdecision_trees_forests_2.pptx
decision_trees_forests_2.pptx
 
Machine Learning by Analogy
Machine Learning by AnalogyMachine Learning by Analogy
Machine Learning by Analogy
 
Machine Learning by Analogy II
Machine Learning by Analogy IIMachine Learning by Analogy II
Machine Learning by Analogy II
 
An Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsAn Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithms
 
decisiontrees (3).ppt
decisiontrees (3).pptdecisiontrees (3).ppt
decisiontrees (3).ppt
 
decisiontrees.ppt
decisiontrees.pptdecisiontrees.ppt
decisiontrees.ppt
 
decisiontrees.ppt
decisiontrees.pptdecisiontrees.ppt
decisiontrees.ppt
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
 
Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data
 
Diabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine LearningDiabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine Learning
 
AI Algorithms
AI AlgorithmsAI Algorithms
AI Algorithms
 

Mais de Salford Systems

Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4Salford Systems
 
Improve Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForestsImprove Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForestsSalford Systems
 
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...Salford Systems
 
Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Salford Systems
 
The Do's and Don'ts of Data Mining
The Do's and Don'ts of Data MiningThe Do's and Don'ts of Data Mining
The Do's and Don'ts of Data MiningSalford Systems
 
Introduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele CutlerIntroduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele CutlerSalford Systems
 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like YouSalford Systems
 
Statistically Significant Quotes To Remember
Statistically Significant Quotes To RememberStatistically Significant Quotes To Remember
Statistically Significant Quotes To RememberSalford Systems
 
Using CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example DatasetUsing CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example DatasetSalford Systems
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideSalford Systems
 
Evolution of regression ols to gps to mars
Evolution of regression   ols to gps to marsEvolution of regression   ols to gps to mars
Evolution of regression ols to gps to marsSalford Systems
 
Data Mining for Higher Education
Data Mining for Higher EducationData Mining for Higher Education
Data Mining for Higher EducationSalford Systems
 
Comparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modelingComparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modelingSalford Systems
 
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees:  A Winning CombinationTreeNet Tree Ensembles & CART Decision Trees:  A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees: A Winning CombinationSalford Systems
 
SPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSalford Systems
 
Hybrid cart logit model 1998
Hybrid cart logit model 1998Hybrid cart logit model 1998
Hybrid cart logit model 1998Salford Systems
 
Session Logs Tutorial for SPM
Session Logs Tutorial for SPMSession Logs Tutorial for SPM
Session Logs Tutorial for SPMSalford Systems
 
Some of the new features in SPM 7
Some of the new features in SPM 7Some of the new features in SPM 7
Some of the new features in SPM 7Salford Systems
 
TreeNet Overview - Updated October 2012
TreeNet Overview  - Updated October 2012TreeNet Overview  - Updated October 2012
TreeNet Overview - Updated October 2012Salford Systems
 

Mais de Salford Systems (20)

Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4
 
Improve Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForestsImprove Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForests
 
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
 
Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications Churn Modeling-For-Mobile-Telecommunications
Churn Modeling-For-Mobile-Telecommunications
 
The Do's and Don'ts of Data Mining
The Do's and Don'ts of Data MiningThe Do's and Don'ts of Data Mining
The Do's and Don'ts of Data Mining
 
Introduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele CutlerIntroduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele Cutler
 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
 
Statistically Significant Quotes To Remember
Statistically Significant Quotes To RememberStatistically Significant Quotes To Remember
Statistically Significant Quotes To Remember
 
Using CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example DatasetUsing CART For Beginners with A Teclo Example Dataset
Using CART For Beginners with A Teclo Example Dataset
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Evolution of regression ols to gps to mars
Evolution of regression   ols to gps to marsEvolution of regression   ols to gps to mars
Evolution of regression ols to gps to mars
 
Data Mining for Higher Education
Data Mining for Higher EducationData Mining for Higher Education
Data Mining for Higher Education
 
Comparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modelingComparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modeling
 
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees:  A Winning CombinationTreeNet Tree Ensembles & CART Decision Trees:  A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
 
SPM v7.0 Feature Matrix
SPM v7.0 Feature MatrixSPM v7.0 Feature Matrix
SPM v7.0 Feature Matrix
 
SPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARS
 
Hybrid cart logit model 1998
Hybrid cart logit model 1998Hybrid cart logit model 1998
Hybrid cart logit model 1998
 
Session Logs Tutorial for SPM
Session Logs Tutorial for SPMSession Logs Tutorial for SPM
Session Logs Tutorial for SPM
 
Some of the new features in SPM 7
Some of the new features in SPM 7Some of the new features in SPM 7
Some of the new features in SPM 7
 
TreeNet Overview - Updated October 2012
TreeNet Overview  - Updated October 2012TreeNet Overview  - Updated October 2012
TreeNet Overview - Updated October 2012
 

Último

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Introduction to RandomForests 2004

  • 10.  The only way to attain the best predictive accuracy on real-life data is to build a complex model  Analyzing this model will also provide the most accurate insight!  At the same time, the model's complexity makes it far more difficult to analyze ◦ A random forest may contain 3,000 trees jointly contributing to the overall prediction ◦ A typical unsupervised learning run may find 5,000 association rules
  • 11.  (insert table)  Example of a classification tree for the UCSD heart disease study
  • 12.  Relatively fast  Requires minimal supervision by the analyst  Produces easy-to-understand models  Conducts automatic variable selection  Handles missing values via surrogate splits  Invariant to monotonic transformations of predictors  Impervious to outliers
  • 13.  Piecewise-constant models  “Sharp” decision boundaries  Exponential data exhaustion with depth  Difficulties capturing global linear patterns  Models tend to revolve around the strongest effects  Not the best predictive accuracy
  • 14.  A random forest is a collection of single trees grown in a special way  The overall prediction is determined by voting (in classification) or averaging (in regression)  The Law of Large Numbers ensures convergence  The key to accuracy is low correlation between trees and low bias  To keep bias low, trees are grown to maximum depth
  • 15.  Each tree is grown on a bootstrap sample drawn from the learning set  A number R is specified (the square root of the number of predictors by default), noticeably smaller than the total number of available predictors  During the tree-growing phase, at each node only R randomly selected predictors are tried as splitters
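The recipe on slides 14-15 can be sketched in a few lines. Below is a minimal illustration, not the Salford implementation, assuming scikit-learn's DecisionTreeClassifier as the base learner and integer-coded class labels; max_features=R plays the role of "try only R randomly chosen predictors at each node", and max_depth=None grows each tree to maximum depth.

```python
# Minimal sketch of the forest-growing loop on slides 14-15: bootstrap the rows,
# restrict each node to R random predictors, grow to maximum depth, vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=500, R="sqrt", seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)              # bootstrap sample
        tree = DecisionTreeClassifier(max_features=R, max_depth=None,
                                      random_state=int(rng.integers(1 << 31)))
        forest.append(tree.fit(X[idx], y[idx]))
    return forest

def predict_forest(forest, X):
    votes = np.stack([t.predict(X) for t in forest])  # (n_trees, n_samples)
    # majority vote per case (classification); labels assumed to be 0..K-1
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```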
  • 16.  All major advantages of a single tree are automatically preserved  Since each tree is grown on a bootstrap sample, one can ◦ Use the out-of-bag samples to compute an unbiased estimate of the accuracy ◦ Use the out-of-bag samples to determine variable importance  There is no overfitting as the number of trees increases
  • 17.  It is possible to compute generalized proximity between any pair of cases  Based on proximities one can ◦ Proceed with a well-defined clustering solution ◦ Detect outliers ◦ Generate informative data views/projections using scaling coordinates ◦ Do missing value imputation  Easy expansion into the unsupervised learning domain
  • 18.  High levels of predictive accuracy delivered automatically ◦ Only a few control parameters to experiment with ◦ Strong for both regression and classification  Resistant to overtraining (overfitting)- generalizes well to new data  Trains rapidly even with thousands of potential predictors ◦ No need for prior feature (variable) selection  Diagnostics pinpoint multivariate outliers  Offers a revolutionary new approach to clustering using tree-based between-record distance measures  Built on CART® inspired trees and thus ◦ Results invariant to monotone transformations of variables
  • 19.  Method intended to generate a large number of substantially different models ◦ Randomness introduced in two simultaneous ways ◦ By row: records selected for training at random with replacement (as in the bootstrap resampling of the bagger) ◦ By column: candidate predictors at any node are chosen at random and the best splitter is selected from the random subset  Each tree is grown out to maximal size and left unpruned ◦ Trees are deliberately overfit, becoming a form of nearest neighbor predictor ◦ Experiments convincingly show that pruning these trees hurts performance ◦ Overfit individual trees combine to yield properly fit ensembles
  • 20.  Self-testing is possible even if all data is used for training ◦ Only about 63% of the available training data will be used to grow any one tree ◦ The remaining ~37% of the training data is always left unused by that tree  The unused portion of the training data is known as Out-Of-Bag (OOB) data and can be used to provide an ongoing dynamic assessment of model performance ◦ Allows fitting to small data sets without explicitly holding back any data for testing ◦ All training data is used cumulatively in training, but only a ~63% portion is used at any one time  Similar to cross-validation but unstructured
  • 21.  Intensive post-processing of the data to extract more insight ◦ Most important is the introduction of a distance metric between any two data records ◦ The more similar two records are, the more often they will land in the same terminal node of a tree ◦ With a large number of different trees, simply count the number of times they co-locate in the same leaf node ◦ The distance metric can be used to construct a dissimilarity matrix as input to hierarchical clustering
  • 22.  Ultimately in modeling our goal is to produce a single score, prediction, forecast, or class assignment  The motivation for generating multiple models is the hope that by somehow combining models the results will be better than if we relied on a single model  When multiple models are generated they are normally combined by ◦ Voting in classification problems, perhaps weighted ◦ Averaging in regression problems, perhaps weighted
  • 23.  Combining trees via averaging or voting will only be beneficial if the trees are different from each other  In the original bootstrap aggregation paper Breiman noted that bagging worked best for high variance (unstable) techniques ◦ If the results of each model are near identical, little is to be gained by averaging  The resampling of the bagger from the training data is intended to induce differences in trees ◦ Accomplished essentially by varying the weight on any data record
  • 24.  A bootstrap sample is fairly similar to taking a ~63% sample from the original training data  If you grow many trees, each based on a different ~63% random sample of your data, you expect some variation in the trees produced  The bootstrap sample goes a bit further in ensuring that the new sample is of the same size as the original by allowing some records to be selected multiple times  In practice the different samples induce different trees, but the trees are not that different
  • 25.  The bagger was limited by the fact that even with resampling the trees are likely to be somewhat similar to each other, particularly with strong data structure  Random Forests induces vastly more between-tree differences by forcing splits to be based on different predictors ◦ Accomplished by introducing randomness into split selection
  • 26.  Breiman points out a tradeoff: ◦ As R increases, the strength of each individual tree should increase ◦ However, the correlation between trees also increases, reducing the advantage of combining  Want to select R to optimally balance the two effects ◦ Can only be determined via experimentation  Breiman has suggested three values to test: ◦ R = ½·sqrt(M) ◦ R = sqrt(M) ◦ R = 2·sqrt(M) ◦ For M = 100, test values for R: 5, 10, 20 ◦ For M = 400, test values for R: 10, 20, 40
  • 27.  The Random Forests machinery is unlike CART in that ◦ There is only one splitting rule: Gini ◦ There is a class weight concept but no explicit priors or costs ◦ There are no surrogates: missing values are imputed in the data first, automatically  The default fast imputation just uses means  A compute-intensive method bases the imputation on tree-based nearest neighbors (discussed later) ◦ None of CART’s display, reporting, and tree-refinement services are available  It does follow CART in that all splits are binary
  • 28.  Trees combined via voting (classification) or averaging (regression)  Classification trees “vote” ◦ Recall that classification trees classify  Assign each case to ONE class only ◦ With 50 trees, 50 class assignments for each case ◦ Winner is the class with the most votes ◦ Votes could be weighted- say by accuracy of individual trees  Regression trees assign a real predicted value for each case ◦ Predictions are combined via averaging ◦ Results will be much smoother than from a single tree
  • 29.  The probability of a record being omitted in a single draw is (1 − 1/n)  The probability of being omitted in all n draws is (1 − 1/n)^n  The limit of this expression as n increases is 1/e ≈ 0.368 ◦ Approximately 36.8% of the sample is excluded, contributing 0% of the resample ◦ 36.8% of the sample is included once, contributing 36.8% of the resample ◦ 18.4% is included twice, contributing 36.8% of the resample ◦ 6.1% is included three times, contributing 18.4% of the resample ◦ 1.9% is included four or more times, contributing ~8% of the resample  Total: 100% ◦ Example: distribution of weights in a 2,000 record resample: ◦ (insert table)
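These fractions are easy to verify numerically; the short check below (my addition, not from the slides) computes the omission probability directly and uses the Poisson(1) approximation for the number of times a record appears in the resample.

```python
# Quick numeric check of the bootstrap-inclusion arithmetic on slide 29.
import math

n = 2000                                      # records in the resample example
print(round((1 - 1/n) ** n, 4), "~", round(1/math.e, 4))   # both ~0.3679

# Poisson(1) approximation: share of records drawn exactly k times,
# and the share of the resample those records account for (k * P(k)).
for k in range(5):
    p_k = math.exp(-1) / math.factorial(k)
    print(f"drawn {k}x: {p_k:.1%} of records, {k * p_k:.1%} of resample")
```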
  • 30.  Want to use mass spectrometer data to classify different types of prostate cancer ◦ 772 observations available  398- healthy samples  178- 1st type of cancer samples  196- 2nd type of cancer samples ◦ 111 mass spectra measurements are recorded for each sample
  • 31.  (insert table)  The above table shows the cross-validated prediction success results of a single CART tree for the prostate data  The run was conducted under PRIORS DATA to facilitate comparisons with the subsequent RF run ◦ The relative error corresponds to an absolute error of 30.4%
  • 32.  This topic has been discussed by several Machine Learning researchers  Possibilities: ◦ Select the splitter, the split point, or both at random ◦ Choose the splitter at random from the top K splitters  Random Forests: suppose we have M available predictors ◦ Select R eligible splitters at random and let the best of them split the node ◦ If R=1 this is just random splitter selection ◦ If R=M this becomes Breiman’s bagger ◦ If R<<M then we get Breiman’s Random Forests  Breiman suggests R=sqrt(M) as a good rule of thumb
  • 33.  The performance of a single tree will be somewhat driven by the number of candidate predictors allowed at each node  Consider R=1: the splitter is always chosen at random, so performance could be quite weak  As relevant splitters get into the tree and the tree is allowed to grow massively, a single tree can be predictive even if R=1  As R is allowed to increase, the quality of splits can improve as there will be better (and more relevant) splitters
  • 34.  (insert graph)  In this experiment, we ran RF with 100 trees on the prostate data using different values for the number of variables Nvars searched at each split
  • 35.  RF clearly outperforms a single tree for any value of Nvars ◦ We saw above that a properly pruned tree gives a cross-validated absolute error of 30.4% (the far right end of the red curve)  The performance of a single tree varies substantially with the number of predictors allowed to be searched (a single tree is a high variance object)  RF reaches a nearly stable error rate of about 20% when only 10 variables are searched in each node (marked in blue)  Discounting the minor fluctuations, the error rate also remains stable for Nvars above 10 ◦ This generally agrees with Breiman’s suggestion to use the square root of the number of predictors (sqrt(111) ≈ 10.5) as a rough estimate of the optimal value for Nvars  The performance for small Nvars can usually be further improved by increasing the number of trees
  • 37.  (insert table)  The above results correspond to a standard RF run with 500 trees, Nvars=15, and unit class weights  Note that the overall error rate is 19.4%, about two-thirds of the baseline CART error of 30.4%
  • 38.  RF does not use a test dataset to report accuracy  For every tree grown, about 37% of the data are left out-of-bag (OOB)  This means that these cases can be safely used in place of test data to evaluate the performance of the current tree  For any tree in RF, its own OOB sample is used- hence no bias is ever introduced into the estimates  The final OOB estimate for the entire RF is simply obtained by averaging the individual OOB estimates  Consequently, this estimate is unbiased and behaves as if we had an independent test sample of the same size as the learn sample
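As an illustration only (scikit-learn rather than the Salford product, and synthetic stand-in data with the same shape as the prostate example), the OOB estimate can be requested like this:

```python
# OOB accuracy estimate as described on slide 38, sketched with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in data shaped like the prostate example: 772 cases, 111 predictors, 3 classes.
X, y = make_classification(n_samples=772, n_features=111, n_informative=20,
                           n_classes=3, random_state=0)
rf = RandomForestClassifier(n_estimators=500, max_features=15,
                            oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)      # 1 - OOB error rate
```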
  • 40.  The prostate dataset is somewhat unbalanced- class 1 contains fewer records than the remaining classes  Under the default RF settings, the minority classes will have higher misclassification rates than the dominant classes  Imbalance in the individual class error rates may also be caused by other data-specific issues  Class weights are used in RF to boost the accuracy of the specified classes  General rule of thumb: to increase accuracy in a given class, increase the corresponding class weight  In many ways this is similar to the PRIORS control used in CART for the same purpose
  • 41.  Our next run sets the weight for class 1 to 2  As a result, class 1 is classified with much better accuracy at the cost of slightly reduced accuracy in the remaining classes
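Continuing the scikit-learn stand-in from the OOB sketch above (the class IDs and the weight of 2 mirror slides 40-41 but are otherwise only illustrative):

```python
# Doubling the weight on the minority class, analogous to the slide's class
# weight of 2 on class 1; reuses X, y and the imports from the OOB sketch.
rf_weighted = RandomForestClassifier(n_estimators=500, max_features=15,
                                     class_weight={0: 1, 1: 2, 2: 1},
                                     oob_score=True, random_state=0)
rf_weighted.fit(X, y)
print("OOB accuracy with class weights:", rf_weighted.oob_score_)
```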
  • 42.  At the end of an RF run, the proportion of votes for each class is recorded for every case  We can define the margin of a case simply as the proportion of votes for the true class minus the maximum proportion of votes among the other classes  The larger the margin, the higher the confidence of the classification
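A minimal way to compute these margins from OOB vote proportions, again with the scikit-learn stand-in (rf, X, y from the OOB sketch; the attribute names are scikit-learn's, not Salford's):

```python
# Margin = vote share for the true class minus the best vote share among the
# other classes, computed from the OOB vote proportions of the earlier rf.
import numpy as np

votes = rf.oob_decision_function_           # (n_samples, n_classes) OOB vote shares
rows = np.arange(len(y))
true_share = votes[rows, y]
others = votes.copy()
others[rows, y] = -np.inf                   # mask out the true class
margin = true_share - others.max(axis=1)
print("OOB cases with negative margin (misclassified):", int((margin < 0).sum()))
```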
  • 43.  (insert table)  This extract shows the percent of votes for the top 30 records in the dataset along with the corresponding margins  The green lines have high margins and therefore high confidence of prediction  The pink lines have negative margins, which means that these observations are not classified correctly
  • 44.  The concept of margin allows a new “unbiased” definition of variable importance  To estimate the importance of the mth variable: ◦ Take the OOB cases for the kth tree; assume that we already know the margins M for those cases ◦ Randomly permute all values of variable m ◦ Apply the kth tree to the OOB cases with the permuted values ◦ Compute the new margins M′ ◦ Compute the difference M − M′  The variable importance is defined as the average lowering of the margin across all OOB cases and all trees in the RF  This procedure is fundamentally different from the intrinsic variable importance scores computed by CART- the latter are always based on the LEARN data and are subject to overfitting issues
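scikit-learn ships a closely related procedure, permutation importance, which measures the drop in a score (rather than the drop in OOB margin described on the slide) when one predictor is shuffled; shown here only as an approximate stand-in, reusing rf, X, y from the earlier sketches:

```python
# Permutation importance: shuffle one predictor at a time and record how much
# the accuracy drops. Related to, but not identical to, the OOB margin-based
# importance on slide 44.
from sklearn.inspection import permutation_importance

result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
top10 = result.importances_mean.argsort()[::-1][:10]
for j in top10:
    print(f"predictor {j}: mean importance {result.importances_mean[j]:.4f}")
```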
  • 45.  The top portion of the variable importance list for the data is shown here  Analysis of the complete list reveals that all 111 variables contribute nearly equally strongly to the model predictions  This is in striking contrast with the single CART tree, which by construction has no choice but to use a limited subset of variables  This explains why the RF model has a significantly lower error rate (about 20%) compared to a single CART tree (30%)
  • 46.  RF introduces a novel way to define the proximity between two observations ◦ Initialize all proximities to zero ◦ For any given tree, apply the tree to all cases ◦ If cases i and j both end up in the same terminal node, increase the proximity prox(i,j) between i and j by one ◦ Accumulate over all trees in RF and normalize by twice the number of trees in RF  The resulting N×N matrix provides an intrinsic measure of proximity ◦ The measure is invariant to monotone transformations ◦ The measure is clearly defined for any type of independent variable, including categorical
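A compact way to compute the same co-occurrence counts with the scikit-learn stand-in is via the leaf indices returned by apply (rf and X from the earlier sketches); note this sketch divides by the number of trees so the diagonal is exactly 1, a slightly different normalization than the slide's:

```python
# Proximity matrix from slide 46: count how often two cases share a terminal
# node across trees, then normalize.
import numpy as np

leaves = rf.apply(X)                        # (n_samples, n_trees) leaf indices
n_samples, n_trees = leaves.shape
prox = np.zeros((n_samples, n_samples))
for t in range(n_trees):
    same_leaf = leaves[:, t][:, None] == leaves[:, t][None, :]
    prox += same_leaf
prox /= n_trees                             # prox[i, i] == 1 for every case
```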
  • 47.  (insert graph)  The above extract shows the proximity matrix for the top 10 records of the prostate dataset ◦ Note the ones on the main diagonal- any case has “perfect” proximity to itself ◦ Observations that are “alike” will have proximities close to one  These cells have a green background ◦ The closer the proximity is to 0, the more dissimilar cases i and j are  These cells have a pink background
  • 48.  Having the full intrinsic proximity matrix opens new horizons ◦ Informative data views using metric scaling ◦ Missing value imputation ◦ Outlier detection  Unfortunately, things get out of control when the dataset size exceeds 5,000 observations (25,000,000+ cells are needed)  RF switches to a “compressed” form of the proximity matrix to handle large datasets- for any case, only the M closest cases are recorded, where M is usually less than 100
  • 49.  The values 1 − prox(i,j) can be treated as Euclidean distances in a high dimensional space  The theory of metric scaling solves the problem of finding the most representative projections of the underlying data “cloud” onto a low dimensional space using the data proximities ◦ The theory is similar in spirit to principal components analysis and discriminant analysis  The solution is given in the form of ordered “scaling coordinates”  Looking at scatter plots of the top scaling coordinates provides informative views of the data
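One way to obtain such coordinates with open-source tools (an approximation of the slide's metric scaling, not the Salford implementation) is to feed 1 − prox into multidimensional scaling; prox and y come from the earlier sketches:

```python
# Scaling coordinates from the proximity matrix: treat 1 - prox as a
# dissimilarity matrix and embed it in 3 dimensions with MDS.
from sklearn.manifold import MDS

dissim = 1.0 - prox                         # symmetric, zero diagonal
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)          # columns = scaling coordinates
# e.g. plot the first two coordinates coloured by target class:
# import matplotlib.pyplot as plt; plt.scatter(coords[:, 0], coords[:, 1], c=y)
```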
  • 50.  (insert graph)  This extract shows five initial scaling coordinates for the top 30 records of the prostate data  We will look at the scatter plots among the first, second, and third scaling coordinates  The following color codes will be used for the target classes: ◦ Green- class 0 ◦ Red- class 1 ◦ Blue- class 2
  • 51.  (insert graphs)  A nearly perfect separation of all three classes is clearly seen  From this we conclude that the outcome variable admits clear prediction by the RF model built on the 111 original predictors  The residual error is mostly due to the presence of the “focal” point where all three rays meet
  • 53.  (insert graphs)  Again, three distinct target classes show up as separate clusters  The “focal” point represents a cluster of records that can’t be distinguished from each other
  • 54.  Outliers are defined as cases having small proximities to all other cases belonging to the same target class  The following algorithm is used: ◦ For a case n, compute the sum of the squares of prox(n,k) for all k in the same class as n ◦ Take the inverse- it will be large if the case is “far away” from the rest ◦ Standardize using the median and standard deviation ◦ Look at the cases with the largest values- those are potential outliers  Generally, a value above 10 is reason to suspect the case of being an outlier
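A rough sketch of that computation from the earlier proximity matrix (prox and y from the previous snippets; the within-class standardization follows the slide's median/standard-deviation wording):

```python
# Outlier measure from slide 54: inverse of the summed squared within-class
# proximities, standardized within each target class.
import numpy as np

measure = np.zeros(len(y))
for c in np.unique(y):
    members = np.where(y == c)[0]
    for i in members:
        others = members[members != i]
        measure[i] = 1.0 / max((prox[i, others] ** 2).sum(), 1e-12)
    med = np.median(measure[members])
    sd = measure[members].std() + 1e-12
    measure[members] = (measure[members] - med) / sd
print("cases with outlier measure above 10:", np.where(measure > 10)[0])
```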
  • 55.  This extract shows top 30 records of the prostate dataset sorted descending by the outlier measure  Clearly the top 6 cases (class 2 with IDs: 771, 683, 539, and class 0 with IDs 127, 281, 282) are suspicious  All of these seem to be located at the “focal point” on the corresponding scaling coordinate plots
  • 57.  RF offers two ways of missing value imputation  The Cheap Way- conventional median imputation for continuous variables and mode imputation for categorical variables  The Right Way: ◦ Suppose case n has its x coordinate missing ◦ Do the Cheap Way imputation for starters ◦ Grow a full size RF ◦ Re-estimate the missing value by a weighted average over all cases k with non-missing x, using weights prox(n,k) ◦ Repeat the last two steps several times to ensure convergence
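The re-estimation step can be sketched as below for a single continuous column; the function name, the missing-value mask, and the use of the earlier proximity matrix are all illustrative assumptions, not Salford's API:

```python
# Proximity-weighted re-imputation of one continuous column (slide 57,
# "The Right Way"): replace provisional values by averages of the observed
# values, weighted by proximity to the case being imputed.
import numpy as np

def reimpute_column(X_imputed, missing_mask, prox, j):
    X_new = X_imputed.copy()
    observed = ~missing_mask[:, j]
    for n in np.where(missing_mask[:, j])[0]:
        w = prox[n, observed]
        if w.sum() > 0:
            X_new[n, j] = np.average(X_imputed[observed, j], weights=w)
    return X_new   # regrow the forest on X_new and repeat until stable
```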
  • 58.  An alternative display to view how the target classes differ with respect to the individual predictors ◦ Recall that at the end of an RF run every case in the dataset obtains K separate vote proportions (assuming K target classes) ◦ Take any target class and sort all observations by the count of votes for this class, descending ◦ Take the top 50 and the bottom 50 observations; those are correspondingly the most likely and the least likely members of the given target class ◦ Parallel coordinate plots report uniformly (0,1)-scaled values of all predictors for the top 50 and bottom 50 sorted records, along with the 25th, 50th, and 75th percentiles within each predictor
  • 59.  (insert graph)  This is a detailed display of the normalized values of the initial 20 predictors for the top-voted 50 records in each target class (50 × 3 = 150 plotted records)  Class 0 generally has normalized values of the initial 20 predictors close to 0, except perhaps M9X11
  • 60.  (insert graph)  It is easier to see this when looking at the quartile plots only  Note that class 2 tends to have the largest values of the corresponding predictors  The graph can be scrolled forward to view all of the 111 predictors
  • 61.  (insert graph)  The least-likely plots lead to roughly similar conclusions: small predictor values are the least likely for class 2, etc.
  • 62.  RF admits an interesting possibility to solve unsupervised learning problems, in particular clustering problems and missing value imputation in the general sense  Recall that in unsupervised learning the concept of a target is not defined  RF generates a synthetic target variable in order to proceed with a regular run: ◦ Give class label 1 to the original data ◦ Create a copy of the data such that each variable is sampled independently from the values available in the original dataset ◦ Give class label 2 to the copy of the data ◦ Note that the second copy has marginal distributions identical to the first copy, whereas any possible dependency among predictors is completely destroyed ◦ A necessary drawback is that the resulting dataset is twice as large as the original
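The synthetic-copy construction is simple to express; the helper below is a plain-numpy sketch of the recipe on slide 62 (function and variable names are my own):

```python
# Build the two-class dataset for unsupervised RF: real rows get label 1,
# column-wise independently resampled rows get label 2.
import numpy as np

def make_unsupervised_rf_data(X, seed=0):
    rng = np.random.default_rng(seed)
    X_synth = np.column_stack([rng.choice(col, size=len(col), replace=True)
                               for col in X.T])
    X_all = np.vstack([X, X_synth])
    y_all = np.concatenate([np.ones(len(X), dtype=int),
                            np.full(len(X_synth), 2, dtype=int)])
    return X_all, y_all
```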
  • 63.  We now have a clear binary supervised learning problem  Running an RF on this dataset may provide the following insights: ◦ When the resulting misclassification error is high (around or above 50%), the variables are basically independent- no interesting structure exists ◦ Otherwise, the dependency structure can be further studied by looking at the scaling coordinates and exploiting the proximity matrix in other ways ◦ For instance, the resulting proximity matrix can be used as a starting point for a subsequent hierarchical clustering analysis  Recall that the proximity measures are invariant to monotone transformations and naturally support categorical variables  The same missing value imputation procedure as before can now be employed  These techniques work extremely well for small datasets
  • 64.  We generated a synthetic dataset based on the prostate data  The resulting dataset still has 111 predictors but twice the number of records- the first half being an exact replica of the original data  The final error is only 0.2%, which indicates a very strong dependency among the predictors
  • 65.  (insert graph)  The resulting plots resemble what we had before  However, this distance is now in terms of how dependent the predictors are, whereas previously it was in terms of having the same target class  In view of this, the non-cancerous tissue (green) appears to stand apart from the cancerous tissue
  • 66.  Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.  Breiman, L. (1996). Arcing classifiers (Technical Report). Berkeley: Statistics Department, University of California.  Buntine, W. (1991). Learning classification trees. In D. J. Hand, ed., Artificial Intelligence Frontiers in Statistics, Chapman and Hall: London, 182-201.  Dietterich, T. (1998). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40, 139-158.  Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In L. Saitta, ed., Machine Learning: Proceedings of the Thirteenth International Conference, Morgan Kaufmann, pp. 148-156.  Friedman, J. H. (1999). RandomForests. Stanford: Statistics Department, Stanford University.  Friedman, J. H. (1999). Greedy function approximation: a gradient boosting machine. Stanford: Statistics Department, Stanford University.  Heath, D., Kasif, S., & Salzberg, S. (1993). k-dt: A multi-tree learning method. Proceedings of the Second International Workshop on Multistrategy Learning, 1002-1007, Morgan Kaufmann: Chambery, France.  Kwok, S., & Carter, C. (1990). Multiple decision trees. In Shachter, R., Levitt, T., Kanal, L., & Lemmer, J., eds., Uncertainty in Artificial Intelligence 4, North-Holland, 327-335.