Overview of TreeNet® Technology
   Stochastic Gradient Boosting




               Dan Steinberg
                   2012
       http://www.salford-systems.com
Introduction to TreeNet: Stochastic Gradient Boosting

     Powerful approach to machine learning and function approximation
      developed by Jerome H. Friedman at Stanford University
          Friedman is a coauthor of CART® with Breiman, Olshen and Stone
          Author of MARS®, PRIM, Projection Pursuit, COSA, RuleFit and more

     TreeNet is unusual among learning machines in being very strong for both
      classification and regression problems
     Stochastic gradient boosting is now widely regarded as the most powerful
      methodology available for most predictive modeling contexts

     Builds on the notions of committees of experts and boosting but is
      substantially different in key implementation details




Salford Systems © Copyright 2005-2012                                            2
Aspects of TreeNet

     Built on CART trees and thus
          immune to outliers
          handles missing values automatically
          results invariant under order preserving transformations of variables
                No need to ever consider functional form revision (log, sqrt, power)

     Highly discriminatory variable selector
          Effective with thousands of predictors

     Detects and identifies important interactions
          Can be used to easily test for presence of interactions and their degree

     Resistant to overtraining – generalizes well
     Can be remarkably accurate with little effort
          Should easily outperform conventional models



Adapting to Major Errors in Data

     TreeNet is a machine learning technology designed to recognize patterns in
      historical data
     Ideally the data TreeNet will use to learn from will be accurate
     In some circumstances there is a risk that the most important variable,
      namely the dependent variable, is subject to error. This is known as
      “mislabeled data”
     Good examples of mislabeled data can be found in
          Medical diagnoses
          Insurance claim fraud
                Thus historical data for “not frauds” actually includes undetected fraud
                Some of the “0”s are actually “1”s, which complicates learning (possibly fatally)
          TreeNet manages such data successfully




Some TreeNet Successes

     2010 DMA Direct Marketing Association.               1st place*
     2009 KDDCup IDAnalytics*, and FEG Japan*             1st Runner Up
     2008 DMA Direct Marketing Association                 1st Runner Up
     2007 Pacific Asia PAKDD: Credit Card Cross Sell.      1st place
     2007 DMA Direct Marketing Association                 1st place
     2006 PAKDD Pacific Asia KDD: Telco Customer Model     2nd Runner Up
     2005 BI-Cup Latin America: Predictive E-commerce*     1st place
     2004 KDDCup: Predictive Modeling                    “Most Accurate”*
     2002 NCR/Teradata Duke University: Predictive Modeling-Churn
          Four separate predictive modeling challenges      1st place



       * Won by Salford client or partner using TreeNet

Multi-tree methods and their single tree ancestors

     Multi-tree methods have been under development since the early 1990s.
      Most important variants (and dates of published articles) are:
          Bagger (Breiman, 1996, “Bootstrap Aggregation”)
          Boosting (Freund and Schapire, 1995)
          Multiple Additive Regression Trees (Friedman, 1999, aka MART™ or
           TreeNet®)
          RandomForests® (Breiman, 2001)

     TreeNet version 1.0 was released in 2002 but its power took some
      time to be recognized
     Work on TreeNet continues with major refinements underway
      (Friedman in collaboration with Salford Systems)




Multi-tree Methods: Simplest Case

     Simplest example:
          Grow a tree on training data
          Find a way to grow another different tree (change
           something in set up)
          Repeat many times, e.g. 500 replications
          Average results or create voting scheme.
           E.g. relate Probability of Event to fraction of trees
           predicting event for a given record


     Beauty of the method is that every new tree starts with a
      complete set of data.
     [Figure: Prediction Via Voting]
     Any one tree can run out of data, but when that happens
      we just start again with a new tree and all the data
      (before sampling)
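
The simplest case above can be sketched in a few lines. This is a toy illustration only, not TreeNet: it assumes a single numeric predictor, one-split trees, and bootstrap resampling as the "something changed in set up". The helper names (fit_stump, vote_fraction) and the toy data are hypothetical.

```python
import random

def fit_stump(rows):
    """Fit a one-split classifier on (x, y) pairs with y in {0, 1}.
    Returns (threshold, left_label, right_label)."""
    majority = lambda ys: int(2 * sum(ys) >= len(ys))
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if not right:
            continue
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # every x identical: fall back to the majority class
        label = majority([y for _, y in rows])
        return (float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def vote_fraction(stumps, x):
    """Fraction of trees predicting the event; used as a probability estimate."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

rng = random.Random(0)
data = [(x, int(x > 5)) for x in range(11)]  # toy data: event occurs when x > 5

stumps = []
for _ in range(500):                               # 500 replications
    sample = [rng.choice(data) for _ in data]      # every tree starts from all the data
    stumps.append(fit_stump(sample))

print(vote_fraction(stumps, 0), vote_fraction(stumps, 10))
```

Each replication draws from the complete data set, so no tree is starved of data by the trees grown before it.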




Automated Multiple Tree Generation

     Earliest multi-model methods recommended taking several good candidates
      and averaging them. Examples considered as few as three trees.
     Too difficult to generate multiple models manually. Hard enough to get one
      good model.
     How do we generate different trees?
          Bagger: random reweighting of the data via bootstrap resampling
                Bootstrap resampling allows for zero and positive integer weights
          RandomForests: Random splits applied to randomly reweighted data
                Implies that tree itself is grown at least partly at random
          Boosting: Reweighting data based on prior success in correctly classifying a
           case. High weights on difficult-to-classify cases. (Systematic reweighting)
          TreeNet: Boosting with major refinements. Each tree attempts to correct errors
           made by predecessors
                Each tree is linked to predecessors. Like a series expansion where the
                 addition of terms progressively improves the predictions
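
To make the bagger's "random reweighting" concrete, the sketch below draws one bootstrap resample and reports it as per-record weights. The function name bootstrap_weights is hypothetical; the point is only that sampling n records with replacement is equivalent to assigning each record a weight of 0, 1, 2, ...

```python
import random
from collections import Counter

def bootstrap_weights(n, rng):
    """Draw n record indices with replacement; return how often each was drawn."""
    draws = Counter(rng.randrange(n) for _ in range(n))
    return [draws.get(i, 0) for i in range(n)]

rng = random.Random(42)
weights = bootstrap_weights(10, rng)
print(weights)  # zero and positive integer weights that sum to 10
```

Records with weight zero are left out of that tree entirely; for large n this is roughly 37% of the data in each resample.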

TreeNet

     We focus on TreeNet because
          It is the method used in many successful real world studies
          Stochastic gradient boosting is arguably the most popular methodology among
           experienced data miners
          We have found it to be consistently more accurate than the other methods
                      Major credit card fraud detection engine uses TreeNet
                      Widespread use in on-line e-commerce
          Dramatic capabilities include:
                Display impact of any predictor (dependency plots)
                Test for existence of interactions
                Identify and rank interactions
                Constrain model: allow some interactions and disallow others
                Tools to recast TreeNet model as a logistic regression or GLM



TreeNet Process

     Begin with one very small tree as initial model
          Could be as small as ONE split generating 2 terminal nodes
          Default model will have 4-6 terminal nodes
          Final output is a predicted target (regression) or predicted probability
          First tree is an intentionally “weak” model

     Compute “residuals” for this simple model (prediction error) for every record
      in data (even for classification model)
     Grow second small tree to predict the residuals from first tree
     New model is now:

                                         Tree1 + Tree2

     Compute residuals from this new 2-tree model and grow 3rd tree to predict
      revised residuals
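
The process above can be sketched for a least-squares regression with a single predictor. This is a minimal illustration under stated assumptions, not the TreeNet implementation: every tree is a 2-terminal-node stump, no shrinkage or sampling is applied, and the names fit_stump, boost and predict are hypothetical.

```python
def fit_stump(xs, ys):
    """Least-squares stump; returns (threshold, left_mean, right_mean).
    Assumes xs contains at least two distinct values."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees):
    trees = []
    residuals = list(ys)  # the first tree is grown on the original target
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # residuals of the model so far become the target of the next tree
        residuals = [r - stump_value(tree, x) for x, r in zip(xs, residuals)]
    return trees

def predict(trees, x):
    return sum(stump_value(t, x) for t in trees)  # Tree1 + Tree2 + ...

def mse(trees, xs, ys):
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]
mse_1 = mse(boost(xs, ys, 1), xs, ys)
mse_3 = mse(boost(xs, ys, 3), xs, ys)
print(mse_1, mse_3)  # training error falls as trees are added
```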



TreeNet: Trees incrementally revise predicted scores


                     Tree 1      +      Tree 2      +      Tree 3

       Tree 1: first tree grown on original target. Intentionally “weak” model
       Tree 2: 2nd tree grown on residuals from first. Predictions made to
        improve first tree
       Tree 3: 3rd tree grown on residuals from model consisting of first two trees

       Every tree produces at least one positive and at least one negative node. Red
       reflects a relatively large positive node and deep blue a relatively large
       negative node. Total “score” for a given record is obtained by finding the
       relevant terminal node in every tree in the model and summing across all trees

TreeNet: Example Individual Trees
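     [Figure: three example trees whose terminal-node values are summed, together
      with the intercept (0.172) and all other trees, to give the predicted response]
          Tree 1: splits on EQ2CUST_STF < 5.04 and COST2INC < 65.4;
           terminal-node values -0.228, -0.051, +0.068
          Tree 2: splits on TOTAL_DEPS < 65.5M and EQ2TOT_AST < 2.8;
           terminal-node values +0.065, -0.213, -0.010
          Tree 3: splits on EQ2CUST_STF < 5.04 and EQ2TOT_AST < 2.8;
           terminal-node values -0.088, +0.140, -0.007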
TreeNet Methodology: Key points

     Trees are kept small (each tree learns only a little)
     Each tree is weighted by a small weight prior to being used to compute the
      current prediction or score produced by the model.
     Like a partial adjustment model. Update factors can be as small as .01, .001,
      .0001. This means that the model prediction changes by very small amounts
      after a new tree is added to the model (training cycle)
     Use random subsets of the training data for each new tree. Never train on all
      the training data in any one learning cycle
     Highly problematic cases are optionally IGNORED. If model prediction starts
      to diverge substantially from observed data for some records, those records
      may not be used in further model updates
     Model can be tuned to optimize any of several criteria:
          Least Squares, Least Absolute Deviation, Huber-M Hybrid LS/LAD
          Area under the ROC curve
          Logistic likelihood (deviance)
          Classification Accuracy
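
Two of the key points above, the small update factor and the random subset of training data per cycle, can be sketched as follows. This is a toy least-squares version with one predictor and stump trees, not Salford's code; the parameter names learn_rate and subsample are assumptions, and the optional ignoring of problematic records is not implemented.

```python
import random

def fit_stump(xs, ys):
    """Least-squares stump; returns (threshold, left_mean, right_mean).
    Assumes xs contains at least two distinct values."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def stochastic_boost(xs, ys, n_trees, learn_rate=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    intercept = sum(ys) / n
    scores = [intercept] * n
    trees = []
    for _ in range(n_trees):
        # never train on all the data: draw a random subset for this cycle
        idx = rng.sample(range(n), max(2, int(subsample * n)))
        sub_x = [xs[i] for i in idx]
        sub_r = [ys[i] - scores[i] for i in idx]  # current residuals
        tree = fit_stump(sub_x, sub_r)
        trees.append(tree)
        # partial adjustment: move every score only a small step
        scores = [s + learn_rate * stump_value(tree, x) for s, x in zip(scores, xs)]
    return intercept, learn_rate, trees

def predict(model, x):
    intercept, learn_rate, trees = model
    return intercept + learn_rate * sum(stump_value(t, x) for t in trees)

xs = [float(i) for i in range(40)]
ys = [x * x for x in xs]
model = stochastic_boost(xs, ys, n_trees=200)
mean_y = sum(ys) / len(ys)
mse_base = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_model = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_base, mse_model)
```

A smaller learn_rate generally needs more trees to reach the same fit, which is the trade-off behind the slow-learning design.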
Why does TreeNet work?

     Slow learning: the method “peels the onion” extracting very small amounts of
      information in any one learning cycle
     TreeNet can leverage hints available in the data across a number of
      predictors
          TreeNet can successfully include more variables than traditional models

     Can capture substantial nonlinearity and complex interactions of high degree
     TreeNet self-protects against errors in the dependent variable
          If a record is actually a “1” but is misrecorded in the data as a “0”, and TreeNet
           recognizes it as a “1”, TreeNet may decide to ignore this record
          Important for medical studies in which the diagnosis can be incorrect
          “Conversion” in online behavior can be misrecorded because it is not recognized
           (e.g. conversion occurs via another channel or is delayed)




Multiple Additive Regression Trees

     Friedman originally named his methodology MART™ as the method generates small
      trees which are summed to obtain an overall score
     The model can be thought of as a series expansion approximating the true functional
      relationship


     We can think of each small tree as a mini-scorecard, making use of possibly different
      combinations of variables. Each mini-scorecard is designed to offer a slight
      improvement by correcting and refining its predecessors
     Because each tree starts at the “root node” and can use all of the available data a
      TreeNet model can never run out of data no matter how many trees are built.
     The number of trees required to obtain best possible performance can vary considerably
      from data set to data set
     Some models “converge” after just a handful of trees and others may require many
      thousands




Selecting Optimal Model

     TreeNet first grows a large number of trees
     We evaluate performance of the ensemble as each tree is added:
          Start with 1 tree. Then go on to 2 trees (1st + 2nd ).
          Then 3 trees. (1st + 2nd + 3rd ) Etc.
     Criteria used right now are
          Least squares, Least Absolute Deviation, or Huber-M for regression
          Classification Accuracy
          Log-Likelihood
          ROC (area under curve)
          Lift in top P percentile (often top decile)
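
The selection step can be sketched as a running sum over trees with the criterion recomputed at each ensemble size. The example below assumes mean absolute error on a held-out sample as the criterion; the function name best_ensemble_size and the tiny data set are hypothetical.

```python
def best_ensemble_size(tree_outputs, targets):
    """tree_outputs[k][i] is the contribution of tree k to record i.
    Returns (best number of trees, error at every ensemble size)."""
    n = len(targets)
    running = [0.0] * n
    errors = []
    for contribution in tree_outputs:
        running = [r + c for r, c in zip(running, contribution)]  # 1st, 1st + 2nd, ...
        mae = sum(abs(t - r) for t, r in zip(targets, running)) / n
        errors.append(mae)
    best = min(range(len(errors)), key=errors.__getitem__) + 1
    return best, errors

targets = [1.0, 2.0, 3.0]
tree_outputs = [
    [0.5, 1.0, 1.5],  # tree 1
    [0.5, 1.0, 1.5],  # tree 2
    [0.5, 0.5, 0.5],  # tree 3 overshoots on this sample
]
best_size, errors = best_ensemble_size(tree_outputs, targets)
print(best_size, errors)
```

Swapping the error function for another criterion (log-likelihood, ROC area, lift) can select a different ensemble size from the same sequence of trees.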




TreeNet Summary Screen
           Displaying ROC (Y-axis) on train and test samples at all ensemble sizes




      Each criterion is optimized using a different number of trees (size of ensemble)
             Criterion                   CXE Class Error        ROC Area       Lift
        ----------------------------------------------------------------------------
        Optimal Number of Trees:          739         1069          643          163
        Optimal Criterion           0.4224394    0.1803279    0.8862029    1.9626555

How the different criteria select different response profiles

       [Figure: models fit to artificial data; the red curve is the truth.
        Orange: logistic regression. Green: TreeNet optimized for classification
        accuracy. Blue: TreeNet optimized for maximum log-likelihood (CXE).]
       TreeNet (CXE) tracks data far more faithfully than LOGIT

Interpreting TN Models

     As TN models consist of hundreds or even thousands of trees there is no way
      to represent the model via a display of one or two trees
     However, the model can be summarized in a variety of ways
          Partial dependency plots: These exhibit the relationship between the target and
           any predictor – as captured by the model.
          Variable Importance Rankings: These stable rankings give an excellent
           assessment of the relative importance of predictors
          For regression models SPM 7.0 displays box plots for residuals and residual outlier
           diagnostics
          For binary classification TN also displays:
          ROC curves: TN models produce scores that are typically unique for each scored
           record allowing records to be ranked from best to worst. The ROC curve and area
           under the curve reveal how successful the ranking is.
          Confusion Matrix: Using an adjustable score threshold this matrix displays the
           model false positive and false negative rates.
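
For the two binary-classification summaries, a small sketch may help. It computes the area under the ROC curve as the share of (positive, negative) pairs that the scores rank correctly, and a confusion matrix at an adjustable threshold. The function names and the five example records are made up for illustration.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_matrix(scores, labels, threshold):
    """Counts of true/false positives and negatives when score >= threshold predicts 1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts

scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
cm = confusion_matrix(scores, labels, threshold=0.5)
print(auc, cm)
```

Lowering the threshold trades false negatives for false positives, which is how the matrix is adapted to unbalanced classes.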



TN Summary: Variable Importance Ranking




      Based on actual use of variables in the trees on training data
      TreeNet creates a sum of split improvement scores for every variable in every tree
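
The summation described above might look like the sketch below. The improvement values are invented, the variable names are borrowed from the example trees earlier in the deck, and rescaling so the top variable scores 100 is an assumed convention rather than something stated here.

```python
from collections import defaultdict

def variable_importance(trees):
    """trees: list of trees, each a list of (variable, split_improvement) pairs.
    Returns importances in descending order, rescaled so the top variable is 100."""
    totals = defaultdict(float)
    for tree in trees:
        for variable, improvement in tree:
            totals[variable] += improvement  # sum over every split in every tree
    top = max(totals.values())
    ranked = sorted(totals.items(), key=lambda item: -item[1])
    return {variable: 100.0 * total / top for variable, total in ranked}

# Hypothetical split improvements for two small trees
trees = [
    [("EQ2CUST_STF", 4.0), ("COST2INC", 1.0)],
    [("TOTAL_DEPS", 2.0), ("EQ2CUST_STF", 2.0)],
]
importance = variable_importance(trees)
print(importance)
```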

TN Classification Accuracy: Test Data




             Threshold can be adjusted to reflect unbalanced classes, rare events

TreeNet: Partial Dependency Plot




                                        Y-axis: One-half-log-odds
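
A score on the one-half-log-odds scale, F = 0.5 * log(p / (1 - p)), converts back to a probability as p = 1 / (1 + exp(-2F)). The helper names below are hypothetical; the sketch only shows the two directions of that conversion.

```python
import math

def half_log_odds(p):
    """Map a probability to the one-half-log-odds scale."""
    return 0.5 * math.log(p / (1.0 - p))

def probability(score):
    """Invert the one-half-log-odds scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-2.0 * score))

print(half_log_odds(0.8), probability(0.0))
```

A score of zero therefore corresponds to a probability of 0.5, and equal steps on the plot are equal steps in log-odds, not in probability.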
Partial Dependency Plot Explained

     Partial dependency plots are extracted from the TreeNet model via
      simulation
     To trace out the impact of the predictor X on the target Y we start with
      a single record from the training data and generate a set of
      predictions for the target Y for each of a large number of different
      values of X
     We repeat this process for every record in the training data and then
      average the results to produce the displayed partial dependency plot
     In this way the plot reflects the effects of all the other relevant
      predictors
     Important to remember that the plot is based on predictions generated
      by the model
     Smooths of plots might be used to create transforms of certain
      variables for use in GLMs or logistic regressions
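
The simulation described above can be written compactly. In this sketch the model is any function from a record (a dict of predictor values) to a prediction; toy_model is a hypothetical stand-in for a fitted ensemble, and the function name partial_dependence is an assumption.

```python
def partial_dependence(model, records, feature, grid):
    """For each grid value, set `feature` to that value in every record,
    predict, and average the predictions over all records."""
    curve = []
    for value in grid:
        total = 0.0
        for record in records:
            modified = dict(record)
            modified[feature] = value  # vary X, keep the other predictors as observed
            total += model(modified)
        curve.append(total / len(records))
    return curve

def toy_model(record):
    # hypothetical fitted model with an x1-by-x2 interaction
    return 2.0 * record["x1"] + record["x1"] * record["x2"]

records = [{"x1": 0.0, "x2": 1.0}, {"x1": 5.0, "x2": 3.0}]
curve = partial_dependence(toy_model, records, "x1", [0.0, 1.0, 2.0])
print(curve)
```

Because the other predictors keep their observed values, the averaged curve reflects their effects, and it describes the model's predictions rather than the raw data.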
Place Knot Locations on Graph for Smooth




                        Smooth can be created on screen within TreeNet
Smooth Depicted in Green




Generate Programming Code for Smooth

  


     Can obtain:
          SAS code, Java, C, PMML

     Other languages coming:
          SQL, Visual Basic




Dealing with Monotonicity

     Undesired response profile (here we prefer a monotone increasing response)




Impose Constraint

     Constrained solution generated to yield a monotone increasing smooth




       In this case constraint is mild in that the differences on Y-axis are quite small
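
How TreeNet generates its constrained smooth is not described here. As one generic way to obtain a monotone increasing curve from an unconstrained one, the sketch below uses the pool-adjacent-violators algorithm (least-squares isotonic regression). It is a swapped-in standard technique, not the TreeNet method, and the input values are invented.

```python
def isotonic_increasing(values):
    """Pool adjacent violators: closest (least-squares) nondecreasing sequence."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # merge backwards while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

smooth = [0.1, 0.3, 0.2, 0.4, 0.35, 0.5]  # hypothetical non-monotone response profile
monotone = isotonic_increasing(smooth)
print(monotone)
```

When the violations are small, as in this example, the constrained curve stays close to the original, which matches the notion of a mild constraint.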
Interaction Detection

     TreeNet models based on 2-node trees automatically EXCLUDE interactions
          Model may be highly nonlinear but is by definition strictly additive
          Every term in the model is based on a single variable (single split)
          Use this as a baseline model for best possible additive model (automated GAM)

     Build TreeNet on larger tree (default is 6 nodes)
          Permits up to 5-way interaction but in practice is more like 3-way interaction

     Can conduct informal likelihood ratio test TN(2-node) vs TN(6-node)
          Large differences signal important interactions
          In TreeNet 2.0 can locate interactions via 3-D 2-variable dependency plots
          In SPM 6.6 and later variables participating in interactions are ranked using new
           methodology developed by Friedman and Salford Systems
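
Why 2-node trees exclude interactions can be checked numerically. For any additive model, the contrast f(hi, hi) - f(hi, lo) - f(lo, hi) + f(lo, lo) is exactly zero, while a tree with two splits on different variables can make it nonzero. The models, split points and node values below are hypothetical.

```python
def stump_model(stumps):
    """Sum of 2-node trees; each stump is (feature, threshold, left_value, right_value)."""
    def score(record):
        return sum(left if record[f] <= t else right for f, t, left, right in stumps)
    return score

def three_node_tree(record):
    """One tree with two splits on different variables (3 terminal nodes)."""
    if record["x1"] <= 1.0:
        return 0.0
    return 1.0 if record["x2"] > 2.0 else 0.0

def interaction_contrast(model, lo, hi):
    """Zero whenever the model is additive in x1 and x2."""
    return (model({"x1": hi, "x2": hi}) - model({"x1": hi, "x2": lo})
            - model({"x1": lo, "x2": hi}) + model({"x1": lo, "x2": lo}))

additive = stump_model([
    ("x1", 1.0, -0.5, 0.5),
    ("x2", 2.0, 0.25, -0.25),
    ("x1", 3.0, 0.0, 1.0),
])
additive_contrast = interaction_contrast(additive, lo=0.0, hi=5.0)
tree_contrast = interaction_contrast(three_node_tree, lo=0.0, hi=5.0)
print(additive_contrast, tree_contrast)
```

However many stumps are summed, each term depends on one variable only, so the model can be highly nonlinear yet remains strictly additive.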




Interactions Ranking Report



                                              Interaction strength measure is first
                                              column; 2nd column shows normalized
                                              scores

                                              Variables are listed in descending
                                              order of overall interaction strength




     Computation of interaction strength is based on two ways of creating a 3D plot of
     target vs Xi and Xj
     One method ignores possible interactions and the other explicitly accounts for
     interaction. If the two are very different we have evidence from the model for the
     existence of an interaction
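
The exact Salford/Friedman computation is not given here. One plausible reading of "two ways of creating the surface" is to compare the joint two-variable partial dependence with the sum of the two one-variable partial dependences, in the style of an H-statistic. The sketch below follows that reading as an assumption; all names and data are hypothetical.

```python
def partial_mean(model, records, fixed):
    """Average model output over records with the features in `fixed` overridden."""
    return sum(model({**record, **fixed}) for record in records) / len(records)

def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def interaction_strength(model, records, f1, f2):
    """Share of the joint surface's variation not explained by an additive surface."""
    joint = centered([partial_mean(model, records, {f1: r[f1], f2: r[f2]}) for r in records])
    only1 = centered([partial_mean(model, records, {f1: r[f1]}) for r in records])
    only2 = centered([partial_mean(model, records, {f2: r[f2]}) for r in records])
    gap = sum((j - a - b) ** 2 for j, a, b in zip(joint, only1, only2))
    scale = sum(j ** 2 for j in joint)
    return gap / scale if scale > 0 else 0.0

records = [{"x1": a, "x2": b} for a in (0.0, 1.0) for b in (0.0, 1.0)]
additive = lambda r: r["x1"] + 2.0 * r["x2"]   # no interaction
product = lambda r: r["x1"] * r["x2"]          # pure interaction term

strength_additive = interaction_strength(additive, records, "x1", "x2")
strength_product = interaction_strength(product, records, "x1", "x2")
print(strength_additive, strength_product)
```

A value near zero means the additive surface reproduces the joint surface; larger values are evidence from the model that the two variables interact.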
What interacts with what?


        Interaction strength scores for 2-way interactions for most important variables




             NMOS                                     PAY_FREQUENCY_2$

     Here we see that NMOS primarily interacts with PAY_FREQUENCY_2$
     All other interactions with NMOS have much smaller impacts

Example of an important interaction

     Slope reverses due to interaction
     In this real world example the downward sloping portion of relationship was
      well known. But segment displaying upward sloping relationship was not
      known




      Note that the dominant pattern is downward sloping
      But a key segment defined by the 3rd variable is upward sloping

Real World Examples

     Corporate default (Probability of Default or PD)
     Analysis of bank ratings




Corporate Default Scorecard

     Mid-sized bank
          Around 200 defaults
          Around 200 indeterminate/slow

     Classic small sample problem
     Standard financial statements available, variety of ratios created
     All key variables had some fraction of missing data, from 2% to 6%
      in a set of 10 core predictors, and up to 35% missing in a set of 20
      core predictors
     Single CART tree involves just 6 predictors
          yields cross-validated ROC= 0.6523




TreeNet Analysis of Corporate Default Data: Summary Display

  




       Summary reports progress of model as it evolves with an increasing number of
       trees. Markers indicate models optimizing entropy, misclassification rate, ROC
Corporate Default Model: TreeNet

      1789 Trees in model with best test sample area under ROC curve


                                        CXE    Class Error      ROC Area       Lift

      --------------------------------------------------------------------
      Optimal N Trees:                  2000          1930        1789         1911
      Optimal Criterion                 0.534914   0.3485293   0.7164921    1.9231729



      TreeNet Cross-Validated ROC=0.71649 far better than single tree
      Able to make use of more variables due to the many trees
           Single CART tree uses 6 predictors, TreeNet uses more than 20

      Some of these variables were missing for many accounts
      Can extract graphical displays to describe model


Predictive Variable – Financial Ratio 1


       [Figure: partial dependency plot for Ratio 1 (Sales / Current Assets).
        X-axis: ratio value (10 to 60). Y-axis: risk or probability of default
        (-0.004 to 0.003).]

                          TreeNet can trace out impact of predictor on PD


Predictive Variable – Financial Ratio 2


       [Figure: partial dependency plot for Ratio 2, Return on Assets (ROA)
        (Operating Profit / Total Assets). X-axis: ratio value (-0.9 to 1.0).
        Y-axis: risk or probability of default (-0.0004 to 0.0006).]

Additive vs Interaction Model (2-node vs 6-node trees)

                                        CXE    Class Error     ROC Area        Lift
      --------------------------------------------------------------------
      2-node trees model
      Optimal N Trees:                 2000        1812          1978          1763
      Optimal Criterion             0.54538     0.39331       0.70749       1.84881

      6-node trees model
      Optimal N Trees:                 2000        1930          1789          1911
      Optimal Criterion             0.53491     0.34852       0.71649       1.92317


      Two-node trees do not permit interactions of any kind as each term in the
       model involves a single predictor in isolation. Two-node tree models can be
       highly nonlinear but strictly additive
      Three- or more node trees allow progressively higher order interactions.
      Running model with different tree sizes allows simple discovery of the precise
       amount of interaction required for maximum performance models
      In this example no strong evidence for low-order interactions
Bank Ratings: Regression Example

     Build a predictive model for average of major bank ratings
          Scaled average of S&P, Moody's, Fitch Ratings

     Challenges include
          Small data base (66 banks)
          Missing values prevalent (up to 35 missing in any predictor)
          Impossible to build linear regression model because of missings
          Expect relationships to be nonlinear

     66 banks
     25 potential predictors
     Study run in 2003!




Cross-Validated Performance: Predicting Rating Score


         [Figure: cross-validated mean absolute error (Y-axis, 0.6 to 1.4)
          against model size in trees (X-axis, 1 to 945)]




         The optimal model was reached at around 860 trees, with a cross-validated
         mean absolute error of 0.87 (the target variable ranges from 1 to 10)




Variable Importance Ranking




       Ranks variables according to their contributions to the model
       Influenced by the number of times a variable is used and its strength
       as a splitter when it is used
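One common way to turn "how often a variable is used" and "how strong it is as a splitter" into a single score is to add up the split improvements credited to each variable across all trees, then rescale so the top variable scores 100. The sketch below assumes that formulation; the slide does not give TreeNet's exact formula, and the tree contents, variable names and improvement values are invented for illustration.

```python
from collections import defaultdict


def variable_importance(trees):
    """Rank variables by total split improvement, scaled so the best is 100.

    `trees` is a list of trees; each tree is a list of
    (variable_name, improvement) pairs, one pair per split.
    """
    totals = defaultdict(float)
    for splits in trees:
        for variable, improvement in splits:
            # A variable gains score each time it is used, weighted by
            # how much that split improved the fit.
            totals[variable] += improvement
    top = max(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, 100.0 * score / top) for name, score in ranked]


# Hypothetical three-tree model with two splits per tree.
toy_trees = [
    [("country", 4.0), ("total_deposits", 1.5)],
    [("country", 3.0), ("cost_to_income", 2.5)],
    [("total_deposits", 0.5), ("country", 1.0)],
]
ranking = variable_importance(toy_trees)
for name, score in ranking:
    print(f"{name:16s} {score:6.2f}")
```

A variable can rank highly either by appearing in many splits with modest gains or by appearing rarely with large gains, so the ranking alone does not say which of the two is happening.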
Country Contribution to Risk Score


      [Figure: Distribution by Country. Horizontal bar chart with one bar per country (US, SE, PT, NL, LU, JP, IT, IE, GR, GB, FR, ES, DK, DE, CH, CA, BE, AT); horizontal axis from 0 to 8]




      Swiss banks (CH) tend to be rated low risk, while Greek, Portuguese,
      Italian, Irish and Japanese banks tend to be rated high risk

Bank Specialization

    [Figure: Distribution. Horizontal bar chart by bank specialization (Specialised Governmental Credit Inst., Savings Bank, Real Estate / Mortgage Bank, Non-banking Credit Institution, Investment Bank/Securities House, Cooperative Bank, Commercial Bank); horizontal axis from 0 to 50]




    Commercial banks and investment banks are rated higher risk



Scale: Total Deposits


           [Figure: Histogram. Horizontal axis from 0 e+00 to 6 e+08, vertical axis from 0 to 35]




           A step function in size of bank
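A step-shaped response is what a tree ensemble naturally produces: each tree is constant between its split points, so a sum of trees is piecewise constant too. The sketch below illustrates this with three hypothetical stumps on a deposits-like scale. The thresholds, leaf values and the direction of the effect are invented and say nothing about the fitted model in the deck.

```python
def stump(threshold, left_value, right_value):
    """One-split tree: constant on each side of the threshold."""
    return lambda x: left_value if x <= threshold else right_value


# Hypothetical stumps; two of them share a split point.
stumps = [
    stump(1e8, -0.4, 0.2),
    stump(1e8, -0.1, 0.1),
    stump(3e8, 0.0, 0.3),
]


def ensemble(x):
    """Sum of the stump outputs: still constant between split points."""
    return sum(tree(x) for tree in stumps)


grid = [i * 0.5e8 for i in range(13)]            # 0 to 6e8 in steps of 5e7
response = [round(ensemble(x), 10) for x in grid]
levels = sorted(set(response))                   # distinct plateau heights
print(levels)
```

However fine the grid, the response takes only as many distinct values as there are intervals between split points, which is why the plotted effect looks like a staircase rather than a smooth curve.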


ROAE


      [Figure: Histogram. Horizontal axis from -20 to 40, vertical axis from 0 to 40]




Cost to Income Ratio: Impact on Risk Score


         [Figure: Histogram. Horizontal axis from 20 to 100, vertical axis from 0 to 20]




         High cost to income increases risk score

Equity to Total Assets


    [Figure: Histogram. Horizontal axis from 0 to 15, vertical axis from 0 to 20]




    Greater risk forecasted when equity is a large share of total assets


References

     Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
     Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting
      algorithm. In L. Saitta, ed., Machine Learning: Proceedings of the Thirteenth
      National Conference, Morgan Kaufmann, pp. 148-156.
     Friedman, J.H. (1999). Stochastic gradient boosting. Stanford: Statistics
      Department, Stanford University.
     Friedman, J.H. (1999). Greedy function approximation: a gradient boosting
      machine. Stanford: Statistics Department, Stanford University.
     Hastie, T., Tibshirani, R. & Friedman, J.H. (2009). The Elements of
      Statistical Learning, 2nd Edition. Springer.




TreeNet Overview - Updated October 2012

  • 1. Overview of TreeNet® Technology Stochastic Gradient Boosting Dan Steinberg 2012 http://www.salford-systems.com
  • 2. Introduction to TreeNet: Stochastic Gradient Boosting  Powerful approach to machine learning and function approximation developed by Jerome H. Friedman at Stanford University  Friedman is a coauthor of CART® with Breiman, Olshen and Stone  Author of MARS®, PRIM, Projection Pursuit, COSA, RuleFit and more  TreeNet is unusual among learning machines in being very strong for both classification and regression problems  Stochastic gradient boosting is now widely regarded as the most powerful methodology available for most predictive modeling contexts  Builds on the notions of committees of experts and boosting but is substantially different in key implementation details Salford Systems © Copyright 2005-2012 2
  • 3. Aspects of TreeNet  Built on CART trees and thus  immune to outliers  handles missing values automatically  results invariant under order preserving transformations of variables  No need to ever consider functional form revision (log, sqrt, power)  Highly discriminatory variable selector  Effective with thousands of predictors  Detects and identifies important interactions  Can be used to easily test for presence of interactions and their degree  Resistant to overtraining – generalizes well  Can be remarkably accurate with little effort  Should easily outperform conventional models Salford Systems © Copyright 2005-2012 3
  • 4. Adapting to Major Errors in Data  TreeNet is a machine learning technology designed to recognize patterns in historical data  Ideally the data TreeNet will use to learn from will be accurate  In some circumstances there is a risk that the most important variable, namely the dependent variable is subject to error. Known as “mislabelled data”  Good examples of mislabeled data can be found in  Medical diagnoses  Insurance claim fraud  Thus historical data for “not frauds” actually includes undetected fraud  Some of the “0”s are actually “1”s which complicates learning (possibly fatal)  TreeNet manages such data successfully Salford Systems © Copyright 2005-2012 4
  • 5. Some TreeNet Successes
      - 2010 DMA Direct Marketing Association: 1st place*
      - 2009 KDDCup: IDAnalytics* and FEG Japan*, 1st Runner Up
      - 2008 DMA Direct Marketing Association: 1st Runner Up
      - 2007 PAKDD Pacific Asia KDD, Credit Card Cross Sell: 1st place
      - 2007 DMA Direct Marketing Association: 1st place
      - 2006 PAKDD Pacific Asia KDD, Telco Customer Model: 2nd Runner Up
      - 2005 BI-Cup Latin America, Predictive E-commerce*: 1st place
      - 2004 KDDCup, Predictive Modeling: “Most Accurate”*
      - 2002 NCR/Teradata Duke University, Predictive Modeling-Churn
          - Four separate predictive modeling challenges: 1st place
    * Won by Salford client or partner using TreeNet
  • 6. Multi-tree methods and their single tree ancestors
      - Multi-tree methods have been under development since the early 1990s. Most important variants (and dates of published articles) are:
          - Bagger (Breiman, 1996, “Bootstrap Aggregation”)
          - Boosting (Freund and Schapire, 1995)
          - Multiple Additive Regression Trees (Friedman, 1999, aka MART™ or TreeNet®)
          - RandomForests® (Breiman, 2001)
      - TreeNet version 1.0 was released in 2002 but its power took some time to be recognized
      - Work on TreeNet continues with major refinements underway (Friedman in collaboration with Salford Systems)
  • 7. Multi-tree Methods: Simplest Case
      - Simplest example:
          - Grow a tree on training data
          - Find a way to grow another, different tree (change something in the setup)
          - Repeat many times, e.g. 500 replications
          - Average results or create a voting scheme, e.g. relate Probability of Event to the fraction of trees predicting the event for a given record
      - Beauty of the method is that every new tree starts with a complete set of data
      - Any one tree can run out of data, but when that happens we just start again with a new tree and all the data (before sampling)
    [Figure: Prediction Via Voting]
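The simplest case above can be sketched in a few lines. This is only an illustration under stated assumptions, not TreeNet or any Salford implementation: the "tree" is a one-split stump on a single predictor, the "change something" step is a bootstrap resample, and the names `fit_stump` and `bagged_probability` are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label) with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x < t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1:]

def bagged_probability(xs, ys, x_new, n_trees=500, seed=0):
    """Fraction of stumps, each grown on a bootstrap resample, that vote 'event'."""
    rng = random.Random(seed)
    n = len(xs)
    votes = 0
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # resample records with replacement
        t, left, right = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes += left if x_new < t else right
    return votes / n_trees

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(bagged_probability(xs, ys, 7.5), bagged_probability(xs, ys, 1.5))
```

Every stump starts again from the full training set (before resampling), which is the point the slide makes about never running out of data.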
  • 8. Automated Multiple Tree Generation
      - Earliest multi-model methods recommended taking several good candidates and averaging them. Examples considered as few as three trees.
      - Too difficult to generate multiple models manually. Hard enough to get one good model.
      - How do we generate different trees?
          - Bagger: random reweighting of the data via bootstrap resampling
              - Bootstrap resampling allows for zero and positive integer weights
          - RandomForests: random splits applied to randomly reweighted data
              - Implies that the tree itself is grown at least partly at random
          - Boosting: reweighting data based on prior success in correctly classifying a case. High weights on difficult-to-classify cases. (Systematic reweighting)
          - TreeNet: boosting with major refinements. Each tree attempts to correct errors made by predecessors
              - Each tree is linked to predecessors. Like a series expansion where the addition of terms progressively improves the predictions
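The remark that bootstrap resampling amounts to zero and positive integer weights can be made concrete with a small sketch. The function name `bootstrap_weights` is hypothetical and the code is only an illustration of the idea.

```python
import random
from collections import Counter

def bootstrap_weights(n, rng):
    """Draw n records with replacement and report how often each record was drawn."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts.get(i, 0) for i in range(n)]  # weight 0 means the record was left out

weights = bootstrap_weights(10, random.Random(42))
print(weights, sum(weights))  # the weights always sum to n
```

Growing a tree on the resample is equivalent to growing it on the original records with these integer weights attached.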
  • 9. TreeNet
      - We focus on TreeNet because
          - It is the method used in many successful real world studies
          - Stochastic gradient boosting is arguably the most popular methodology among experienced data miners
          - We have found it to be consistently more accurate than the other methods
          - Major credit card fraud detection engine uses TreeNet
          - Widespread use in on-line e-commerce
      - Dramatic capabilities include:
          - Display impact of any predictor (dependency plots)
          - Test for existence of interactions
          - Identify and rank interactions
          - Constrain model: allow some interactions and disallow others
          - Tools to recast TreeNet model as a logistic regression or GLM
  • 10. TreeNet Process
      - Begin with one very small tree as initial model
          - Could be as small as ONE split generating 2 terminal nodes
          - Default model will have 4-6 terminal nodes
          - Final output is a predicted target (regression) or predicted probability
          - First tree is an intentionally “weak” model
      - Compute “residuals” for this simple model (prediction error) for every record in data (even for classification model)
      - Grow second small tree to predict the residuals from first tree
      - New model is now: Tree1 + Tree2
      - Compute residuals from this new 2-tree model and grow 3rd tree to predict revised residuals
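The residual-fitting loop described above can be sketched for a least-squares regression target. This is a minimal illustration, not the TreeNet algorithm itself: it assumes a single predictor, two-node trees, a constant (the target mean) as the starting model, and no shrinkage or subsampling. The names `fit_stump`, `boost`, and `predict` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single split under least squares: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(xs, ys, n_trees=50):
    """Grow each new stump on the residuals of the model built so far."""
    intercept = sum(ys) / len(ys)  # assumption: start from the target mean
    preds = [intercept] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        trees.append((t, lm, rm))
        preds = [p + (lm if x < t else rm) for x, p in zip(xs, preds)]
    return intercept, trees

def predict(model, x):
    """Score = intercept plus the relevant terminal-node value from every tree."""
    intercept, trees = model
    return intercept + sum(lm if x < t else rm for t, lm, rm in trees)

xs = list(range(10))
ys = [x * x for x in xs]
model = boost(xs, ys)
print(round(predict(model, 9), 2))
```

Each added stump can only lower (or leave unchanged) the training squared error, which is why the fit improves as trees accumulate.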
  • 11. TreeNet: Trees incrementally revise predicted scores
    [Figure: Tree 1 + Tree 2 + Tree 3]
      - Tree 1: first tree grown on original target. Intentionally “weak” model
      - Tree 2: 2nd tree grown on residuals from first. Predictions made to improve first tree
      - Tree 3: 3rd tree grown on residuals from model consisting of first two trees
      - Every tree produces at least one positive and at least one negative node. Red reflects a relatively large positive node and deep blue reflects a relatively large negative node.
      - Total “score” for a given record is obtained by finding the relevant terminal node in every tree in the model and summing across all trees
  • 12. TreeNet: Example Individual Trees
    [Figure: three example trees (Tree 1, Tree 2, Tree 3) with YES/NO branches. Intercept 0.172. Split conditions shown: COST2INC < 65.4, EQ2CUST_STF < 5.04, TOTAL_DEPS < 65.5M, EQ2TOT_AST < 2.8. Terminal-node values shown range from -0.228 to +0.140. Predicted Response = Intercept + Tree 1 + Tree 2 + Tree 3 + Other Trees]
  • 13. TreeNet Methodology: Key points
      - Trees are kept small (each tree learns only a little)
      - Each tree is weighted by a small weight prior to being used to compute the current prediction or score produced by the model.
          - Like a partial adjustment model. Update factors can be as small as .01, .001, .0001. This means that the model prediction changes by very small amounts after a new tree is added to the model (training cycle)
      - Use random subsets of the training data for each new tree. Never train on all the training data in any one learning cycle
      - Highly problematic cases are optionally IGNORED. If model prediction starts to diverge substantially from observed data for some records, those records may not be used in further model updates
      - Model can be tuned to optimize any of several criteria:
          - Least Squares, Least Absolute Deviation, Huber-M Hybrid LS/LAD
          - Area under the ROC curve
          - Logistic likelihood (deviance)
          - Classification Accuracy
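For the regression criteria listed above, the "residual" each new tree is grown on differs by loss. The sketch below shows the standard negative gradients for least squares, least absolute deviation, and the Huber hybrid. The cutoff `delta` and all function names are assumptions for illustration; the slides do not say how TreeNet sets the cutoff.

```python
def ls_residual(y, f):
    """Least squares: the ordinary residual."""
    return y - f

def lad_residual(y, f):
    """Least absolute deviation: only the sign of the residual."""
    r = y - f
    return (r > 0) - (r < 0)

def huber_residual(y, f, delta):
    """Huber hybrid: LS inside the cutoff, LAD-like (clipped) outside it."""
    r = y - f
    if abs(r) <= delta:
        return r
    return delta if r > 0 else -delta

for y in (0.5, 10.0, -10.0):
    print(y, ls_residual(y, 0.0), lad_residual(y, 0.0), huber_residual(y, 0.0, 1.0))
```

Clipping means a record far from the current prediction cannot dominate the next tree, which fits the slide's emphasis on limiting the influence of problematic cases.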
  • 14. Why does TreeNet work?
      - Slow learning: the method “peels the onion”, extracting very small amounts of information in any one learning cycle
      - TreeNet can leverage hints available in the data across a number of predictors
      - TreeNet can successfully include more variables than traditional models
      - Can capture substantial nonlinearity and complex interactions of high degree
      - TreeNet self-protects against errors in the dependent variable
          - If a record is actually a “1” but is misrecorded in the data as a “0”, and TreeNet recognizes it as a “1”, TreeNet may decide to ignore this record
          - Important for medical studies in which the diagnosis can be incorrect
          - “Conversion” in online behavior can be misrecorded because it is not recognized (e.g. conversion occurs via another channel or is delayed)
  • 15. Multiple Additive Regression Trees
      - Friedman originally named his methodology MART™ as the method generates small trees which are summed to obtain an overall score
      - The model can be thought of as a series expansion approximating the true functional relationship
      - We can think of each small tree as a mini-scorecard, making use of possibly different combinations of variables. Each mini-scorecard is designed to offer a slight improvement by correcting and refining its predecessors
      - Because each tree starts at the “root node” and can use all of the available data, a TreeNet model can never run out of data no matter how many trees are built.
      - The number of trees required to obtain best possible performance can vary considerably from data set to data set
      - Some models “converge” after just a handful of trees and others may require many thousands
  • 16. Selecting Optimal Model
      - TreeNet first grows a large number of trees
      - We evaluate performance of the ensemble as each tree is added:
          - Start with 1 tree. Then go on to 2 trees (1st + 2nd).
          - Then 3 trees (1st + 2nd + 3rd), etc.
      - Criteria used right now are
          - Least squares, Least Absolute Deviation, or Huber-M for regression
          - Classification Accuracy
          - Log-Likelihood
          - ROC (area under curve)
          - Lift in top P percentile (often top decile)
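Evaluating the ensemble as each tree is added amounts to a running sum of tree contributions on held-out records. A minimal sketch follows, assuming mean absolute error as the criterion and a hypothetical input layout in which `tree_contribs[k][i]` is the value tree k adds to the score of test record i.

```python
def best_ensemble_size(intercept, tree_contribs, y_test):
    """Return (number of trees, test MAE) for the best-scoring ensemble prefix."""
    scores = [intercept] * len(y_test)
    best_size, best_mae = 0, None
    for size, contrib in enumerate(tree_contribs, start=1):
        scores = [s + c for s, c in zip(scores, contrib)]  # model with `size` trees
        mae = sum(abs(y - s) for y, s in zip(y_test, scores)) / len(y_test)
        if best_mae is None or mae < best_mae:
            best_size, best_mae = size, mae
    return best_size, best_mae

# Three trees, two test records: the third tree overshoots the target.
print(best_ensemble_size(0.0, [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], [1.0, 1.0]))
```

Swapping in a different criterion (log-likelihood, ROC area, lift) can select a different ensemble size from the same sequence of trees.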
  • 17. TreeNet Summary Screen
      - Displaying ROC (Y-axis) on train and test samples at all ensemble sizes
      - Each criterion is optimized using a different number of trees (size of ensemble)

      Criterion                  CXE         Class Error   ROC Area    Lift
      -----------------------------------------------------------------------------
      Optimal Number of Trees    739         1069          643         163
      Optimal Criterion          0.4224394   0.1803279     0.8862029   1.9626555
  • 18. How the different criteria select different response profiles
    [Figure: models fit to artificial data; panel labels: Logistic, CXE, Classification Accuracy]
      - Artificial data: red curve is truth
      - Orange is logistic regression
      - Green is TreeNet optimized for Classification Accuracy
      - Blue is TreeNet optimized for max log likelihood (CXE)
      - TreeNet (CXE) tracks data far more faithfully than LOGIT
  • 19. Interpreting TN Models
      - As TN models consist of hundreds or even thousands of trees there is no way to represent the model via a display of one or two trees
      - However, the model can be summarized in a variety of ways
          - Partial dependency plots: these exhibit the relationship between the target and any predictor, as captured by the model.
          - Variable Importance Rankings: these stable rankings give an excellent assessment of the relative importance of predictors
          - For regression models SPM 7.0 displays box plots for residuals and residual outlier diagnostics
      - For binary classification TN also displays:
          - ROC curves: TN models produce scores that are typically unique for each scored record, allowing records to be ranked from best to worst. The ROC curve and area under the curve reveal how successful the ranking is.
          - Confusion Matrix: using an adjustable score threshold, this matrix displays the model false positive and false negative rates.
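Both binary-classification summaries can be computed directly from scores and 0/1 labels. The sketch below is a generic illustration, not TreeNet output: `roc_auc` uses the pairwise-ranking definition of area under the ROC curve, and `confusion_matrix` assumes a record is predicted positive when its score is at or above the threshold.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_matrix(labels, scores, threshold):
    """Counts of true/false positives and negatives at the given score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))                 # 0.75
print(confusion_matrix(labels, scores, 0.5))
```

Lowering the threshold trades false negatives for false positives, which is how the threshold adjustment for unbalanced classes works.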
  • 20. TN Summary: Variable Importance Ranking
      - Based on actual use of variables in the trees on training data
      - TreeNet creates a sum of split improvement scores for every variable in every tree
  • 21. TN Classification Accuracy: Test Data
      - Threshold can be adjusted to reflect unbalanced classes, rare events
  • 22. TreeNet: Partial Dependency Plot
      - Y-axis: One-half-log-odds
  • 23. Partial Dependency Plot Explained
      - Partial dependency plots are extracted from the TreeNet model via simulation
      - To trace out the impact of the predictor X on the target Y we start with a single record from the training data and generate a set of predictions for the target Y for each of a large number of different values of X
      - We repeat this process for every record in the training data and then average the results to produce the displayed partial dependency plot
      - In this way the plot reflects the effects of all the other relevant predictors
      - Important to remember that the plot is based on predictions generated by the model
      - Smooths of plots might be used to create transforms of certain variables for use in GLMs or logistic regressions
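The simulation described above can be written as a short loop. This is a sketch under assumptions: records are plain dictionaries, `predict` is any function that scores one record, and the name `partial_dependence` is hypothetical rather than a TreeNet API.

```python
def partial_dependence(predict, records, feature, grid):
    """For each grid value, overwrite `feature` in every record and average the predictions."""
    curve = []
    for value in grid:
        total = 0.0
        for record in records:
            modified = dict(record)    # leave the training record untouched
            modified[feature] = value  # all other predictors keep their observed values
            total += predict(modified)
        curve.append(total / len(records))
    return curve

# Toy model standing in for a fitted ensemble.
model = lambda r: 2 * r["x"] + r["z"]
records = [{"x": 0, "z": 1}, {"x": 5, "z": 3}]
print(partial_dependence(model, records, "x", [0, 1, 2]))  # [2.0, 4.0, 6.0]
```

Because the other predictors keep their observed values, the curve averages over their joint distribution in the training data.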
  • 24. Place Knot Locations on Graph for Smooth
      - Smooth can be created on screen within TreeNet
  • 25. Smooth Depicted in Green
  • 26. Generate Programming Code for Smooth
      - Can obtain code in: SAS, Java, C, PMML
      - Other languages coming: SQL, Visual Basic
  • 27. Dealing with Monotonicity
      - Undesired response profile (here we prefer a monotone increasing response)
  • 28. Impose Constraint
      - Constrained solution generated to yield a monotone increasing smooth
      - In this case the constraint is mild in that the differences on the Y-axis are quite small
  • 29. Interaction Detection
      - TreeNet models based on 2-node trees automatically EXCLUDE interactions
          - Model may be highly nonlinear but is by definition strictly additive
          - Every term in the model is based on a single variable (single split)
      - Use this as a baseline model for best possible additive model (automated GAM)
      - Build TreeNet on larger tree (default is 6 nodes)
          - Permits up to 5-way interaction but in practice is more like 3-way interaction
      - Can conduct informal likelihood ratio test TN(2-node) vs TN(6-node)
      - Large differences signal important interactions
      - In TreeNet 2.0 can locate interactions via 3-D 2-variable dependency plots
      - In SPM 6.6 and later, variables participating in interactions are ranked using new methodology developed by Friedman and Salford Systems
  • 30. Interactions Ranking Report
      - Interaction strength measure is first column; 2nd column shows normalized scores
      - Variables are listed in descending order of overall interaction strength
      - Computation of interaction strength is based on two ways of creating a 3D plot of target vs Xi and Xj
      - One method ignores possible interactions and the other explicitly accounts for interaction. If the two are very different we have evidence from the model for the existence of an interaction
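The idea of comparing a surface that ignores interaction with one that accounts for it can be illustrated with partial dependence. The sketch below is not the Friedman and Salford Systems measure, whose details the slides do not give. It is a hypothetical stand-in that compares the joint two-variable partial dependence with the sum of the two single-variable curves, and all names are illustrative.

```python
def pd_single(predict, records, f, v):
    """Average prediction with feature f forced to value v."""
    return sum(predict({**r, f: v}) for r in records) / len(records)

def pd_joint(predict, records, fi, fj, vi, vj):
    """Average prediction with both features forced to the given values."""
    return sum(predict({**r, fi: vi, fj: vj}) for r in records) / len(records)

def interaction_gap(predict, records, fi, fj, grid_i, grid_j):
    """Largest difference between the joint surface and its additive approximation."""
    mean_pred = sum(predict(r) for r in records) / len(records)
    gap = 0.0
    for vi in grid_i:
        for vj in grid_j:
            additive = (pd_single(predict, records, fi, vi)
                        + pd_single(predict, records, fj, vj) - mean_pred)
            joint = pd_joint(predict, records, fi, fj, vi, vj)
            gap = max(gap, abs(joint - additive))
    return gap

records = [{"a": 0, "b": 1}, {"a": 2, "b": 5}]
print(interaction_gap(lambda r: 2 * r["a"] + 3 * r["b"], records, "a", "b", [0, 1], [0, 1]))
print(interaction_gap(lambda r: r["a"] * r["b"], records, "a", "b", [0, 1], [0, 1]))
```

For a strictly additive model the two surfaces coincide and the gap is zero; a product term makes them diverge.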
  • 31. What interacts with what?
      - Interaction strength scores for 2-way interactions for most important variables
    [Figure: interaction scores for NMOS; largest score is with PAY_FREQUENCY_2$]
      - Here we see that NMOS primarily interacts with PAY_FREQUENCY_2$
      - All other interactions with NMOS have much smaller impacts
  • 32. Example of an important interaction
      - Slope reverses due to interaction
      - In this real world example the downward sloping portion of the relationship was well known, but the segment displaying an upward sloping relationship was not known
      - Note that the dominant pattern is downward sloping
      - But a key segment defined by the 3rd variable is upward sloping
  • 33. Real World Examples
      - Corporate default (Probability of Default or PD)
      - Analysis of bank ratings
  • 34. Corporate Default Scorecard
      - Mid-sized bank
      - Around 200 defaults
      - Around 200 indeterminate/slow
      - Classic small sample problem
      - Standard financial statements available, variety of ratios created
      - All key variables had some fraction of missing data, from 2% to 6% in a set of 10 core predictors, and up to 35% missing in a set of 20 core predictors
      - Single CART tree involves just 6 predictors
          - yields cross-validated ROC = 0.6523
  • 35. TreeNet Analysis of Corporate Default Data: Summary Display
      - Summary reports progress of model as it evolves with an increasing number of trees. Markers indicate models optimizing entropy, misclassification rate, ROC
  • 36. Corporate Default Model: TreeNet
      - 1789 trees in model with best test sample area under ROC curve

                           CXE        Class Error   ROC Area    Lift
      --------------------------------------------------------------------
      Optimal N Trees:     2000       1930          1789        1911
      Optimal Criterion    0.534914   0.3485293     0.7164921   1.9231729

      - TreeNet cross-validated ROC = 0.71649, far better than single tree
      - Able to make use of more variables due to the many trees
          - Single CART tree uses 6 predictors, TreeNet uses more than 20
          - Some of these variables were missing for many accounts
      - Can extract graphical displays to describe model
  • 37. Predictive Variable – Financial Ratio 1
    [Figure: dependency plot for Ratio 1 (Sales / Current Assets). X-axis: Ratio Value; Y-axis: Risk or Prob of Default]
      - TreeNet can trace out impact of predictor on PD
  • 38. Predictive Variable – Financial Ratio 2
    [Figure: dependency plot for Ratio 2, Return on Assets (ROA) (Operating Profit / Total Asset). X-axis: Ratio Value; Y-axis: Risk or Prob of Default]
  • 39. Additive vs Interaction Model (2-node vs 6-node trees)

      2-node trees model   CXE        Class Error   ROC Area    Lift
      --------------------------------------------------------------------
      Optimal N Trees:     2000       1812          1978        1763
      Optimal Criterion    0.54538    0.39331       0.70749     1.84881

      6-node trees model   CXE        Class Error   ROC Area    Lift
      --------------------------------------------------------------------
      Optimal N Trees:     2000       1930          1789        1911
      Optimal Criterion    0.53491    0.34852       0.71649     1.92317

      - Two-node trees do not permit interactions of any kind as each term in the model involves a single predictor in isolation. Two-node tree models can be highly nonlinear but are strictly additive
      - Trees with three or more nodes allow progressively higher order interactions.
      - Running the model with different tree sizes allows simple discovery of the precise amount of interaction required for maximum performance models
      - In this example no strong evidence for low-order interactions
  • 40. Bank Ratings: Regression Example
      - Build a predictive model for average of major bank ratings
          - Scaled average of S&P, Moody's, Fitch Ratings
      - Challenges include
          - Small database (66 banks)
          - Missing values prevalent (up to 35 missing in any predictor)
          - Impossible to build linear regression model because of missings
          - Expect relationships to be nonlinear
      - 66 banks
      - 25 potential predictors
      - Study run in 2003!
  • 41. Cross-Validated Performance: Predicting Rating Score
    [Figure: CV Mean Absolute Error (Y-axis, 0.6 to 1.4) vs Model Size in trees (X-axis, 1 to 945)]
      - The optimal model was achieved at around 860 trees with cross-validated mean absolute error 0.87; the target variable ranges from 1 to 10
  • 42. Variable Importance Ranking
      - Ranks variables according to their contributions to the model
      - Influenced by the number of times a variable is used and its strength as a splitter when it is used
  • 43. Country Contribution to Risk Score
    [Figure: Distribution by Country. Countries shown: US, SE, PT, NL, LU, JP, IT, IE, GR, GB, FR, ES, DK, DE, CH, CA, BE, AT; axis from 0 to 8]
      - Swiss banks (CH) tend to be rated low risk, and Greek, Portuguese, Italian, Irish and Japanese banks high risk
  • 44. Bank Specialization
    [Figure: Distribution by specialization. Categories shown: Specialised Governmental Credit Inst., Savings Bank, Real Estate / Mortgage Bank, Non-banking Credit Institution, Investment Bank/Securities House, Cooperative Bank, Commercial Bank; axis from 0 to 50]
      - Commercial banks and investment banks are rated higher risk
  • 45. Scale: Total Deposits
    [Figure: histogram of total deposits, X-axis from 0e+00 to 6e+08]
      - A step function in size of bank
  • 46. ROAE
    [Figure: histogram of ROAE, X-axis from -20 to 40]
  • 47. Cost to Income Ratio: Impact on Risk Score
    [Figure: histogram of cost to income ratio, X-axis from 20 to 100]
      - High cost to income increases risk score
  • 48. Equity to Total Assets
    [Figure: histogram of equity to total assets, X-axis from 0 to 15]
      - Greater risk forecasted when equity is a large share of total assets
  • 49. References
      - Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
      - Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In L. Saitta, ed., Machine Learning: Proceedings of the Thirteenth National Conference, Morgan Kaufmann, pp. 148-156.
      - Friedman, J.H. (1999). Stochastic gradient boosting. Stanford: Statistics Department, Stanford University.
      - Friedman, J.H. (1999). Greedy function approximation: a gradient boosting machine. Stanford: Statistics Department, Stanford University.
      - Hastie, T., Tibshirani, R., and Friedman, J.H. (2009). The Elements of Statistical Learning. 2nd Edition. Springer.