TreeNet Overview - Updated October 2012
Salford Systems
1. Overview of TreeNet® Technology
Stochastic Gradient Boosting
Dan Steinberg, 2012
http://www.salford-systems.com
2. Introduction to TreeNet: Stochastic Gradient Boosting
- A powerful approach to machine learning and function approximation, developed by Jerome H. Friedman at Stanford University
- Friedman is a coauthor of CART® with Breiman, Olshen, and Stone, and the author of MARS®, PRIM, Projection Pursuit, COSA, RuleFit, and more
- TreeNet is unusual among learning machines in being very strong for both classification and regression problems
- Stochastic gradient boosting is now widely regarded as the most powerful methodology available for most predictive modeling contexts
- It builds on the notions of committees of experts and boosting, but is substantially different in key implementation details
Salford Systems © Copyright 2005-2012
3. Aspects of TreeNet
- Built on CART trees, and thus immune to outliers; handles missing values automatically; results are invariant under order-preserving transformations of variables
- No need to ever consider functional form revision (log, sqrt, power)
- Highly discriminatory variable selector; effective with thousands of predictors
- Detects and identifies important interactions; can be used to easily test for the presence of interactions and their degree
- Resistant to overtraining – generalizes well
- Can be remarkably accurate with little effort and should easily outperform conventional models
4. Adapting to Major Errors in Data
- TreeNet is a machine learning technology designed to recognize patterns in historical data
- Ideally the data TreeNet learns from will be accurate, but in some circumstances the most important variable – the dependent variable – is itself subject to error. This is known as "mislabeled data"
- Good examples of mislabeled data are found in medical diagnoses and insurance claim fraud: historical data for "not frauds" actually includes undetected fraud, so some of the "0"s are actually "1"s, which complicates learning (possibly fatally)
- TreeNet manages such data successfully
5. Some TreeNet Successes
- 2010 DMA (Direct Marketing Association): 1st place*
- 2009 KDDCup: IDAnalytics* and FEG Japan*, 1st Runner Up
- 2008 DMA (Direct Marketing Association): 1st Runner Up
- 2007 PAKDD (Pacific Asia KDD), Credit Card Cross Sell: 1st place
- 2007 DMA (Direct Marketing Association): 1st place
- 2006 PAKDD (Pacific Asia KDD), Telco Customer Model: 2nd Runner Up
- 2005 BI-Cup Latin America, Predictive E-commerce*: 1st place
- 2004 KDDCup, Predictive Modeling: "Most Accurate"*
- 2002 NCR/Teradata Duke University, Predictive Modeling (Churn), four separate predictive modeling challenges: 1st place
* Won by a Salford client or partner using TreeNet
6. Multi-tree Methods and Their Single-tree Ancestors
- Multi-tree methods have been under development since the early 1990s. The most important variants (and dates of published articles) are:
  - Bagger (Breiman, 1996, "Bootstrap Aggregation")
  - Boosting (Freund and Schapire, 1995)
  - Multiple Additive Regression Trees (Friedman, 1999, aka MART™ or TreeNet®)
  - RandomForests® (Breiman, 2001)
- TreeNet version 1.0 was released in 2002, but its power took some time to be recognized
- Work on TreeNet continues, with major refinements underway (Friedman in collaboration with Salford Systems)
7. Multi-tree Methods: Simplest Case
- The simplest example:
  - Grow a tree on training data
  - Find a way to grow another, different tree (change something in the setup)
  - Repeat many times, e.g. 500 replications
  - Average the results or create a voting scheme, e.g. relate the probability of an event to the fraction of trees predicting the event for a given record
- The beauty of the method is that every new tree starts with a complete set of data. Any one tree can run out of data, but when that happens we just start again with a new tree and all the data (before sampling)
[Diagram: prediction via voting across trees]
8. Automated Multiple Tree Generation
- The earliest multi-model methods recommended taking several good candidates and averaging them. Examples considered as few as three trees; it was too difficult to generate multiple models manually – hard enough to get one good model
- How do we generate different trees?
  - Bagger: random reweighting of the data via bootstrap resampling. Bootstrap resampling allows for zero and positive-integer weights
  - RandomForests: random splits applied to randomly reweighted data, implying that the tree itself is grown at least partly at random
  - Boosting: reweighting data based on prior success in correctly classifying a case, with high weights on difficult-to-classify cases (systematic reweighting)
  - TreeNet: boosting with major refinements. Each tree attempts to correct errors made by its predecessors and is linked to them, like a series expansion where the addition of terms progressively improves the predictions
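The bagger's "random reweighting via bootstrap resampling" mentioned above can be made concrete in a few lines. This is an illustrative sketch with arbitrary numbers, not any vendor's implementation: drawing N records with replacement is equivalent to giving each record a random nonnegative integer weight, where a weight of zero drops a record and a weight of 2 or more repeats it.

```python
# Bootstrap resampling expressed as per-record integer weights.
import numpy as np

rng = np.random.default_rng(3)
n = 10                                     # hypothetical number of training records
draw = rng.integers(0, n, size=n)          # bootstrap sample: n indices drawn with replacement
weights = np.bincount(draw, minlength=n)   # how often each record was drawn

# The weights are zero or positive integers and always sum to n,
# so each bootstrap replicate reweights (not shrinks) the training set.
print(weights, weights.sum())
```

Growing one tree per bootstrap replicate and averaging their predictions is exactly the bagger described above.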
9. TreeNet
- We focus on TreeNet because:
  - It is the method used in many successful real-world studies
  - Stochastic gradient boosting is arguably the most popular methodology among experienced data miners
  - We have found it to be consistently more accurate than the other methods
  - A major credit card fraud detection engine uses TreeNet, and it sees widespread use in online e-commerce
- Dramatic capabilities include:
  - Displaying the impact of any predictor (dependency plots)
  - Testing for the existence of interactions; identifying and ranking interactions
  - Constraining the model: allowing some interactions and disallowing others
  - Tools to recast a TreeNet model as a logistic regression or GLM
10. TreeNet Process
- Begin with one very small tree as the initial model
  - Could be as small as ONE split generating 2 terminal nodes; the default model will have 4-6 terminal nodes
  - The final output is a predicted target (regression) or predicted probability
  - The first tree is an intentionally "weak" model
- Compute "residuals" (prediction error) for this simple model for every record in the data (even for a classification model)
- Grow a second small tree to predict the residuals from the first tree. The new model is now: Tree1 + Tree2
- Compute residuals from this new 2-tree model and grow a 3rd tree to predict the revised residuals
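The process above can be sketched in a short loop. This is a minimal illustrative stand-in, not Salford's implementation: the data are synthetic, and each "small tree" is a one-split regression stump fit to the current residuals.

```python
# Gradient boosting in miniature: a sum of small trees, each fit to the
# residuals left by its predecessors.
import numpy as np

def fit_stump(x, y):
    """Find the single split on x minimizing squared error.
    Returns (threshold, left_mean, right_mean)."""
    best = (np.median(x), y.mean(), y.mean(), np.inf)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[3]:
            best = (t, left.mean(), right.mean(), sse)
    return best[:3]

def predict_stump(stump, x):
    t, lo, hi = stump
    return np.where(x <= t, lo, hi)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)      # noisy synthetic target

pred = np.zeros_like(y)                      # initial model: predict 0 everywhere
trees = []
for _ in range(200):
    residuals = y - pred                     # errors made by the model so far
    stump = fit_stump(x, residuals)          # next "weak" tree predicts the residuals
    trees.append(stump)
    pred += 0.1 * predict_stump(stump, x)    # small update weight, as in TreeNet

print(np.mean((y - pred) ** 2))              # training MSE shrinks as trees are added
```

The final score for a record is the sum of the (weighted) terminal-node values across all trees, matching the Tree1 + Tree2 + ... description above.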
11. TreeNet: Trees Incrementally Revise Predicted Scores
[Diagram: Tree 1 + Tree 2 + Tree 3 – the first tree is grown on the original target (an intentionally "weak" model), the 2nd tree on the residuals from the first (predictions made to improve the first tree), and the 3rd tree on the residuals from the model consisting of the first two trees]
- Every tree produces at least one positive and at least one negative node (in the diagram, red reflects a relatively large positive node and deep blue a relatively negative node)
- The total "score" for a given record is obtained by finding the relevant terminal node in every tree in the model and summing across all trees
12. TreeNet: Example Individual Trees
[Diagram: an intercept of 0.172 followed by individual small trees splitting on COST2INC < 65.4, EQ2CUST_STF < 5.04, TOTAL_DEPS < 65.5M, and EQ2TOT_AST < 2.8. Each terminal node contributes a small positive or negative adjustment (e.g. -0.228, +0.068, -0.213, +0.140) that, summed with the other trees, gives the predicted response]
13. TreeNet Methodology: Key Points
- Trees are kept small (each tree learns only a little)
- Each tree is weighted by a small weight prior to being used to compute the current prediction or score produced by the model – like a partial-adjustment model. Update factors can be as small as .01, .001, or .0001, which means the model prediction changes by very small amounts after a new tree is added to the model (training cycle)
- Random subsets of the training data are used for each new tree; never train on all the training data in any one learning cycle
- Highly problematic cases are optionally IGNORED: if the model prediction starts to diverge substantially from the observed data for some records, those records may not be used in further model updates
- The model can be tuned to optimize any of several criteria:
  - Least Squares, Least Absolute Deviation, Huber-M hybrid LS/LAD
  - Area under the ROC curve
  - Logistic likelihood (deviance)
  - Classification accuracy
14. Why Does TreeNet Work?
- Slow learning: the method "peels the onion", extracting very small amounts of information in any one learning cycle
- TreeNet can leverage hints available in the data across a number of predictors and can successfully include more variables than traditional models
- It can capture substantial nonlinearity and complex interactions of high degree
- TreeNet self-protects against errors in the dependent variable: if a record is actually a "1" but is misrecorded in the data as a "0", and TreeNet recognizes it as a "1", TreeNet may decide to ignore this record
  - Important for medical studies in which the diagnosis can be incorrect
  - "Conversion" in online behavior can be misrecorded because it is not recognized (e.g. conversion occurs via another channel or is delayed)
15. Multiple Additive Regression Trees
- Friedman originally named his methodology MART™, as the method generates small trees that are summed to obtain an overall score
- The model can be thought of as a series expansion approximating the true functional relationship
- Each small tree can be seen as a mini-scorecard, making use of possibly different combinations of variables. Each mini-scorecard is designed to offer a slight improvement by correcting and refining its predecessors
- Because each tree starts at the "root node" and can use all of the available data, a TreeNet model can never run out of data, no matter how many trees are built
- The number of trees required to obtain the best possible performance can vary considerably from data set to data set: some models "converge" after just a handful of trees, and others may require many thousands
16. Selecting the Optimal Model
- TreeNet first grows a large number of trees, and we evaluate the performance of the ensemble as each tree is added: start with 1 tree, then go on to 2 trees (1st + 2nd), then 3 trees (1st + 2nd + 3rd), etc.
- Criteria used right now are:
  - Least Squares, Least Absolute Deviation, or Huber-M for regression
  - Classification accuracy
  - Log-likelihood
  - ROC (area under curve)
  - Lift in the top P percentile (often the top decile)
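The selection step above can be sketched as follows. This is a synthetic illustration: the per-tree prediction "increments" are made up so that early trees help while later ones mostly fit noise, which is why the test-set criterion eventually worsens and a best ensemble size exists.

```python
# Score the ensemble at every size (1 tree, 2 trees, ...) on test data
# and keep the tree count that minimizes the chosen criterion.
import numpy as np

rng = np.random.default_rng(1)
y_test = rng.normal(size=100)            # hypothetical test-set target

# Stand-in for each tree's contribution to the test-set predictions:
# increasingly noisy, so the ensemble eventually overshoots.
increments = [0.02 * (y_test + rng.normal(scale=k / 50, size=100))
              for k in range(300)]

pred = np.zeros(100)
errors = []
for inc in increments:                   # grow the ensemble one tree at a time
    pred += inc
    errors.append(np.mean((y_test - pred) ** 2))   # least-squares criterion

best_size = int(np.argmin(errors)) + 1   # optimal number of trees for this criterion
print(best_size)
```

Running the same loop with a different criterion (log-likelihood, ROC area, lift) generally selects a different ensemble size, which is exactly what the summary screen on the next slide reports.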
17. TreeNet Summary Screen
- Displays ROC (Y-axis) on train and test samples at all ensemble sizes
- Each criterion is optimized using a different number of trees (size of ensemble):

  Criterion                  CXE        Class Error  ROC Area   Lift
  Optimal Number of Trees    739        1069         643        163
  Optimal Criterion          0.4224394  0.1803279    0.8862029  1.9626555
18. How the Different Criteria Select Different Response Profiles
[Plot: logistic models fit to artificial data, comparing CXE against classification accuracy]
- Artificial data: the red curve is the truth, orange is a logistic regression, green is TreeNet optimized for classification accuracy, and blue is TreeNet optimized for maximum log-likelihood (CXE)
- TreeNet (CXE) tracks the data far more faithfully than the LOGIT
19. Interpreting TN Models
- As TN models consist of hundreds or even thousands of trees, there is no way to represent the model via a display of one or two trees. However, the model can be summarized in a variety of ways:
  - Partial dependency plots: exhibit the relationship between the target and any predictor, as captured by the model
  - Variable importance rankings: these stable rankings give an excellent assessment of the relative importance of predictors
  - For regression models, SPM 7.0 displays box plots for residuals and residual outlier diagnostics
- For binary classification, TN also displays:
  - ROC curves: TN models produce scores that are typically unique for each scored record, allowing records to be ranked from best to worst. The ROC curve and the area under the curve reveal how successful the ranking is
  - Confusion matrix: using an adjustable score threshold, this matrix displays the model's false positive and false negative rates
20. TN Summary: Variable Importance Ranking
- Based on the actual use of variables in the trees on training data
- TreeNet creates a sum of split-improvement scores for every variable in every tree
21. TN Classification Accuracy: Test Data
- The threshold can be adjusted to reflect unbalanced classes and rare events
22. TreeNet: Partial Dependency Plot
[Plot; Y-axis: one-half log-odds]
23. Partial Dependency Plot Explained
- Partial dependency plots are extracted from the TreeNet model via simulation
- To trace out the impact of predictor X on target Y, we start with a single record from the training data and generate a set of predictions for Y for each of a large number of different values of X
- We repeat this process for every record in the training data and then average the results to produce the displayed partial dependency plot. In this way the plot reflects the effects of all the other relevant predictors
- It is important to remember that the plot is based on predictions generated by the model
- Smooths of the plots might be used to create transforms of certain variables for use in GLMs or logistic regressions
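The simulation described above can be written directly. This sketch uses a hypothetical closed-form "model" of two predictors in place of a fitted TreeNet; the mechanics — sweep one predictor over a grid, hold every record's other predictors fixed, average the predictions — are the same.

```python
# Partial dependence by simulation: average model predictions over the
# training records at each forced value of the predictor of interest.
import numpy as np

def model(x1, x2):
    """Hypothetical fitted model (stand-in for a TreeNet ensemble)."""
    return np.sin(x1) + 0.5 * x2

rng = np.random.default_rng(2)
X1 = rng.uniform(0, 6, 500)              # predictor of interest
X2 = rng.normal(size=500)                # "all the other relevant predictors"

grid = np.linspace(0, 6, 50)
# For each grid value v: set X1 = v for every record, keep X2 as observed,
# and average the model's predictions across records.
partial = [model(np.full(500, v), X2).mean() for v in grid]

# Plotting `partial` against `grid` gives the displayed partial dependency curve.
```

Because the average is taken over the observed values of the other predictors, the curve reflects their joint influence, as the slide notes.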
24. Place Knot Locations on Graph for Smooth
- A smooth can be created on screen within TreeNet
25. Smooth Depicted in Green
26. Generate Programming Code for Smooth
- Can obtain SAS code, Java, C, and PMML
- Other languages coming: SQL, Visual Basic
27. Dealing with Monotonicity
- Undesired response profile (here we prefer a monotone increasing response)
28. Impose Constraint
- A constrained solution is generated to yield a monotone increasing smooth
- In this case the constraint is mild, in that the differences on the Y-axis are quite small
29. Interaction Detection
- TreeNet models based on 2-node trees automatically EXCLUDE interactions
  - The model may be highly nonlinear but is by definition strictly additive: every term in the model is based on a single variable (single split)
  - Use this as a baseline for the best possible additive model (an automated GAM)
- Build a TreeNet on larger trees (the default is 6 nodes)
  - This permits up to 5-way interactions, though in practice it behaves more like 3-way interactions
- Can conduct an informal likelihood-ratio test of TN(2-node) vs TN(6-node); large differences signal important interactions
- In TreeNet 2.0, interactions can be located via 3-D two-variable dependency plots
- In SPM 6.6 and later, variables participating in interactions are ranked using new methodology developed by Friedman and Salford Systems
30.
Interactions Ranking Report
The interaction strength measure is in the first column; the second column shows normalized scores. Variables are listed in descending order of overall interaction strength. Computation of interaction strength is based on two ways of creating a 3-D plot of the target vs Xi and Xj: one method ignores possible interactions and the other explicitly accounts for them. If the two are very different, we have evidence from the model for the existence of an interaction.
31.
What interacts with what?
Interaction strength scores for 2-way interactions for the most important variables, NMOS and PAY_FREQUENCY_2$. Here we see that NMOS primarily interacts with PAY_FREQUENCY_2$; all other interactions with NMOS have much smaller impacts.
32.
Example of an important interaction
Slope reverses due to the interaction. In this real-world example the downward-sloping portion of the relationship was well known, but the segment displaying an upward-sloping relationship was not. Note that the dominant pattern is downward sloping, but a key segment defined by a third variable is upward sloping.
33.
Real World Examples
Corporate default (Probability of Default, or PD); analysis of bank ratings.
34.
Corporate Default Scorecard
Mid-sized bank. Around 200 defaults and around 200 indeterminate/slow accounts: a classic small-sample problem. Standard financial statements were available and a variety of ratios were created. All key variables had some fraction of missing data, from 2% to 6% in a set of 10 core predictors, and up to 35% missing in a set of 20 core predictors. A single CART tree involving just 6 predictors yields cross-validated ROC = 0.6523.
35.
TreeNet Analysis of Corporate Default Data: Summary Display
The summary reports the progress of the model as it evolves with an increasing number of trees. Markers indicate the models optimizing entropy, misclassification rate, and ROC.
36.
Corporate Default Model: TreeNet
1789 trees in the model with the best test-sample area under the ROC curve.

                        CXE        Class Error  ROC Area   Lift
  --------------------------------------------------------------------
  Optimal N Trees:      2000       1930         1789       1911
  Optimal Criterion:    0.534914   0.3485293    0.7164921  1.9231729

TreeNet cross-validated ROC = 0.71649, far better than the single tree. The ensemble is able to make use of more variables thanks to the many trees: the single CART tree uses 6 predictors, while TreeNet uses more than 20, some of which were missing for many accounts. Graphical displays can be extracted to describe the model.
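The single-tree vs ensemble gap in cross-validated ROC area reported here can be reproduced in spirit with scikit-learn. The bank's data are not public, so `make_classification` is only a placeholder for the roughly 400-account default sample:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder for the ~400-account default data set.
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)

single_tree = DecisionTreeClassifier(max_depth=4, random_state=0)
ensemble = GradientBoostingClassifier(n_estimators=200, random_state=0)

tree_auc = cross_val_score(single_tree, X, y, cv=5, scoring="roc_auc").mean()
boost_auc = cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean()
# The boosted ensemble's cross-validated ROC area exceeds the single tree's.
```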
37.
Predictive Variable – Financial Ratio 1
[Partial dependency plot: risk (probability of default) vs Ratio 1 = Sales / Current Assets]
TreeNet can trace out the impact of a predictor on PD.
38.
Predictive Variable – Financial Ratio 2
[Partial dependency plot: risk (probability of default) vs Ratio 2 = Operating Profit / Total Assets, i.e., Return on Assets (ROA)]
39.
Additive vs Interaction Model (2-node vs 6-node trees)

2-node trees model:
                        CXE       Class Error  ROC Area  Lift
  --------------------------------------------------------------------
  Optimal N Trees:      2000      1812         1978      1763
  Optimal Criterion:    0.54538   0.39331      0.70749   1.84881

6-node trees model:
  Optimal N Trees:      2000      1930         1789      1911
  Optimal Criterion:    0.53491   0.34852      0.71649   1.92317

Two-node trees do not permit interactions of any kind, as each term in the model involves a single predictor in isolation; two-node tree models can be highly nonlinear but are strictly additive. Trees with three or more nodes allow progressively higher-order interactions. Running the model with different tree sizes allows simple discovery of the precise amount of interaction required for maximum performance. In this example there is no strong evidence for low-order interactions.
40.
Bank Ratings: Regression Example
Build a predictive model for the average of major bank ratings: a scaled average of S&P, Moody's, and Fitch ratings. Challenges include a small database (66 banks), prevalent missing values (up to 35 missing in any predictor), the impossibility of building a linear regression model because of the missings, and relationships expected to be nonlinear. 66 banks, 25 potential predictors. Study run in 2003!
41.
Cross-Validated Performance: Predicting Rating Score
[Plot: cross-validated mean absolute error vs model size in trees]
The optimal model is achieved at around 860 trees, with cross-validated mean absolute error 0.87; the target variable ranges from 1 to 10.
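Picking the optimal number of trees from an error-vs-size curve like this one can be sketched with staged predictions in scikit-learn (a held-out split rather than full cross-validation, for brevity; the data are synthetic):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                  random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., 500 trees,
# tracing out the mean-absolute-error curve as the ensemble grows.
mae_by_size = np.array([np.abs(pred - y_te).mean()
                        for pred in model.staged_predict(X_te)])
best_n_trees = int(mae_by_size.argmin()) + 1   # size minimizing held-out MAE
```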
42.
Variable Importance Ranking
Ranks variables according to their contributions to the model. The ranking is influenced by the number of times a variable is used and by its strength as a splitter when it is used.
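Split-based importance of exactly this flavor (how often a variable splits, weighted by how much each split improves the fit) is exposed by scikit-learn's gradient boosting as `feature_importances_`; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# Friedman #1: only the first five of ten predictors actually drive the target.
X, y = make_friedman1(n_samples=300, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

importances = model.feature_importances_       # normalized to sum to 1
ranking = np.argsort(importances)[::-1]        # most important first
# The top-ranked variables fall among the five informative predictors.
```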
43.
Country Contribution to Risk Score
[Bar chart: distribution of risk scores by country — US, SE, PT, NL, LU, JP, IT, IE, GR, GB, FR, ES, DK, DE, CH, CA, BE, AT; scale 0 to 8]
Swiss banks (CH) tend to be rated low risk, while Greek, Portuguese, Italian, Irish, and Japanese banks are rated high risk.
44.
Bank Specialization Distribution
[Bar chart: counts by specialization — Specialised Governmental Credit Inst., Savings Bank, Real Estate / Mortgage Bank, Non-banking Credit Institution, Investment Bank/Securities House, Cooperative Bank, Commercial Bank; scale 0 to 50]
Commercial banks and investment banks are rated higher risk.
45.
Scale: Total Deposits
[Histogram of total deposits, 0 to 6e+08]
A step function in the size of the bank.
46.
ROAE
[Histogram of ROAE, roughly -20 to 40]
47.
Cost to Income Ratio: Impact on Risk Score
[Histogram of cost-to-income ratio, 20 to 100]
A high cost-to-income ratio increases the risk score.
48.
Equity to Total Assets
[Histogram of equity to total assets, 0 to 15]
Greater risk is forecast when equity is a large share of total assets.
49.
References
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In L. Saitta (ed.), Machine Learning: Proceedings of the Thirteenth International Conference, Morgan Kaufmann, pp. 148-156.
Friedman, J. H. (1999). Stochastic gradient boosting. Stanford: Statistics Department, Stanford University.
Friedman, J. H. (1999). Greedy function approximation: a gradient boosting machine. Stanford: Statistics Department, Stanford University.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning, 2nd edition. Springer.