This series is designed for beginners and anyone new to Salford Systems’ data mining products. These videos skip over the theoretical background of each data mining product and show you how to:
• Import a dataset and select a data mining engine for model building
• Configure the basic model setup parameters
• Build a model with each core data mining product in the SPM software suite, and
• Interpret the output and model performance
2. INTRO
Building your first model with a new data mining tool can be intimidating.
Though some of us may have some intuition for model building, it’s pretty daunting to
look at the default settings, knowing you have a ways to go before you have an
accurate, explainable predictive model to hand over to your boss.
To make sure you’re set up for data mining success, follow these simple steps to build
your first models in the SPM software suite.
3. Want to skip ahead? Here’s what we’re going to cover.
IMPORT DATA
5 … Prepare
6 … Stay Organized
MODEL SETUP
8 … Select an Engine
9 … Analysis Type
10 … Variables
11 … Testing
12 … Control Parameters
PERFORMANCE
15 … What To Look For
17 … What’s Next
4. IMPORT DATA
We’re going to walk you through best practices
for preparing and uploading your data into the
SPM software.
5. PREPARE
1. Make sure your data is in a ‘flat’ file (i.e. rows x columns).
2. Make sure you understand your variable labels! If you don’t
understand what your variables represent, you’re going to have a heck
of a time understanding your results.
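If you want a quick sanity check that a file really is ‘flat’ before you import it, here is a minimal sketch in Python. It is not part of SPM; the function name `check_flat` and the column names in the sample are made up for illustration. It simply confirms that every data row has the same number of fields as the header row.

```python
import csv
import io

def check_flat(text):
    """Return (header, bad_rows).

    bad_rows lists the 1-based data-row numbers whose field count
    differs from the header's field count.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    bad_rows = [i for i, row in enumerate(data, start=1)
                if len(row) != len(header)]
    return header, bad_rows

# Hypothetical sample: the second data row is missing a field.
sample = "AGE,INCOME,RESPONSE\n34,52000,1\n51,61000\n29,48000,0\n"
header, bad_rows = check_flat(sample)
print(header)
print(bad_rows)
```

An empty `bad_rows` list means every row lines up with the header, which is what you want before loading the file.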
Want to read the nitty gritty?
Check out the complete SPM User Guide.
6. STAY ORGANIZED
Save your data set, or sets, in one,
easy-to-find folder. If you’re pulling in
data from all over creation, you’re
just making the process longer and
more difficult to comprehend. Do
yourself a favor and dedicate a
directory to each data mining project
you’re working on.
7. Model Setup
10 parameters to pay attention to
when building a model
Once you have imported your data, you need to set
a few parameters (leaving most of them in default
settings) before you click ‘start.’
10. SELECT A TARGET VARIABLE AND PREDICTORS
1. You must have a target variable.
2. You should have multiple predictors.
3. You don’t need to use all of your predictors.
4. Take note of categorical vs. continuous variables.
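To get a first read on which variables are categorical and which are continuous, a rough heuristic like the one below can help. This is only a sketch: `guess_type`, the `max_levels` cutoff of 10, and the column names are assumptions for illustration, not SPM behavior. Always confirm the result against what you know about each variable.

```python
def guess_type(values, max_levels=10):
    """Guess 'continuous' if every value is numeric and there are more
    than max_levels distinct values; otherwise guess 'categorical'."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # At least one value is not a number, so treat it as a label.
        return "categorical"
    return "continuous" if len(set(numbers)) > max_levels else "categorical"

# Hypothetical columns, stored as text the way they arrive from a file.
columns = {
    "INCOME": [str(40000 + 1500 * i) for i in range(12)],
    "REGION": ["north", "south", "east", "west"] * 3,
    "RESPONSE": ["0", "1"] * 6,
}
types = {name: guess_type(vals) for name, vals in columns.items()}
print(types)
```

Note that a numeric code with only a few levels (like a 0/1 response) is flagged as categorical, which is usually the safer default for a target you want to classify.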
11. SELECT A TESTING METHOD
No independent testing – exploratory tree
Fraction of cases selected at random for testing (%)
Test sample contained in a separate file
V-fold cross-validation (e.g. V = 10)
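If the difference between a random test fraction and V-fold cross-validation is fuzzy, the sketch below shows how rows get assigned in each case. It is plain Python, not SPM code, and the function names `holdout_split` and `v_fold_splits` are made up for illustration.

```python
import random

def holdout_split(n_rows, test_fraction=0.2, seed=0):
    """Randomly reserve a fraction of row indices for testing."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    # First n_test shuffled rows become the test sample, the rest learn.
    return sorted(indices[n_test:]), sorted(indices[:n_test])

def v_fold_splits(n_rows, v=10, seed=0):
    """Yield (learn, test) index lists; every row is tested exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    folds = [indices[i::v] for i in range(v)]
    for i, test in enumerate(folds):
        learn = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(learn), sorted(test)

learn, test = holdout_split(100, test_fraction=0.2)
folds = list(v_fold_splits(100, v=10))
print(len(learn), len(test), len(folds))
```

With a random fraction, the model is built once and scored on the held-out rows. With V-fold cross-validation, V models are built, and each one is scored on the fold it never saw.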
12. Salford Systems Recommends That You Manually Set Your:
• Learn rate
• Number of trees built
• Number of nodes in a tree
• Loss criterion
These will vary depending on the modeling engine being used to build a model.
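To see why the learn rate and the number of trees matter, here is a toy boosted-tree sketch in plain Python. It is not TreeNet or any SPM engine: it assumes one predictor, fixes every tree at two terminal nodes, and uses squared error as the loss criterion. The names `fit_stump` and `boost` and the data are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimizes squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, learn_rate=0.1, n_trees=50):
    """Add n_trees two-node trees, each shrunk by learn_rate."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        # Each new tree is fit to what the current model still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learn_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learn_rate * sum(t(x) for t in trees)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical learn data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
few = boost(xs, ys, learn_rate=0.1, n_trees=5)
many = boost(xs, ys, learn_rate=0.1, n_trees=100)
print(mse(few, xs, ys), mse(many, xs, ys))
```

With a small learn rate, five trees are not enough to capture the jump, while a hundred trees fit the learn data closely. That trade-off is why these two settings are usually tuned together, and why you should check the result on test data rather than on learn data alone.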
14. EVALUATING YOUR PERFORMANCE
Don’t get overwhelmed by all of the fancy reporting features available in the SPM
software suite. Start slow. We will show you where to begin if you are new to using
SPM and just want to understand what your model means.
15. What To Look For
• Mean Squared Error (MSE)
• R-Squared
• Test vs. Learn Performance
• Variable Performance
• Variable Dependence Plots (TreeNet)
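For the first three items, it helps to know what the numbers are made of. The sketch below computes MSE and R-Squared by hand using the standard definitions and compares learn and test results. It is plain Python rather than SPM output, and the actual and predicted values are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average squared gap between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical target values and model predictions.
learn_actual = [3.0, 5.0, 7.0, 9.0]
learn_pred = [3.1, 4.9, 7.2, 8.8]
test_actual = [4.0, 6.0, 8.0]
test_pred = [4.5, 5.2, 8.9]

learn_mse = mean_squared_error(learn_actual, learn_pred)
test_mse = mean_squared_error(test_actual, test_pred)
learn_r2 = r_squared(learn_actual, learn_pred)
test_r2 = r_squared(test_actual, test_pred)
print(learn_mse, test_mse, learn_r2, test_r2)
```

In this made-up example the test MSE is much higher than the learn MSE and the test R-Squared is lower. A gap like that suggests the model fits the learn data better than it generalizes, so the test numbers are the ones to report.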
16. … AND YOU’RE DONE!
If you have already
downloaded the SPM
software, build a model!
Once you’ve built your first
model, start tweaking some of
the control parameters we
discussed.
What is your best model
performance so far?
17. WHAT’S NEXT?
• Watch our video series on
how to build your first
model.