Predictive Analytics
Peter Bruce
THE INSTITUTE FOR STATISTICS EDUCATION
at Statistics.com
peter.bruce@statistics.com
About Statistics.com
• 100+ courses, introductory and advanced
• Traditional statistics, data mining, machine
learning, text mining, clinical
trials, optimization, use of R
• All online
• Typically 4 weeks, scheduled dates
• No need to be online at particular times or days
• Private discussion forum with instructors - noted
authors & experts
A man walks into a Target® store…
Predictive Analytics
• In marketing, used for model-driven, targeted
sales efforts
• Also: will a loan default, what is the diagnosis (given
symptoms), is a tax return fraudulent, …
Market Research
• Traditionally surveys, analysis, information
gathering, strategy
• Moving online increases the amount of
data, speeds its flow, and makes it more
accessible
Washington Post (web)
• 35 different reports tracking traffic daily
• Midday report “are we on track for visitors?”
• Number of visitors from key domains: .gov, .mil, .senate,
or .house
Daily Mail (UK web)
• A traditional ingredient is stories about
animals – tracked on web
• “The animals that do best are
monkeys, dogs, and cats, in that order…”
Martin Clark (editor)
Back to Target
Predictive Analytics
• Goes beyond the obvious, capturing
complexity
• Implemented for real-time behavior and
decisions
Pregnant?
• Obvious retail clues – maternity clothes, baby
food, baby clothes, crib …
• These may be too late
• Earlier clues are less obvious:
lotions, supplements, and especially combinations
and changes in purchase patterns
• Data mining algorithms can capture these less
obvious, more complex signals
Training the Model
• Bridal registry
• Women of similar demographic not on bridal
registry
• Together, the training set
– Known outcome
– Purchase data over time
Hypothetical Data
Cust #  zinc10  zinc90  mag10  mag90  cotton10  cotton90  Registry?
1011    1       1       1      1      0         1         0
1012    1       0       1      0      1         0         1
1013    1       1       0      1      1         0         1
1014    0       1       1      0      1         1         0
1015    1       1       0      1      1         0         0
1016    0       0       1      0      1         0         1
Classification Algorithms
• K-nearest neighbors (involves 3 notions)
– Distance measure
– Centroid
– Majority vote or average
K-NN
Cust #  zinc10  zinc90  mag10  mag90  cotton10  cotton90  Registry?
1       1       1       1      1      0         1         0
2       1       1       1      1      0         1         0
3       1       1       0      1      0         1         0
4       0       1       1      0      1         1         0
5       1       1       0      1      1         0         1
6       0       0       1      0      1         0         1
NEW     1       0       1      1      0         1         ?
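The K-NN table can be worked through in a few lines of Python. This is a minimal sketch, not the presenter's implementation: it assumes Hamming distance (the number of purchase indicators that differ) as the distance measure and k = 3 with a majority vote. Neither choice is stated on the slide, and the function names are made up for illustration.

```python
from collections import Counter

# Rows from the K-NN slide: six purchase indicators, then the known outcome.
TRAINING = [
    ((1, 1, 1, 1, 0, 1), 0),
    ((1, 1, 1, 1, 0, 1), 0),
    ((1, 1, 0, 1, 0, 1), 0),
    ((0, 1, 1, 0, 1, 1), 0),
    ((1, 1, 0, 1, 1, 0), 1),
    ((0, 0, 1, 0, 1, 0), 1),
]
NEW = (1, 0, 1, 1, 0, 1)  # the unlabeled record from the slide


def hamming(a, b):
    """Distance measure (assumed): count of positions where two records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(training, record, k=3):
    """Majority vote among the k training records closest to `record`."""
    nearest = sorted(training, key=lambda row: hamming(row[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


distances = [hamming(features, NEW) for features, _ in TRAINING]
prediction = knn_predict(TRAINING, NEW, k=3)
print("distances to customers 1-6:", distances)
print("predicted outcome for NEW:", prediction)
```

Under these assumptions the three nearest neighbors are customers 1, 2, and 3, all with outcome 0, so the vote for NEW is 0. A different distance measure or value of k could change the result.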
Classification Algorithms, cont.
• Logistic Regression
• CART
• Discriminant Analysis
• Neural Network
• Naïve Bayes
The Overfit Problem
[Figure: plot with axes Revenue and Expenditure]
Complex function - overfit
[Figure: plot with axes Revenue and Expenditure, showing a complex fitted function]
Therefore: Validate the Model
• Partition the original data
– Training
– Validation
• Fit the model to the training data
• Assess performance using the validation data
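The partition, fit, and assess steps can be sketched as follows. Everything here is an illustrative assumption: the expenditure and revenue values are simulated, the 60/40 split is arbitrary, and the "complex" model is a stand-in that simply memorizes the training points (a 1-nearest-neighbor lookup) rather than the function drawn on the slide.

```python
import math
import random


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(points, predict):
    """Root mean squared error of `predict` over (x, y) pairs."""
    return math.sqrt(sum((y - predict(x)) ** 2 for x, y in points) / len(points))


# Simulated data (hypothetical): revenue rises with expenditure, plus noise.
rng = random.Random(0)
data = [(x, 100 + 1.2 * x + rng.gauss(0, 80)) for x in range(50, 1000, 25)]

# Partition the original data at random into training and validation sets.
rng.shuffle(data)
cut = int(0.6 * len(data))
training, validation = data[:cut], data[cut:]

# Simple model: a straight line fitted to the training data only.
intercept, slope = fit_line(training)
line = lambda x: intercept + slope * x

# Overly complex model: return the revenue of the closest training point.
memorizer = lambda x: min(training, key=lambda p: abs(p[0] - x))[1]

for name, model in (("line", line), ("memorizer", memorizer)):
    print(name, "training RMSE:", round(rmse(training, model), 1),
          "validation RMSE:", round(rmse(validation, model), 1))
```

The memorizer reproduces every training point exactly, so its training error is zero by construction. Only the validation error says anything about how either model handles records it has not seen.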
Performance Metrics
• Continuous
– RMSE
• Categorical (often binary)
– % accurate (confusion matrix)
– Lift
Confusion Matrix and Cutoff Control
Training Data scoring - Summary Report
Cutoff probability value for success (updatable): 0.5
Classification Confusion Matrix
              Predicted Class
Actual Class  1     0
1             43    8
0             6     247
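The counts in the confusion matrix above translate into accuracy figures with simple arithmetic. The sketch below uses the four counts from the slide. The variable names, the `classify` helper, and the "greater than or equal to" rule at the cutoff are assumptions added for illustration.

```python
# Counts from the slide's confusion matrix (cutoff 0.5).
TP, FN = 43, 8    # actual class 1: predicted 1, predicted 0
FP, TN = 6, 247   # actual class 0: predicted 1, predicted 0

total = TP + FN + FP + TN
accuracy = (TP + TN) / total            # share of all records classified correctly
sensitivity = TP / (TP + FN)            # share of actual 1s that were found
specificity = TN / (TN + FP)            # share of actual 0s that were found
all_zero_accuracy = (FP + TN) / total   # accuracy if every record were called 0


def classify(probability, cutoff=0.5):
    """Assign class 1 when the predicted probability reaches the cutoff (assumed rule)."""
    return 1 if probability >= cutoff else 0


print("records:", total)
print("accuracy:", round(accuracy, 4))
print("sensitivity:", round(sensitivity, 4))
print("specificity:", round(specificity, 4))
print("accuracy of all-0 rule:", round(all_zero_accuracy, 4))
print("p = 0.4 at cutoff 0.5:", classify(0.4), "| at cutoff 0.3:", classify(0.4, cutoff=0.3))
```

Lowering the cutoff moves borderline records into class 1, which typically trades more false positives for fewer missed cases.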
Lift
• In classifying “pregnant” vs. “not-pregnant”,
classifying everyone as “not-pregnant” yields
very high overall accuracy
• Need metric that reflects greater importance
of the “pregnant” category, which is rare
• Lift is the model’s improvement over average
random selection
Decile Lift Chart
[Figure: decile-wise lift chart (validation dataset); x-axis: deciles 1 to 10; y-axis: decile mean / global mean]
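A decile-wise lift chart plots, for each tenth of the records ranked by predicted probability, the decile mean divided by the global mean. The sketch below computes those ratios. The 20 scored records are invented for illustration, and the function assumes the number of records divides evenly into the bins.

```python
def decile_lift(scored, n_bins=10):
    """Return decile mean / global mean for each bin.

    `scored` is a list of (predicted_probability, actual_outcome) pairs.
    Assumes len(scored) is a multiple of n_bins.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    global_mean = sum(actual for _, actual in ranked) / len(ranked)
    size = len(ranked) // n_bins
    lifts = []
    for i in range(n_bins):
        chunk = ranked[i * size:(i + 1) * size]
        decile_mean = sum(actual for _, actual in chunk) / len(chunk)
        lifts.append(decile_mean / global_mean)
    return lifts


# Hypothetical validation records: 4 positives out of 20, mostly ranked near the top.
actuals = [1, 1, 1, 0, 0, 1] + [0] * 14
scored = [(1 - 0.05 * i, a) for i, a in enumerate(actuals)]

lifts = decile_lift(scored)
for decile, lift in enumerate(lifts, start=1):
    print(decile, round(lift, 2))
```

A lift well above 1 in the first deciles means the model concentrates the rare class among its highest-scored records. A lift of 1 is what random selection would give on average.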
Validate the Model
• Compare one model to another
• Avoid overfit
• Solution: apply model to hold-out sample
– Assess performance of different models
– Fine tune parameters of individual models
Partitioning
• Randomly split the initial data into 2 or 3
groups
– Training
– Validation
– Test
• Repeated use of validation data to compare
and fine-tune models leads to overfitting to the
validation data, in addition to the training data
– “Test” partition used only once, at the end
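A random three-way split can be sketched as below. The 50/30/20 proportions, the seed, and the `partition` function name are assumptions for illustration, since the slide does not specify partition sizes.

```python
import random


def partition(records, fractions=(0.5, 0.3, 0.2), seed=42):
    """Randomly split records into training, validation, and test groups.

    The third fraction is implied: the test group takes whatever remains.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


# Stand-in customer IDs; in practice these would be the full data records.
training, validation, test = partition(range(1000))
print(len(training), len(validation), len(test))
```

Models are fitted on `training` and compared or tuned on `validation`. The `test` group stays untouched until one final assessment.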
Software
• SAS Enterprise Miner $$$$
• IBM SPSS Modeler (Clementine) $$$$
• XLMiner (Excel add-in) $
• Statistica Data Miner $$
• Salford Systems $$
• RapidMiner $$ (free open-source version)
• R (open source, free)
Data Mining - More
• Clustering (segmentation)
• Profiling (explanatory models)
• Time series
• Affinity (recommender systems)
• Text analytics (NLP, sentiment analysis)
Skill Shortage
• McKinsey “Big Data” report
– Supply gap of 140,000-190,000 people with “deep analytical
talent”
• Emergence of “Analytics” master's programs
(Northwestern, NC State, …)
