Cooking Predictions
A real case in the hotel sector
Andrés González
Big Data Prediction Manager
andresg@clevertask.com
Twitter: @data_lytics
CleverTask Solutions SL - Big Data Business Unit 3
Agenda
1. Business Need
2. “Cooking” Predictions
3. Gathering ingredients
4. Cleaning and Transforming
5. The recipe (the model)
6. Tasting the dish
Hotel Sector
• Room occupancy (%).
• Cancellation risk.
• Income.
Business Need
Predict the client’s NATIONALITY BEFORE the client checks in
Staff Arrangement
Languages
Prepare Activities
Kitchen Arrangement
Customize Stay
In short, because…
… Details Make the Difference
Machine Learning basics
Can you find patterns in this data?
Historical Data -> Training -> Prediction
New Data -> Re-Training
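The train, predict, re-train loop can be sketched in a few lines. This is a toy illustration, not the model used in the project: the "model" is just a lookup of the most frequent nationality per booking channel, and the channel names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows):
    """Learn the most frequent label for each feature value (a toy 'model')."""
    counts = defaultdict(Counter)
    for feature, label in rows:
        counts[feature][label] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}

def predict(model, feature, default="UNKNOWN"):
    """Return the learned label, or a default for unseen feature values."""
    return model.get(feature, default)

# Hypothetical historical data: (booking channel, nationality)
historical = [("own_site", "ES"), ("own_site", "ES"), ("partner", "UK")]
model = train(historical)
first = predict(model, "partner")

# New labelled data arrives: re-train on everything seen so far
new_data = [("partner", "DE"), ("partner", "DE")]
model = train(historical + new_data)
second = predict(model, "partner")
```

After re-training, the prediction for the same input can change because the newer reservations shift the majority.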
“Cooking” Predictions
1. Go to the market to buy ingredients
2. Cleaning
3. Transforming
4. Cooking
5. Tasting the Dish
“Cooking” Predictions
1. Gathering RAW data
2. Cleaning Data
3. Transforming and Feature Engineering
4. Training the Model
5. Evaluating Prediction Quality
Where does Data come from?
Own Website
Partner Websites
RAW Data
RAW Data
One year of historical reservation data (.xlsx file)
Characteristics
• 260,000 reservations
• 80 fields
• 57 categorical
• 9 numeric
• 10 date
• 3 text
• 1 incorrect field
• Size: 150 MB
The Process
1. Gathering Data: “Dirty” RAW Data
2. Cleaning: “Clean” Data
3. Transformation and Feature Engineering: New Fields, Calculated Fields
4. Model
Data Cleaning
Row Deletion
• Reservations without check-in
• Cancelled reservations
• Rows with errors
Column Deletion
• IDs vs names
• Columns with little data
Other Actions
• Normalize date formats
• Remove accents
• Convert .xlsx -> .csv
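The cleaning actions listed above could look roughly like this in code. It is a minimal sketch under stated assumptions: the field names (HOTEL_ID, HOTEL_NAME, STATUS, CHECKIN_DATE), the DD/MM/YYYY input format and the sample rows are all hypothetical, and the rows are assumed to have been read from the .xlsx file already, since that step needs a third-party library.

```python
import csv
import io
import unicodedata
from datetime import datetime

def strip_accents(text):
    """Remove diacritics, e.g. 'Málaga' becomes 'Malaga'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def clean_rows(rows, drop_columns=("HOTEL_ID",)):
    """Yield cleaned rows, applying the row and column deletions."""
    for row in rows:
        if not row.get("CHECKIN_DATE"):        # reservation without check-in
            continue
        if row.get("STATUS") == "CANCELLED":   # cancelled reservation
            continue
        try:
            # Assumed input format DD/MM/YYYY, rewritten below as ISO 8601
            checkin = datetime.strptime(row["CHECKIN_DATE"], "%d/%m/%Y")
        except ValueError:                     # row with errors
            continue
        cleaned = {k: strip_accents(v) for k, v in row.items()
                   if k not in drop_columns}   # ID column dropped, name kept
        cleaned["CHECKIN_DATE"] = checkin.date().isoformat()
        yield cleaned

# Hypothetical sample rows
raw = [
    {"HOTEL_ID": "7", "HOTEL_NAME": "Málaga Playa", "STATUS": "OK", "CHECKIN_DATE": "03/08/2015"},
    {"HOTEL_ID": "7", "HOTEL_NAME": "Málaga Playa", "STATUS": "CANCELLED", "CHECKIN_DATE": "04/08/2015"},
    {"HOTEL_ID": "9", "HOTEL_NAME": "Cádiz Sol", "STATUS": "OK", "CHECKIN_DATE": ""},
    {"HOTEL_ID": "9", "HOTEL_NAME": "Cádiz Sol", "STATUS": "OK", "CHECKIN_DATE": "31/02/2015"},
]
cleaned = list(clean_rows(raw))

# Write the result as CSV (an in-memory buffer stands in for the output file)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["HOTEL_NAME", "STATUS", "CHECKIN_DATE"])
writer.writeheader()
writer.writerows(cleaned)
```

Only the first sample row survives: the others are cancelled, lack a check-in date, or carry an impossible date.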
Clean Dataset
Dirty
• 260,000 reservations
• 80 fields
• 57 categorical
• 9 numeric
• 10 date
• 3 text
• 1 incorrect field
• Size: 150 MB
Clean
• 150,000 reservations
• 46 fields
• 26 categorical
• 9 numeric
• 10 date
• 1 text
• Size: 75 MB
Transformations
Country Grouping
• A lot of countries to predict (210)
• Some countries have very few instances
• Grouping objective: min. 1% of total instances
• Does not affect business objective
• Total number of groups: 20
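One simple way to reach the 1% minimum is to fold every rare country into a single catch-all group. The slides do not say how the 20 groups were actually built (they may well be regional), so the sketch below is only an assumption; the function name, the "OTHER" label and the sample counts are hypothetical.

```python
from collections import Counter

def group_rare_countries(countries, min_share=0.01, other_label="OTHER"):
    """Map every country below min_share of all rows to one catch-all group."""
    counts = Counter(countries)
    total = len(countries)
    keep = {c for c, n in counts.items() if n / total >= min_share}
    return [c if c in keep else other_label for c in countries]

# Hypothetical sample: 1,000 reservations, two countries below 1%
sample = ["ES"] * 600 + ["UK"] * 300 + ["DE"] * 90 + ["IS"] * 6 + ["MT"] * 4
grouped = group_rare_countries(sample)
group_sizes = Counter(grouped)
```

The number of classes to predict drops, while every reservation still carries a label.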
New Fields
• RESERV_ANTICIPATION (calculated): (reservation date - check-in date)
• COUNTRY_HOTEL (name of the country)
• HOTEL_STARS (1-5)
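A calculated field such as RESERV_ANTICIPATION is a plain date difference. The slide writes it as reservation date minus check-in date, which is negative for a normal booking; the sketch below assumes the opposite order so the result is a non-negative number of days. The function name and the dates are hypothetical.

```python
from datetime import date

def reserv_anticipation(reservation_date, checkin_date):
    """Days between booking and check-in (check-in minus reservation, an assumption)."""
    return (checkin_date - reservation_date).days

# Hypothetical booking made on 1 June for a check-in on 3 August
days = reserv_anticipation(date(2015, 6, 1), date(2015, 8, 3))
same_day = reserv_anticipation(date(2015, 8, 3), date(2015, 8, 3))
```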
Clean Dataset
Dirty
• 260,000 reservations
• 80 fields
• Size: 150 MB
Clean
• 150,000 reservations
• 46 fields
• Size: 75 MB
Transformed
• 150,000 records
• 49 fields
• Size: 80 MB
What is Feature Engineering?
Extract signal from noise
Feature Engineering
Techniques
• Detect fields (features) that are predictors (signal) and bypass those that are not (noise)
• Dependent fields (pax, days, pax*days)
• Needless fields (reservation number)
• Fields with very little data
• Random fields (minute and second of reservation)
• Domain knowledge
• Experience
• Recursive cycle
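Some of these checks are mechanical enough to script. The sketch below illustrates three of them (needless fields, fields with very little data, and a dependent field that is the product of two others). The field names, the 50% fill threshold and the sample rows are hypothetical; domain knowledge and experience cannot be automated this way.

```python
def fill_ratio(rows, field):
    """Share of rows in which the field has a non-empty value."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)

def is_product_of(rows, target, a, b):
    """True when target == a * b in every row, so target adds no information."""
    return all(row[target] == row[a] * row[b] for row in rows)

def select_fields(rows, fields, min_fill=0.5, needless=("RESERVATION_NUMBER",)):
    """Keep fields that are not needless and are filled often enough."""
    return [f for f in fields
            if f not in needless and fill_ratio(rows, f) >= min_fill]

# Hypothetical sample rows
rows = [
    {"RESERVATION_NUMBER": 1, "PAX": 2, "DAYS": 3, "PAX_DAYS": 6, "COMMENTS": ""},
    {"RESERVATION_NUMBER": 2, "PAX": 1, "DAYS": 7, "PAX_DAYS": 7, "COMMENTS": ""},
    {"RESERVATION_NUMBER": 3, "PAX": 4, "DAYS": 2, "PAX_DAYS": 8, "COMMENTS": "late arrival"},
]
fields = ["RESERVATION_NUMBER", "PAX", "DAYS", "PAX_DAYS", "COMMENTS"]

kept = select_fields(rows, fields)
if is_product_of(rows, "PAX_DAYS", "PAX", "DAYS"):
    kept.remove("PAX_DAYS")   # dependent field: fully determined by PAX and DAYS
```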
Recursive Feature Engineering
Field Selection -> Algorithm Adjustment -> Prediction Quality Evaluation -> Field Selection
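The cycle can be approximated with a greedy backward elimination: drop the field whose removal hurts validation accuracy the least, and stop when any further removal would lower it. This is a swapped-in, simplified technique rather than the procedure used in the project. It leaves out the algorithm-adjustment step, uses a toy lookup-table model, and all field names and rows are hypothetical.

```python
from collections import Counter, defaultdict

def accuracy_with(fields, train_rows, valid_rows, target):
    """Train a lookup-table model on the chosen fields and score it."""
    table = defaultdict(Counter)
    for row in train_rows:
        table[tuple(row[f] for f in fields)][row[target]] += 1
    fallback = Counter(row[target] for row in train_rows).most_common(1)[0][0]
    hits = 0
    for row in valid_rows:
        key = tuple(row[f] for f in fields)
        guess = table[key].most_common(1)[0][0] if key in table else fallback
        hits += guess == row[target]
    return hits / len(valid_rows)

def recursive_selection(fields, train_rows, valid_rows, target):
    """Repeatedly drop the least useful field while accuracy does not fall."""
    current = list(fields)
    best = accuracy_with(current, train_rows, valid_rows, target)
    while len(current) > 1:
        scored = [
            (accuracy_with([f for f in current if f != drop],
                           train_rows, valid_rows, target), drop)
            for drop in current
        ]
        score, drop = max(scored)
        if score < best:
            break
        current = [f for f in current if f != drop]
        best = score
    return current, best

# Hypothetical data: CHANNEL carries signal, SECOND is random noise
train_rows = [
    {"CHANNEL": "web_es", "SECOND": 1, "COUNTRY": "ES"},
    {"CHANNEL": "web_es", "SECOND": 2, "COUNTRY": "ES"},
    {"CHANNEL": "web_uk", "SECOND": 3, "COUNTRY": "UK"},
    {"CHANNEL": "web_uk", "SECOND": 4, "COUNTRY": "UK"},
]
valid_rows = [
    {"CHANNEL": "web_es", "SECOND": 5, "COUNTRY": "ES"},
    {"CHANNEL": "web_uk", "SECOND": 6, "COUNTRY": "UK"},
]
selected, score = recursive_selection(["CHANNEL", "SECOND"], train_rows, valid_rows, "COUNTRY")
```

In this toy data the noisy SECOND field fragments the lookup table, so removing it raises accuracy and only CHANNEL is kept.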
Clean Dataset
Dirty
• 260,000 reservations
• 80 fields
• Size: 150 MB
Clean
• 150,000 reservations
• 46 fields
• Size: 75 MB
Transformed
• 150,000 records
• 49 fields
• Size: 80 MB
Final Dataset
• 150,000 records
• 10 fields
• Size: 55 MB
Modeling
Training
Learning
Quality Evaluation
Dataset (100%)
• Training (80%) -> Model
• Test (20%) -> Evaluation
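An 80/20 split can be done with a seeded shuffle. This is a minimal sketch: the function name, the seed and the stand-in data are hypothetical, and the slides do not say whether the original split was random or stratified.

```python
import random

def train_test_split(rows, test_share=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for reservation rows
reservations = list(range(1000))
training, test = train_test_split(reservations)
```

The model is trained only on `training`; `test` is held back so the evaluation uses reservations the model has never seen.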
Quality Evaluation
• Accuracy
• Confusion Matrix
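Both measures are simple counts over pairs of real and predicted labels. The sketch below uses hypothetical labels and stores the confusion matrix as a counter of (actual, predicted) pairs rather than a grid.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Share of predictions that match the real label."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; pairs with equal labels are correct."""
    return Counter(zip(actual, predicted))

# Hypothetical test-set labels
actual = ["ES", "ES", "UK", "UK", "DE"]
predicted = ["ES", "UK", "UK", "UK", "ES"]

acc = accuracy(actual, predicted)
matrix = confusion_matrix(actual, predicted)
```

Accuracy gives one number, while the matrix shows which nationalities are confused with which, for example `matrix[("DE", "ES")]` counts German guests predicted as Spanish.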
Quality Evaluation
54% 75%
Quality Evaluation
Predicted vs Real Distribution
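Comparing the two distributions means comparing the share of each nationality in the real labels with its share in the predictions. A minimal sketch with hypothetical labels:

```python
from collections import Counter

def distribution(labels):
    """Share of each label in the list."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}

def compare(actual, predicted):
    """Return {label: (real share, predicted share)} for every label seen."""
    real, pred = distribution(actual), distribution(predicted)
    return {label: (real.get(label, 0.0), pred.get(label, 0.0))
            for label in sorted(set(real) | set(pred))}

# Hypothetical test-set labels
actual = ["ES", "ES", "UK", "DE"]
predicted = ["ES", "ES", "ES", "UK"]
shares = compare(actual, predicted)
```

A model can score reasonable accuracy and still over-predict the majority class, which this comparison makes visible.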
Cooking Predictions
80%
20%
1. Go to the market to buy ingredients (Gathering RAW data)
2. Cleaning (Cleaning Data)
3. Transforming (Transforming and Feature Engineering)
4. Cooking (Training the Model)
5. Tasting the Dish (Evaluating Prediction Quality)
Other Techniques
• Ensembles
• Clusters
• Weight Analysis
• Anomaly Detection
END
email: andresg@clevertask.com
Twitter: @data_lytics
www.clevertask.com

L9. Real World Machine Learning - Cooking Predictions
