SlideShare uma empresa Scribd logo
1 de 17
Baixar para ler offline
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
Michal Brys
Data Scientist @ Allegro
Measure Camp | London, 10th September 2016
Find signal in noise.
6 steps to find value from messy data.
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
Michal Brys
Data Scientist @ Allegro
Specialized also in:
+ Google Analytics
+ Google Tag Manager
michalbrys.com
about.me/michal.brys
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
Framework for data analysis
CRISP-DM
- Cross Industry Standard Process for Data Mining
- Set up in 1996 (SPSS, Teradata, Daimler AG, NCR ,OHRA)
- Still works!
Read more: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
1: Business Understanding
- Define analysis goal
- What you want to achieve by analysis?
- Check business context
- Don’t be afraid to ask questions
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
1: Business Understanding
I want to select customers group with the
highest probability of response (...)
to target marketing campaign for this group.
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
2: Data Understanding
- Collect data
Check:
- What all variables in dataset means
- How about missing values?
- Exploratory data analysis (EDA)
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
2: Data Understanding
Google Analytics with client id as custom dimension
- Source: Cookies + JavaScript tracker
- Processed by Google Analytics
- No access to raw data
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
2: Data Understanding
10 000 records with 11 variables
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
3: Data Preparation
- Data cleaning
- Prepare new variables, transform data
- Remove missing and outstanding values
- Check distributions
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
3: Data Preparation
Example: Fix variables type.
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
4: Modeling
- Classification problem
- Prepare models by different methods
- Training and test subset
- CART
C5.0
Logit Regression
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
5: Evaluation
Model
True
Negative
True
Positive
False
Negative
False
Positive
Total Error
Rate
CART 5081 3150 1080 689 17.69%
C5.0 4089 2701 1606 1604 32.10%
Logit Regression 5871 2107 1307 715 20.22%
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
6: Deployment
- Prepare report
- Implement in system
- Bulid product
- ...
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
Summary
CRISP-DM
+ Keeps business goal in mind
+ Result will answer for initial question
+ Reproducible and documented process
Image: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#/media/File:CRISP-DM_Process_Diagram.png
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
More inspiration
“Data Mining Methods and Models”
Daniel T. Larose
“The Signal and the Noise”
Nate Silver
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
One more thing...
michalbrys.gitbooks.io/r-google-analytics/
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016
Q&A
Michal Brys
about.me/michal.brys
github.com/michalbrys

Mais conteúdo relacionado

Mais procurados

Extending Tables with Data from over a Million Websites
 Extending Tables with Data from over a Million Websites Extending Tables with Data from over a Million Websites
Extending Tables with Data from over a Million Websites
Chris Bizer
 

Mais procurados (19)

Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
 
Evolving the Web into a Global Dataspace – Advances and Applications
Evolving the Web into a Global Dataspace – Advances and ApplicationsEvolving the Web into a Global Dataspace – Advances and Applications
Evolving the Web into a Global Dataspace – Advances and Applications
 
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان دادهمعرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
معرفی کاربردهای یادگیری عمیق و چالش های آن در کلان داده
 
Extending Tables with Data from over a Million Websites
 Extending Tables with Data from over a Million Websites Extending Tables with Data from over a Million Websites
Extending Tables with Data from over a Million Websites
 
Graph-based Network & IT Management.
Graph-based Network & IT Management.Graph-based Network & IT Management.
Graph-based Network & IT Management.
 
Graph technology and data-journalism: the case of the Paradise Papers
Graph technology and data-journalism: the case of the Paradise PapersGraph technology and data-journalism: the case of the Paradise Papers
Graph technology and data-journalism: the case of the Paradise Papers
 
Fast Data processing with RFX
Fast Data processing with RFXFast Data processing with RFX
Fast Data processing with RFX
 
Data Curation @ SpazioDati - NEXA Lunch Seminar
Data Curation @ SpazioDati - NEXA Lunch SeminarData Curation @ SpazioDati - NEXA Lunch Seminar
Data Curation @ SpazioDati - NEXA Lunch Seminar
 
Data Science in the Cloud
Data Science in the CloudData Science in the Cloud
Data Science in the Cloud
 
Using graph technology for multi-INT investigations
Using graph technology for multi-INT investigationsUsing graph technology for multi-INT investigations
Using graph technology for multi-INT investigations
 
Using Linkurious in your Enterprise Architecture projects
Using Linkurious in your Enterprise Architecture projectsUsing Linkurious in your Enterprise Architecture projects
Using Linkurious in your Enterprise Architecture projects
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
Web scraping
Web scrapingWeb scraping
Web scraping
 
Big dataintegration rahm-part3Scalable and privacy-preserving data integratio...
Big dataintegration rahm-part3Scalable and privacy-preserving data integratio...Big dataintegration rahm-part3Scalable and privacy-preserving data integratio...
Big dataintegration rahm-part3Scalable and privacy-preserving data integratio...
 
Searching Linked Data with Spinque
Searching Linked Data with SpinqueSearching Linked Data with Spinque
Searching Linked Data with Spinque
 
Cloud architectures for data science
Cloud architectures for data scienceCloud architectures for data science
Cloud architectures for data science
 
Interfacing Google Analytics with Google Sheets for Primary and Secondary Dat...
Interfacing Google Analytics with Google Sheets for Primary and Secondary Dat...Interfacing Google Analytics with Google Sheets for Primary and Secondary Dat...
Interfacing Google Analytics with Google Sheets for Primary and Secondary Dat...
 
Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
 

Destaque (7)

Google Analytics + R. Praktyczne przykłady.
Google Analytics + R. Praktyczne przykłady.Google Analytics + R. Praktyczne przykłady.
Google Analytics + R. Praktyczne przykłady.
 
Google dla serwisów e-commerce. Porady Praktyczne.
Google dla serwisów e-commerce. Porady Praktyczne.Google dla serwisów e-commerce. Porady Praktyczne.
Google dla serwisów e-commerce. Porady Praktyczne.
 
Cloud Machine Learning with Google Cloud Platform
Cloud Machine Learning with Google Cloud PlatformCloud Machine Learning with Google Cloud Platform
Cloud Machine Learning with Google Cloud Platform
 
Google Analytics + R
Google Analytics + RGoogle Analytics + R
Google Analytics + R
 
Bigdata w serwisach e-commerce z wykorzystaniem narzędzi Google
Bigdata w serwisach e-commerce z wykorzystaniem narzędzi GoogleBigdata w serwisach e-commerce z wykorzystaniem narzędzi Google
Bigdata w serwisach e-commerce z wykorzystaniem narzędzi Google
 
Poznaj lepiej użytkowników Twojego serwisu z Google Analytics
Poznaj lepiej użytkowników Twojego serwisu z Google AnalyticsPoznaj lepiej użytkowników Twojego serwisu z Google Analytics
Poznaj lepiej użytkowników Twojego serwisu z Google Analytics
 
Web users tracking behind the scenes
Web users tracking behind the scenesWeb users tracking behind the scenes
Web users tracking behind the scenes
 

Semelhante a Find signal in noise.

Semelhante a Find signal in noise. (20)

Itility marianne faro
Itility marianne faroItility marianne faro
Itility marianne faro
 
BSSML16 L8. REST API, Bindings, and Basic Workflows
BSSML16 L8. REST API, Bindings, and Basic WorkflowsBSSML16 L8. REST API, Bindings, and Basic Workflows
BSSML16 L8. REST API, Bindings, and Basic Workflows
 
SC7 Workshop 2: The BigDataEurope project
SC7 Workshop 2: The BigDataEurope projectSC7 Workshop 2: The BigDataEurope project
SC7 Workshop 2: The BigDataEurope project
 
BDE SC6-hang out - technology part-SWC - Martin
BDE SC6-hang out - technology part-SWC - MartinBDE SC6-hang out - technology part-SWC - Martin
BDE SC6-hang out - technology part-SWC - Martin
 
Collab365 - [FRENCH] Nouvelles options pour SharePoint 2016 et Office 365 c’e...
Collab365 - [FRENCH] Nouvelles options pour SharePoint 2016 et Office 365 c’e...Collab365 - [FRENCH] Nouvelles options pour SharePoint 2016 et Office 365 c’e...
Collab365 - [FRENCH] Nouvelles options pour SharePoint 2016 et Office 365 c’e...
 
How to become the best datascientist in Europe
How to become the best datascientist in EuropeHow to become the best datascientist in Europe
How to become the best datascientist in Europe
 
BDE SC6-ws-05/12/2016 technology part - SWC
BDE SC6-ws-05/12/2016 technology part - SWCBDE SC6-ws-05/12/2016 technology part - SWC
BDE SC6-ws-05/12/2016 technology part - SWC
 
Meetup Spark UDF performance
Meetup Spark UDF performanceMeetup Spark UDF performance
Meetup Spark UDF performance
 
From an idea to an apache tlp
From an idea to an apache tlpFrom an idea to an apache tlp
From an idea to an apache tlp
 
Elasticsearch – Introducing New Containerized Metrics
Elasticsearch – Introducing New Containerized MetricsElasticsearch – Introducing New Containerized Metrics
Elasticsearch – Introducing New Containerized Metrics
 
Data Activities in Austria
Data Activities in AustriaData Activities in Austria
Data Activities in Austria
 
Integrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning TechniquesIntegrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning Techniques
 
Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)
 
SC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project OverviewSC4 Workshop 2: Soren Auer BDE project Overview
SC4 Workshop 2: Soren Auer BDE project Overview
 
BDE Webinar: SC6 - EUROPE IN A CHANGING WORLD -INCLUSIVE, INNOVATIVE AND REFL...
BDE Webinar: SC6 - EUROPE IN A CHANGING WORLD -INCLUSIVE, INNOVATIVE AND REFL...BDE Webinar: SC6 - EUROPE IN A CHANGING WORLD -INCLUSIVE, INNOVATIVE AND REFL...
BDE Webinar: SC6 - EUROPE IN A CHANGING WORLD -INCLUSIVE, INNOVATIVE AND REFL...
 
Odoo code search
Odoo code searchOdoo code search
Odoo code search
 
Building a national Data Repository Data Modelling
Building a national Data Repository Data ModellingBuilding a national Data Repository Data Modelling
Building a national Data Repository Data Modelling
 
GAUC 2017 Workshop Datastudio: Martin Frotzler (e-dialog) & Lili Pajer (Google)
GAUC 2017 Workshop Datastudio: Martin Frotzler (e-dialog) & Lili Pajer (Google)GAUC 2017 Workshop Datastudio: Martin Frotzler (e-dialog) & Lili Pajer (Google)
GAUC 2017 Workshop Datastudio: Martin Frotzler (e-dialog) & Lili Pajer (Google)
 
Sopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital ArchitectureSopra Steria / Interconnect2017 Digital Architecture
Sopra Steria / Interconnect2017 Digital Architecture
 
MVA Transcript
MVA TranscriptMVA Transcript
MVA Transcript
 

Último

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 

Último (20)

Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

Find signal in noise.

  • 1. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 Michal Brys Data Scientist @ Allegro Measure Camp | London, 10th September 2016 Find signal in noise. 6 steps to find value from messy data.
  • 2. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 Michal Brys Data Scientist @ Allegro Specialized also in: + Google Analytics + Google Tag Manager michalbrys.com about.me/michal.brys
  • 3. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 Framework for data analysis CRISP-DM - Cross Industry Standard Process for Data Mining - Set up in 1996 (SPSS, Teradata, Daimler AG, NCR ,OHRA) - Still works! Read more: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
  • 4. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 1: Business Understanding - Define analysis goal - What you want to achieve by analysis? - Check business context - Don’t be afraid to ask questions
  • 5. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 1: Business Understanding I want to select customers group with the highest probability of response (...) to target marketing campaign for this group.
  • 6. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 2: Data Understanding - Collect data Check: - What all variables in dataset means - How about missing values? - Exploratory data analysis (EDA)
  • 7. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 2: Data Understanding Google Analytics with client id as custom dimension - Source: Cookies + JavaScript tracker - Processed by Google Analytics - No access to raw data
  • 8. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 2: Data Understanding 10 000 records with 11 variables
  • 9. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 3: Data Preparation - Data cleaning - Prepare new variables, transform data - Remove missing and outstanding values - Check distributions
  • 10. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 3: Data Preparation Example: Fix variables type.
  • 11. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 4: Modeling - Classification problem - Prepare models by different methods - Training and test subset - CART C5.0 Logit Regression
  • 12. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 5: Evaluation Model True Negative True Positive False Negative False Positive Total Error Rate CART 5081 3150 1080 689 17.69% C5.0 4089 2701 1606 1604 32.10% Logit Regression 5871 2107 1307 715 20.22%
  • 13. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 6: Deployment - Prepare report - Implement in system - Bulid product - ...
  • 14. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 Summary CRISP-DM + Keeps business goal in mind + Result will answer for initial question + Reproducible and documented process Image: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#/media/File:CRISP-DM_Process_Diagram.png
  • 15. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 More inspiration “Data Mining Methods and Models” Daniel T. Larose “The Signal and the Noise” Nate Silver
  • 16. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 One more thing... michalbrys.gitbooks.io/r-google-analytics/
  • 17. Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 10.09.2016 Q&A Michal Brys about.me/michal.brys github.com/michalbrys