WTF is Data Science?
Dylan Gregersen
OpenWest 2018
My name is Dylan Gregersen
I like these things... You can find me at…
dylangregersen
I am the lead data
scientist at...
Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
Data Science is the process of
collecting, cleaning, analyzing,
visualizing, and communicating
data in order to solve problems
in the real world.
Data science is...
What people think data science is...
People often think data science
is all about mathematics,
algorithms, and something called
“machine learning”
What most data science is...
Data science actually consists
mostly of data collection,
cleaning, and organization
(often 80% of the work)
What people forget data science is...
People tend to forget the skills
needed in data science to
communicate results so someone
can take an action in the real
world
Data science is a process
When doing data science we...
1. Conceptual Data Model: Collect
data and create a conceptual data
model of real world phenomena
2. Understand the data: We use that
data model to understand something
about the phenomena
3. Solve a Problem: We apply that
understanding to solve a problem
4. Take action: Ultimately, we succeed
when our solution leads to actions
Data science is successful when you learn
something about the real world which
helps you solve a problem by taking an
action.
Example: What is my conference room utilization?
Identifying the problem
U: What is my conference room utilization?
Me: What problem are you trying to solve?
U: I want to know which rooms are underutilized
Me: Why do you want to know?
U: To improve the efficiency of conference room use
Me: What are you going to do with that information?
U: Repurpose rooms whose meeting usage is less than 50%
Problem: Conference rooms should be used efficiently
Action: Repurpose rooms with usage below 50%; also review heavily used areas
Metric: room utilization = hours in use / available hours per day
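The metric above can be written as a small function. This is only a sketch: the function name and the 9-hour working day in the example are assumptions for illustration, not values from the talk.

```python
def room_utilization(hours_in_use: float, available_hours: float) -> float:
    """Return the fraction of available hours during which a room was in use."""
    if available_hours <= 0:
        raise ValueError("available_hours must be positive")
    return hours_in_use / available_hours


# Hypothetical example: a room booked for 3 hours of a 9-hour working day.
utilization = room_utilization(3.0, 9.0)
print(f"{utilization:.0%}")  # prints 33%
```

A room like this one would fall below the 50% threshold named in the action.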
What problem are you
trying to solve?
What action will you take
with this number?
Identifying the problem
U: What is my conference room utilization?
Me: What problem are you trying to solve?
U: I want to know which departments are using the rooms the most.
Me: Why do you want to know?
U: To adjust the rooms to meet their needs
Me: What are you going to do with that information?
U: Buy new technology or furniture to better meet those needs
Problem: Change meeting rooms to fit the needs of the departments
Action: Make purchasing decisions about technology or furniture
Metrics: room utilization, organizer’s department, occupancy size, technology or furniture used
Solving the Problem
Start by figuring out a plan.
Document requirements and
get feedback from your end
user
Solving the Problem
Having a plan...
● Helps you stay focused
● Helps you communicate with your
end users
● Helps you build in things you’ll need in
production: data quality, alerts,
testing, security, code reviews
Solving the Problem
Now with a plan
Collect data and create a
conceptual data model of real
world phenomena
For a small project, you might use Python
and store the data in a folder called “raw_data”.
For a large project, you might use Python with Kafka
and store the data in AWS S3.
{
….
"id": "6iunsmr8qv1k1c5avlek045oup",
"iCalUID": "6iunsmr8qv1k1c5avlek045oup@google.com",
"summary": "OpenWest: WTF is data science?",
"status": "confirmed",
"start": {
"dateTime": "2018-06-08T11:30:00-06:00"
},
"end": {
"dateTime": "2018-06-08T12:30:00-06:00"
},
….
}
Metadata: room_id, customer_id, time_range
Google Event File
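For the small-project case, the collection step might look like the sketch below: each event is written unmodified to a "raw_data" folder. The helper name `save_raw_event` is hypothetical, the event dict contains only the fields visible in the excerpt above, and a temporary directory stands in for the real project folder.

```python
import json
import tempfile
from pathlib import Path


def save_raw_event(event: dict, raw_dir: Path) -> Path:
    """Write one event, unmodified, to raw_dir/<event id>.json."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{event['id']}.json"
    path.write_text(json.dumps(event, indent=2), encoding="utf-8")
    return path


# Only the fields shown in the excerpt; the elided fields are left out.
event = {
    "id": "6iunsmr8qv1k1c5avlek045oup",
    "iCalUID": "6iunsmr8qv1k1c5avlek045oup@google.com",
    "summary": "OpenWest: WTF is data science?",
    "status": "confirmed",
    "start": {"dateTime": "2018-06-08T11:30:00-06:00"},
    "end": {"dateTime": "2018-06-08T12:30:00-06:00"},
}

# Assumption: a temporary directory stands in for the project folder.
raw_dir = Path(tempfile.mkdtemp()) / "raw_data"
saved = save_raw_event(event, raw_dir)
print(saved.name)
```

Keeping the raw files untouched means the cleaning step can be rerun later without collecting the data again.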
1. Conceptual Data Model
80% of data science work is
cleaning and structuring the
data.
For a small project, you might use Python to
process “raw_data” into “processed_data”.
For a large project, you might use AWS Glue to
process the AWS S3 data and store it in AWS
Redshift.
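As a rough sketch of the small-project case, one raw event can be turned into a flat, cleaned record with the Python standard library. The function name `process_event` and the choice of output fields are assumptions; the input is the event excerpt shown earlier.

```python
from datetime import datetime, timezone


def process_event(raw: dict) -> dict:
    """Flatten a raw calendar event into a cleaned record."""
    start = datetime.fromisoformat(raw["start"]["dateTime"])
    end = datetime.fromisoformat(raw["end"]["dateTime"])
    return {
        "event_id": raw["id"],
        # Normalize the local start time to UTC.
        "event_start_utc": start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Duration in seconds.
        "event_duration": (end - start).total_seconds(),
        "event_status": raw["status"],
    }


raw_event = {
    "id": "6iunsmr8qv1k1c5avlek045oup",
    "status": "confirmed",
    "start": {"dateTime": "2018-06-08T11:30:00-06:00"},
    "end": {"dateTime": "2018-06-08T12:30:00-06:00"},
}

record = process_event(raw_event)
print(record["event_start_utc"], record["event_duration"])
```

Storing the start time in UTC and the duration in seconds avoids time-zone arithmetic in later queries.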
INSERT INTO customer (customer_id)
VALUES (1212);
INSERT INTO room (room_id, customer_id)
VALUES (42, 1212);
INSERT INTO event (event_id, event_start_utc, event_duration, event_status)
VALUES ('6iunsmr8qv1k1c5avlek045oup', '2018-06-08T17:30:00Z', 3600.0, 'confirmed');
INSERT INTO fact_room_event (room_id, event_id)
VALUES (42, '6iunsmr8qv1k1c5avlek045oup');
Structured Data - Star Schema
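A minimal way to try out this star schema is an in-memory SQLite database. The table and column names come from the slide; the column types, keys, and the use of SQLite (rather than Redshift) are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE room (
        room_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    CREATE TABLE event (
        event_id TEXT PRIMARY KEY,
        event_start_utc TEXT NOT NULL,
        event_duration REAL NOT NULL,  -- seconds
        event_status TEXT NOT NULL
    );
    CREATE TABLE fact_room_event (
        room_id INTEGER NOT NULL REFERENCES room(room_id),
        event_id TEXT NOT NULL REFERENCES event(event_id),
        PRIMARY KEY (room_id, event_id)
    );
    """
)

event_id = "6iunsmr8qv1k1c5avlek045oup"
conn.execute("INSERT INTO customer (customer_id) VALUES (?)", (1212,))
conn.execute("INSERT INTO room (room_id, customer_id) VALUES (?, ?)", (42, 1212))
conn.execute(
    "INSERT INTO event VALUES (?, ?, ?, ?)",
    (event_id, "2018-06-08T17:30:00Z", 3600.0, "confirmed"),
)
conn.execute("INSERT INTO fact_room_event (room_id, event_id) VALUES (?, ?)", (42, event_id))

# Join the fact table back to its dimensions.
rows = conn.execute(
    """
    SELECT r.room_id, e.event_id, e.event_duration
    FROM fact_room_event AS f
    JOIN room AS r ON r.room_id = f.room_id
    JOIN event AS e ON e.event_id = f.event_id
    """
).fetchall()
print(rows)
```

The fact table holds only keys, so each question about rooms and events becomes a join against the dimension tables.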
2. Understand the Data
We use that data model to
understand something about
the phenomena
Explore and manipulate the data.
Question the data quality and
return to cleaning if necessary.
For a small project, you might use Python to load
“processed_data” and make plots.
For a large project, you might use SQL to query
AWS Redshift and use Python to visualize the results.
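The exploration step could be sketched as a data-quality check followed by an aggregate query. Everything here is illustrative: the sample rows, the single-day scope, the 9 available hours per day, and SQLite standing in for Redshift are all assumptions.

```python
import sqlite3

AVAILABLE_HOURS_PER_DAY = 9.0  # assumption: the talk does not give a number

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (event_id TEXT PRIMARY KEY, event_duration REAL, event_status TEXT);
    CREATE TABLE fact_room_event (room_id INTEGER, event_id TEXT);
    """
)
# Hypothetical events for a single day; durations are in seconds.
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [
        ("a", 3600.0, "confirmed"),
        ("b", 5400.0, "confirmed"),
        ("c", 1800.0, "confirmed"),
        ("d", 3600.0, "cancelled"),
    ],
)
conn.executemany(
    "INSERT INTO fact_room_event VALUES (?, ?)",
    [(42, "a"), (42, "b"), (7, "c"), (7, "d")],
)

# Question the data quality: durations should be positive.
bad_rows = conn.execute(
    "SELECT COUNT(*) FROM event WHERE event_duration <= 0"
).fetchone()[0]

# Hours in use per room, counting only confirmed events.
rows = conn.execute(
    """
    SELECT f.room_id, SUM(e.event_duration) / 3600.0 AS hours_in_use
    FROM fact_room_event AS f
    JOIN event AS e ON e.event_id = f.event_id
    WHERE e.event_status = 'confirmed'
    GROUP BY f.room_id
    ORDER BY f.room_id
    """
).fetchall()

hours_in_use = dict(rows)
utilization = {room: hours / AVAILABLE_HOURS_PER_DAY for room, hours in hours_in_use.items()}
print(bad_rows, hours_in_use)
```

If the quality check turned up bad rows, the next move would be back to the cleaning step rather than forward to a plot.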
3. Solve a Problem
We apply that understanding to
solve a problem
Problem: Conference rooms should be
used efficiently
Action: Repurpose rooms with usage
below 50%; also review heavily used areas
Metric: room utilization = hours in use
/ available hours per day
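Turning the metric into a list of rooms to act on might look like the sketch below. The 50% threshold comes from the slide; the 90% cutoff for "heavily used", the utilization numbers, and "Room C" are made up for illustration.

```python
UNDERUSED_THRESHOLD = 0.5  # from the slide: repurpose rooms with usage below 50%
HEAVY_THRESHOLD = 0.9      # assumption: the talk gives no number for "heavily used"


def classify_rooms(utilization_by_room: dict) -> tuple:
    """Split rooms into candidates for repurposing and rooms worth a closer look."""
    underused = sorted(
        room for room, u in utilization_by_room.items() if u < UNDERUSED_THRESHOLD
    )
    heavy = sorted(
        room for room, u in utilization_by_room.items() if u >= HEAVY_THRESHOLD
    )
    return underused, heavy


# Hypothetical utilization values, not results from the talk.
utilization = {"Camp Ivanhoe": 0.12, "Fire Swamp": 0.95, "Room C": 0.60}
underused, heavy = classify_rooms(utilization)
print("Repurpose candidates:", underused)
print("Heavily used:", heavy)
```

The output is a short list a person can act on, which is the point of this step.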
3. Solve a Problem
Did we solve the problem?
What action are you going to
take?
4. Take Action
Ultimately, we succeed when
our solution leads to actions
For a small project, you might periodically recreate the analysis
so the user can take new actions.
For a large project, you might provide a tool for
the user to recreate it on their own.
In our example, our Facilities Gal goes and
looks at the bottom three rooms. She decides
that Camp Ivanhoe really isn’t needed.
She also checks Fire Swamp and asks
some people why it is used so much.
Data science is a process
1. Collect data and create a conceptual
data model of real world phenomena
2. We use that data model to
understand something about the
phenomena
3. We apply that understanding to
solve a problem
4. Ultimately, we succeed when our
solution leads to actions
Cool! So what about
machine learning and
predictive modeling?
The data science process
has a hierarchy of needs
Data Basics
The data science hierarchy of needs
describes the stages of data
complexity and insights
The Data Science Process
First point of value
Descriptive analytics is the first
stage where you can actually answer
questions.
It is especially important for business end
users who want results from your
data.
First point of value
Businesses spend 1-3
months to get this into
production the first time
They spend 1-3 years to
really get this right
1-2 years to do this well
1-2 years to integrate these
1+ years to grow modeling into
optimizations
In the real world,
data science is a team
activity
Data-Driven Companies Build Data Science Teams
Data Engineer
Data Architect
Data Analyst
Developer
Product Manager
QA
Statistician
Chief Data Officer
Senior Data Analyst
Data Steward
Business Analyst
Myth of the data scientist
Data science requires many different
jobs and skills.
Being a “data scientist” is very much
like being a “full stack developer”.
The most data-driven companies are
creating data-specific jobs: data
engineers, data architects, data
analysts, data researchers.
How do you get started?
Start with descriptive analytics
The best way to build your intuition about how the data
science process works. Become good at
identifying the root question, the problem to solve,
and the possible actions to be taken.
Open Data Sets:
www.kaggle.com/datasets
www.data.gov
www.github.com/awesomedata/awesome-public-datasets
www.google.com/search?q=open+data+sets
● The best tools are powerful.
● The best tools are easy to use and learn.
● The best tools support teamwork.
● The best tools are beloved by the community.
Excel is still a standard across the data world and is a
perfectly fine way to get started.
Data science is successful when
you learn something about the real
world which helps you solve a
problem by taking an action.
You set yourself up for success if you...
● Foster a determination to discover the
underlying problems to solve
● Learn to work with data
What is data science?
References and Resources
● Rachel Schutt & Cathy O’Neil (2013) Doing Data Science: Straight Talk From the Frontline. Sebastopol, CA: O’Reilly
● DJ Patil & Hilary Mason (2015) Data Driven. Sebastopol, CA: O’Reilly
● DJ Patil (2011) Building Data Science Teams. Sebastopol, CA: O’Reilly
● Monica Rogati (2017) The AI Hierarchy of Needs
● Nick Crocker (2014) Thirty Things I’ve Learned
● Tavish Srivastava (2015) 13 Tips to make you awesome in Data Science / Analytics Jobs
● Daniel Tunkelang (2017) 10 Things Everyone Should Know About Machine Learning
● DJ Patil - Everything We Wish We'd Known About Building Data Products
Thank You!

Wtf is data science?

  • 1. WTF is Data Science? Dylan Gregersen OpenWest 2018
  • 2. My name is Dylan Gregersen I like these things... You can find me at… dylangregersen I am the lead data scientist at...
  • 3. Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline Data Science is the process of collecting, cleaning, analyzing, visualizing, and communicating data in order to solve problems in the real world. Data science is...
  • 4. What people think data science is... People often think data science is all about mathematics, algorithms, and something called “machine learning.” Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
  • 5. What most data science is... Data science actually consists mostly of data collection, cleaning, and organization (often 80% of the work) Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
  • 6. What people forget that data science is People tend to forget the skills needed in data science to communicate results so someone can take an action in the real world. Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
  • 7. Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline Data science is a process When doing data science we... 1. Conceptual Data Model: Collect data and create a conceptual data model of real world phenomena 2. Understand the data: We use that data model to understand something about the phenomena 3. Solve a Problem: We apply that understanding to solve a problem 4. Take action: Ultimately, we succeed when our solution leads to actions
  • 8. Data science is successful when you learn something about the real world which helps you solve a problem by taking an action.
  • 9. Data science is successful when you learn something about the real world which helps you solve a problem by taking an action. Example: What is my conference room utilization?
  • 10. Identifying the problem U: What is my conference room utilization?
  • 11. Identifying the problem U: What is my conference room utilization? Me: What problem are you trying to solve? U: I want to know which rooms are underutilized Me: Why do you want to know? U: To improve the efficiency of conference room use Me: What are you going to do with that information? A: Repurpose rooms whose meeting usage is less than 50%
  • 12. Problem: Conference rooms should be used efficiently Action: repurpose rooms with usage less than 50%, also heavily used areas Metric: room utilization = hours in use / available hours per day Identifying the problem U: What is my conference room utilization? Me: What problem are you trying to solve? U: I want to know which rooms are underutilized Me: Why do you want to know? U: To improve the efficiency of conference room use Me: What are you going to do with that information? A: Repurpose rooms whose meeting usage is less than 50%
  • 13. What problem are you trying to solve? What action will you take with this number?
  • 14. Problem: Change meeting rooms to fit the needs of the department Action: make purchasing decisions about technology or furniture Metrics: room utilization, organizer’s department, occupancy size, technology or furniture used U: What is my conference room utilization? Me: What problem are you trying to solve? U: I want to know which departments are using the rooms the most. Me: Why do you want to know? U: To adjust the rooms to meet their needs Me: What are you going to do with that information? A: Buy new technology or furniture to better meet those needs Identifying the problem
  • 15. Solving the Problem Start by figuring out a plan. 1. Conceptual Data Model: Collect data and create a conceptual data model of real world phenomena 2. Understand the data: We use that data model to understand something about the phenomena 3. Solve a Problem: We apply that understanding to solve a problem 4. Take action: Ultimately, we succeed when our solution leads to actions
  • 16. Solving the Problem Start by figuring out a plan. Document requirements and get feedback from your end user Problem: Conference rooms should be used efficiently Action: repurpose rooms with usage less than 50%, also heavily used areas Metric: room utilization = hours in use / available hours per day
  • 17. Solving the Problem Having a plan... ● Helps you stay focused ● Helps you communicate with your end users ● Build in things you’ll need in production: data quality, alerts, testing, security, code reviews
  • 18. Solving the Problem Now with a plan 1. Conceptual Data Model: Collect data and create a conceptual data model of real world phenomena 2. Understand the data: We use that data model to understand something about the phenomena 3. Solve a Problem: We apply that understanding to solve a problem 4. Take action: Ultimately, we succeed when our solution leads to actions
  • 19. Collect data and create a conceptual data model of real world phenomena Small project you might use python and store in a folder called “raw_data” Large project you might use python+kafka and store in AWS S3 { …. "id": "6iunsmr8qv1k1c5avlek045oup", "iCalUID": "6iunsmr8qv1k1c5avlek045oup@google.com", "summary": "OpenWest: WTF is data science?", "status": "confirmed", "start": { "dateTime": "2018-06-08T11:30:00-06:00" }, "end": { "dateTime": "2018-06-08T12:30:00-06:00" }, …. } Metadata: room_id, customer_id, time_range Google Event File 1. Conceptual Data Model
  • 20. 80% of data science work is cleaning and structuring the data. Small project you might use python to process “raw_data” into “processed_data” Large project you might use AWS Glue to process AWS S3 data and store in AWS Redshift { …. "id": "6iunsmr8qv1k1c5avlek045oup", "iCalUID": "6iunsmr8qv1k1c5avlek045oup@google.com", "summary": "OpenWest: WTF is data science?", "status": "confirmed", "start": { "dateTime": "2018-06-08T11:30:00-06:00" }, "end": { "dateTime": "2018-06-08T12:30:00-06:00" }, …. } Metadata: room_id, customer_id, time_range Google Event File 1. Conceptual Data Model
  • 21. 80% of data science work is cleaning and structuring the data. 1. Conceptual Data Model { …. "id": "6iunsmr8qv1k1c5avlek045oup", "iCalUID": "6iunsmr8qv1k1c5avlek045oup@google.com", "summary": "OpenWest: WTF is data science?", "status": "confirmed", "start": { "dateTime": "2018-06-08T11:30:00-06:00" }, "end": { "dateTime": "2018-06-08T12:30:00-06:00" }, …. } Metadata: room_id, customer_id, time_range Google Event File INSERT INTO customer 1212 AS customer_id INSERT INTO room 42 AS room_id 1212 AS customer_id INSERT INTO event "6iunsmr8qv1k1c5avlek045oup" AS event_id "2018-06-08T17:30:00Z" AS event_start_utc 3600.0 AS event_duration "confirmed" AS event_status INSERT INTO fact_room_event room_id event_id Structured Data - Star Schema
  • 22. We use that data model to understand something about the phenomena 2. Understand the Data
  • 23. Explore, manipulate the data. Question the data quality and return to cleaning if necessary. Small project you might use python to load “processed_data” and make plots Large project you might use SQL to query AWS Redshift and use python to visualize 2. Understand the Data INSERT INTO customer 1212 AS customer_id INSERT INTO room 42 AS room_id 1212 AS customer_id INSERT INTO event "6iunsmr8qv1k1c5avlek045oup" AS event_id "2018-06-08T17:30:00Z" AS event_start_utc 3600.0 AS event_duration "confirmed" AS event_status INSERT INTO fact_room_event room_id event_id Structured Data - Star Schema
  • 24. Explore, manipulate the data. Question the data quality and return to cleaning if necessary. 2. Understand the Data INSERT INTO customer 1212 AS customer_id INSERT INTO room 42 AS room_id 1212 AS customer_id INSERT INTO event "6iunsmr8qv1k1c5avlek045oup" AS event_id "2018-06-08T17:30:00Z" AS event_start_utc 3600.0 AS event_duration "confirmed" AS event_status INSERT INTO fact_room_event room_id event_id Structured Data - Star Schema
  • 25. 3. Solve a Problem We apply that understanding to solve a problem
  • 26. 3. Solve a Problem We apply that understanding to solve a problem Problem: Conference rooms should be used efficiently Action: repurpose rooms with usage less than 50%, also heavily used areas Metric: room utilization = hours in use / available hours per day
  • 27. 3. Solve a Problem Did we solve the problem? What action are you going to take? Problem: Conference rooms should be used efficiently Action: repurpose rooms with usage less than 50%, also heavily used areas Metric: room utilization = hours in use / available hours per day
  • 28. 4. Take Action Ultimately, we succeed when our solution leads to actions
  • 29. 4. Take Action Ultimately, we succeed when our solution leads to actions For a small project you might periodically recreate the analysis so the user can take new actions. For a large project you might provide a tool for the user to recreate it on their own.
  • 30. 4. Take Action Ultimately, we succeed when our solution leads to actions In our example, our Facilities Gal goes and looks at the bottom three rooms. She decides that Camp Ivanhoe really isn’t needed. She also checks Fire Swamp and asks some people why it is used so much.
  • 31. Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline Data science is a process 1. Collect data and create a conceptual data model of real world phenomena 2. We use that data model to understand something about the phenomena 3. We apply that understanding to solve a problem 4. Ultimately, we succeed when our solution leads to actions
  • 32. Cool! So what about machine learning and predictive modeling?
  • 33. The data science process has a hierarchy of needs
  • 34. Data Basics The data science hierarchy of needs describes the stages of data complexity and insights
  • 35. The Data Science Process
  • 36. The Data Science Process
  • 37. The Data Science Process
  • 38. The Data Science Process
  • 39. First point of value Descriptive Analytics are your first stage where you can actually answer questions. Especially important for business end users who want the results of your data.
  • 40. First point of value Businesses spend 1-3 months to get this into production the first time They spend 1-3 years to really get this right Descriptive Analytics are your first stage where you can actually answer questions.
  • 41. Businesses spend 1-3 months to get this into production the first time They spend 1-3 years to really get this right 1-2 years to do this well 1-2 years to integrate these 1+ years to grow modeling to optimizations
  • 42. In the real world, data science is a team activity
  • 43. Data-Driven Companies Build Data Science Teams Data Engineer Data Architect Data Analyst Developer Product Manager QA Statistician Chief Data Officer Senior Data Analyst Data Steward Data Engineer Business Analyst
  • 44. Myth of the data scientist Data science requires many different jobs and skills. Being a “data scientist” is very much like being a “full stack developer”. The most data-driven companies are creating data specific jobs: data engineers, data architects, data analysts, data researchers.
  • 45. How do you get started?
  • 46. Start with descriptive analytics Best way to build your intuition about how the data science process works. Become good at identifying the root question, problem to solve, and the possible actions to be taken.
  • 47. Start with descriptive analytics Best way to build your intuition about how the data science process works. Become good at identifying the root question, problem to solve, and the possible actions to be taken. Open Data Sets: www.kaggle.com/datasets www.data.gov www.github.com/awesomedata/awesome-public-datasets www.google.com/search?q=open+data+sets
  • 48. Start with descriptive analytics Best way to build your intuition about how the data science process works. Become good at identifying the root question, problem to solve, and the possible actions to be taken. ● The best tools are powerful. ● The best tools are easy to use and learn. ● The best tools support teamwork. ● The best tools are beloved by the community. Excel is still a standard across the data world and is a perfectly fine way to get started.
  • 49. Data science is successful when you learn something about the real world which helps you solve a problem by taking an action. You set yourself up for success if you... ● Foster a determination to discover the underlying problems to solve ● Learn to work with data What is data science?
  • 50. References and Resources ● Rachel Schutt & Cathy O’Neil (2013) Doing Data Science: Straight Talk From the Frontline, Sebastopol, CA: O’Reilly ● DJ Patil & Hilary Mason (2015) Data Driven. Sebastopol, CA: O’Reilly ● DJ Patil (2011) Building Data Science Teams. Sebastopol, CA: O’Reilly ● Monica Rogati (2017) The AI Hierarchy of Needs ● Nick Crocker (2014) Thirty Things I’ve Learned ● Tavish Srivastava (2015) 13 Tips to make you awesome in Data Science / Analytics Jobs ● Daniel Tunkelang (2017) 10 Things Everyone Should Know About Machine Learning ● DJ Patil - Everything We Wish We'd Known About Building Data Products
  • 51. Data science is successful when you learn something about the real world which helps you solve a problem by taking an action. You set yourself up for success if you... ● Foster a determination to discover the underlying problems to solve ● Learn to work with data Thank You!
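
The conference room example (slides 12, 19 to 21, and 26) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the speaker's implementation: the function names `to_event_row`, `room_utilization`, and `rooms_to_review`, the 8 available hours per day, and the utilization figures used for the two rooms are hypothetical. Only the event fields, the metric definition, the 50% threshold, and the room names come from the slides. It also assumes events in a room never overlap.

```python
from datetime import datetime, timezone

# Sample shaped like the Google event on slide 19 (elided fields omitted).
raw_event = {
    "id": "6iunsmr8qv1k1c5avlek045oup",
    "status": "confirmed",
    "start": {"dateTime": "2018-06-08T11:30:00-06:00"},
    "end": {"dateTime": "2018-06-08T12:30:00-06:00"},
}


def to_event_row(event):
    """Clean one raw event into the structured 'event' row from slide 21."""
    start = datetime.fromisoformat(event["start"]["dateTime"])
    end = datetime.fromisoformat(event["end"]["dateTime"])
    return {
        "event_id": event["id"],
        # Normalize the local start time to UTC.
        "event_start_utc": start.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        # Duration in seconds.
        "event_duration": (end - start).total_seconds(),
        "event_status": event["status"],
    }


def room_utilization(event_rows, available_hours):
    """Metric from slide 12: hours in use / available hours per day.

    Assumes all rows belong to one room on one day and do not overlap.
    """
    seconds_in_use = sum(
        row["event_duration"]
        for row in event_rows
        if row["event_status"] == "confirmed"
    )
    return (seconds_in_use / 3600.0) / available_hours


def rooms_to_review(utilization_by_room, threshold=0.5):
    """Action from slide 12: list rooms with usage below the threshold."""
    return sorted(
        room for room, value in utilization_by_room.items() if value < threshold
    )


row = to_event_row(raw_event)
# Hypothetical: 8 available hours per day.
utilization = room_utilization([row], available_hours=8)
# Hypothetical utilization values for two rooms named on slide 30.
candidates = rooms_to_review({"Camp Ivanhoe": utilization, "Fire Swamp": 0.9})
print(row)
print(utilization, candidates)
```

The three functions mirror the process steps: cleaning the raw data into a structured row, computing the agreed metric, and producing a short list that someone can act on.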