Pay no attention to
the man behind the
curtain…
The unseen work behind data
science and analytics
Accelerate Data Science conference
October 18, 2017
Mark Madsen
www.ThirdNature.net
@markmadsen
Copyright Third Nature, Inc.
INTRO
The problem we’re (really) trying to solve, current state
The focus is largely on machine learning today
You are here
The craft model of information delivery does not scale
So we shifted to data publishing
Industrialized data delivery for self-service access.
Increased data capture and BI maturity lead to
more data-intensive practices and rising complexity
Pareto analysis of the share of buyers who make up 80%
of sales volume for products, in this case Coke.
Data source: CMO Council
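The Pareto figure on this slide comes down to a cumulative sum over buyers ranked by volume. The sketch below is a minimal illustration, assuming per-buyer sales volumes are already available as a list; the function name `share_of_buyers_for_volume` and the 80% default are hypothetical choices, not the CMO Council's method.

```python
from typing import Sequence


def share_of_buyers_for_volume(volumes: Sequence[float], target: float = 0.8) -> float:
    """Fraction of buyers who, taken heaviest first, account for `target`
    (0.8 = 80%) of total sales volume. Hypothetical helper for illustration."""
    if not volumes:
        raise ValueError("need at least one buyer")
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    running = 0.0
    # Rank buyers from heaviest to lightest and accumulate until the target share is reached.
    for count, volume in enumerate(sorted(volumes, reverse=True), start=1):
        running += volume
        if running >= target * total:
            return count / len(volumes)
    return 1.0  # fallback: the target was never met (e.g. target > 1.0)
```

For four buyers with volumes `[60, 25, 10, 5]`, the two heaviest buyers cover 85% of volume, so `share_of_buyers_for_volume([60, 25, 10, 5])` returns `0.5`: half the buyers make up 80% of sales.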
What makes these customers different? How does
this affect a new product launch, or line extensions?
These are not the kinds of questions you can answer with queries and reporting alone.
Data source: CMO Council
Compounding the problem: observations, not transactions
Event data doesn’t fit well with current methods of collection and
storage, or with the technology to process and analyze it.
The old problem was access, the new one is analysis
The applied view of data science
Five basic things you can do:
▪ Prediction – what is most likely to happen?
▪ Estimation – what’s the future value of a variable?
▪ Description – what relationships exist in the data?
▪ Simulation – what could happen?
▪ Prescription – what should you do?
Applying analytics isn’t just putting them on a screen
There are different models of use at machine and human speed
Decision-Action: human decision support, humans moderating machine decisions, machine decisions
Monitor-Alert: human monitoring, machine monitoring
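The middle mode, humans moderating machine decisions, is often implemented as a confidence band around a model score. A minimal sketch, assuming the model emits a probability-like score in [0, 1]; the function `route_decision`, its action labels and the 0.9 / 0.1 thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routing:
    action: str   # "approve", "reject" or "human_review" (hypothetical labels)
    score: float  # the model score that drove the routing


def route_decision(score: float, approve_above: float = 0.9, reject_below: float = 0.1) -> Routing:
    """Let the model act only when it is confident; queue everything in between
    for a person. Thresholds are illustrative assumptions, not recommendations."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= approve_above:
        return Routing("approve", score)
    if score <= reject_below:
        return Routing("reject", score)
    return Routing("human_review", score)
```

The same function covers the other two modes: setting `approve_above` above 1 and `reject_below` below 0 sends every case to a person (human decision support), while setting both thresholds to the same value removes the review band (machine decisions).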
THE NATURE OF THE PROBLEM FOR ORGANIZATIONS
Implementing data science is a problem of multiple perspectives
We don’t have an analytics problem, just like we
didn’t have a BI problem
The origin of analytics as “business
intelligence” was stated well in 1958:
“…the ability to apprehend the
interrelationships of presented facts in
such a way as to guide action towards a
desired goal.” ~ H. P. Luhn
“A Business Intelligence System”, http://altaplana.com/ibmrd0204H.pdf
Our goal is analytics as a capability, not a technology
Three constituencies
▪ Stakeholder, aka the recipient
▪ Analyst, aka the data scientist
▪ Builder, aka the engineer
Starting points
Many organizations choose to start with
the analysts. Create a data science team.
Turn them loose to find a problem.
Many more start with builders: technology
solutions looking for problems, e.g. 55% of
the IT-driven Hadoop and Spark projects
over the last five years.
The right place to start? Stakeholders. The
goal to achieve, the problem to solve.
NATURE OF THE PROBLEM FROM THE STAKEHOLDER’S PERSPECTIVE
Each constituency has its own set of problems to deal with
The myth that still drives analytics – analytic gold
All we need is a fat
pipe and pans
working in parallel…
Analytic insights that result in no action are expensive trivia.
It’s not the insight, but what you do with it, that matters
As a manager: what would you do in this situation?
Perennially difficult: What question do you address?
What’s possible?
How do you know what’s
feasible and what isn’t? (both
technically and financially)
You don’t, unless you know the
data science and the business
(and even then maybe not, ML
makes no guarantees)
It takes domain expertise, analytic
expertise and intuition, which is
why you need analysts.
Important questions for managers
1. What is the goal?
2. Is the goal worth achieving?
3. Do you have a clearly stated, measurable goal?
4. Do you have the data required?
If they don’t realize this is important, they complain about
analysts asking them a bunch of (obvious*) questions.
There are processes you can put in place to find problems
to address, prioritize them and determine how to deploy
the solutions for them.
*Not really
Applying analytics is not an analytics problem
Applying analytics is not in the
analyst’s control.
It’s not in the engineer’s control.
It’s in the control of the people
involved in the process.
Failures are often in execution, not
in analytics development.
For example, we saw unexpectedly
poor performance in a number of
geographies. Was it the new
analytics we tried? Was it a data
problem? No, it was a simple
compliance problem.
NATURE OF THE PROBLEM FROM THE ANALYST’S PERSPECTIVE
The analytics process at a high level
Diagram: Kate Matsudaira
Analytics problems are about researching the
unknown rather than accessing the known.
Repeat for each new problem
Diagram: Kate Matsudaira
Important: no two analytics projects are entirely alike
Different goals = different data, preparation, algorithm
Different algorithms have different resource consumption
profiles and scaling ability.
Each requires its own custom-engineered data features
Starting at the start: Do you have a clearly stated,
measurable goal?
The main hurdle: just getting the data
Do you know where to find it? Because it’s
unlikely to be in the data warehouse.
Do you have access to it?
Is access fast enough? Because DWs are for
QRD, not for moving huge piles of data. And
ERP systems and SaaS apps are right out.
Do you have the right data?
Many machine learning
techniques require labeled
(known good) training data:
Supervised learning: a person
has to define the correct
output for some portion of
the data. Data is divided into
training sets used for model
building and test sets for
validating the results.
• What is spam and what isn’t?
• What does a fraudulent
transaction look like?
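To make the training/test split concrete, here is a minimal sketch in plain Python, assuming labeled examples arrive as `(text, is_spam)` pairs labeled by a person. The "model" is a deliberately toy word-list rule standing in for a real learning algorithm; `split_labeled`, `fit_spam_words`, `predict` and `accuracy` are hypothetical names written for this illustration.

```python
import random
from typing import List, Set, Tuple

# A labeled example: (message text, is_spam), where a person supplied the label.
Example = Tuple[str, bool]


def split_labeled(data: List[Example], test_fraction: float = 0.25,
                  seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the labeled data and hold out a test set."""
    if len(data) < 2:
        raise ValueError("need at least two labeled examples")
    shuffled = list(data)                  # copy: never reorder the caller's list
    random.Random(seed).shuffle(shuffled)  # fixed seed -> identical split on every run
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def fit_spam_words(train: List[Example]) -> Set[str]:
    """Toy 'model': words seen in spam but never in non-spam training messages."""
    spam_words: Set[str] = set()
    ham_words: Set[str] = set()
    for text, is_spam in train:
        (spam_words if is_spam else ham_words).update(text.lower().split())
    return spam_words - ham_words


def predict(model: Set[str], text: str) -> bool:
    return any(word in model for word in text.lower().split())


def accuracy(model: Set[str], test: List[Example]) -> float:
    """Share of held-out examples where the prediction matches the human label."""
    correct = sum(predict(model, text) == label for text, label in test)
    return correct / len(test)
```

Typical use is `train, test = split_labeled(examples)`, then `model = fit_spam_words(train)` and `accuracy(model, test)`; only the test score says anything about messages the model has not seen. The fixed seed makes the split reproducible, which matters when a model later has to be rebuilt and compared with its predecessor.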
Do you have enough of the right data?
ML needs a lot of data; you may be disappointed in your own efforts
1. Define the business problem
2. Translate the problem into an analytic context
3. Select appropriate data
4. Learn the data
5. Create a model set
6. Fix problems with data
7. Transform data
8. Build models
9. Assess models
10. Deploy models
11. Assess results
Source: Michael Berry, Data Miners Inc.
What does an expert analyst really do?
What does an expert analyst do?
You can’t model data for this in advance.
Where do analysts spend their time? Mostly on data work
Figure: the analytics process steps, annotated with the percentage of time spent: 70% and 30%.
Source: Michael Berry, Data Miners Inc.
Feature engineering is the core of the process
Lots of data (as attributes) makes things harder
Lots of data (instances) makes things slow
Often, the raw data is not in a form that is amenable
to learning, but you can construct features from it
that are.
Cleaning up data, choosing attributes, deriving
features is not a technical problem as much as a
creative one.
The best way to enable data scientists is to remove
data management obstacles.
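One common, simple instance of constructing features from raw data is RFM (recency, frequency, monetary value): purchase-level rows are not in a learnable shape for a customer-level model, but per-customer summaries are. A minimal sketch, assuming a transaction log of `(customer_id, purchase_date, amount)` tuples; the names are hypothetical, and RFM is an illustrative choice rather than one the slide prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple

# One raw row per purchase, as it might arrive from a transaction log (assumed layout).
Transaction = Tuple[str, date, float]  # (customer_id, purchase_date, amount)


@dataclass(frozen=True)
class CustomerFeatures:
    recency_days: int   # days between the last purchase and the as-of date
    frequency: int      # number of purchases
    monetary: float     # total spend


def rfm_features(rows: Iterable[Transaction], as_of: date) -> Dict[str, CustomerFeatures]:
    """Reshape purchase-level rows into one feature row per customer."""
    last_seen: Dict[str, date] = {}
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for customer, day, amount in rows:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        counts[customer] += 1
        totals[customer] += amount
    return {
        customer: CustomerFeatures(
            recency_days=(as_of - last_seen[customer]).days,
            frequency=counts[customer],
            monetary=totals[customer],
        )
        for customer in last_seen
    }
```

The creative part the slide refers to is choosing which summaries to build; the code itself is routine once that choice is made.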
Where do most analytics tools focus?
Figure: the analytics process steps, marked to show where most tools focus.
Source: Michael Berry, Data Miners Inc.
Where do most analytics-as-a-service (aaS) offerings focus?
Figure: the analytics process steps, marked to show where most aaS offerings focus.
Source: Michael Berry, Data Miners Inc.
The analyst’s workspace in BI is relatively spare
The analyst’s workspace needs to be more like a
kitchen than like BI vending machines
NATURE OF THE PROBLEM FROM THE BUILDER’S PERSPECTIVE
IT and Ops people want to know “what to build?”
Giant data platform? Self service tools?
Analytics requires different processes and workloads
None of this analytics work
is the same as what IT
considered “analysis” to be,
which is usually equated
with BI or ad-hoc query.
Ad-hoc analysis ≠
Exploratory data analysis ≠
Batch analytics ≠
Real-time analytics
A real analytics production workflow
Hatch, CIKM ’11
Embedding analytics: less voodoo, more engineering
Things engineering and operations worry about
Engineering time and effort
▪ Introduction of new technology, complexity
▪ Integration – deploying models requires linking different types of
environments and creating supportable workflows for the analysts
▪ Ability to develop and deploy at the required speed
Supportability
▪ Automation
▪ The environment requires additional monitoring, other technology and
processes, particularly for customer-facing work
▪ Support costs (time and money)
SLAs:
▪ Availability – if analytics are tied to production operations, particularly
customer-facing ones, this becomes important and difficult because it’s not
standard application work
▪ Performance and scalability – have to manage unpredictable workloads and
resource conflicts between model development and model execution
The world changes, do the models?
In BI you maintain ETL and
schemas, in ML you maintain
models.
“Model decay” happens as the
assumptions around which a
model is built change, e.g. spam
techniques change.
When you adjust the model you
need to know it is better again
▪ Better save the data used to
build the model
▪ Better save the model
▪ Baseline and measurements
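The three bullets above translate into a simple check: record a baseline metric when the model is built, measure the same metric on recent outcomes, and compare. A minimal sketch, assuming accuracy is the metric that matters and that true outcomes eventually arrive; `Baseline`, `observed_accuracy`, `has_decayed`, `should_promote` and the 0.05 tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Baseline:
    model_version: str
    metric: str
    value: float  # measured on held-out data when this model version was built


def observed_accuracy(predictions: Sequence[bool], outcomes: Sequence[bool]) -> float:
    """Accuracy on recent cases whose true outcomes are now known."""
    if len(predictions) != len(outcomes) or not outcomes:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)


def has_decayed(baseline: Baseline, current: float, tolerance: float = 0.05) -> bool:
    """Flag for review when the live metric drops more than `tolerance` below baseline."""
    return current < baseline.value - tolerance


def should_promote(baseline: Baseline, candidate: float) -> bool:
    """A retrained model replaces the current one only if it beats the saved baseline."""
    return candidate > baseline.value
```

The promotion check is only meaningful if the candidate is scored on the same saved data the baseline came from, which is why the data used to build the model has to be kept.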
You need a system of record for analytics
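At its smallest, a system of record for analytics ties each model version to a fingerprint of the data it was trained on, the metrics it was accepted with, and the version it replaced. A minimal in-memory sketch, assuming training data can be serialized to JSON; `ModelRegistry`, `ModelRecord` and `fingerprint` are hypothetical names, and a real system would persist these records along with the model artifacts themselves.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


def fingerprint(rows: List[dict]) -> str:
    """SHA-256 of a canonical JSON encoding, so identical data always hashes the same."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass(frozen=True)
class ModelRecord:
    name: str
    version: int
    training_data_hash: str         # lineage: which data built this model
    metrics: Dict[str, float]       # baseline recorded at registration time
    parent_version: Optional[int]   # lineage: which version this one replaced


class ModelRegistry:
    """In-memory, append-only history of model versions (illustrative only)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[ModelRecord]] = {}

    def register(self, name: str, training_rows: List[dict],
                 metrics: Dict[str, float]) -> ModelRecord:
        history = self._history.setdefault(name, [])
        parent = history[-1].version if history else None
        record = ModelRecord(
            name=name,
            version=len(history) + 1,
            training_data_hash=fingerprint(training_rows),
            metrics=dict(metrics),  # copy so later edits by the caller don't rewrite history
            parent_version=parent,
        )
        history.append(record)
        return record

    def latest(self, name: str) -> ModelRecord:
        return self._history[name][-1]

    def versions(self, name: str) -> List[int]:
        return [r.version for r in self._history.get(name, [])]
```

Records are only ever appended, never edited, so the history of what was deployed, and on what data, stays intact.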
THREE PERSPECTIVES, ONE SOLUTION?
There are requirements from all constituents. You need to put them
together to have a complete picture of what’s needed.
The missing stakeholder
There is another stakeholder:
analytics management - the
CAO, CDO, VP of analytics, aka
“your boss” if you’re a data
scientist.
This person oversees the team and
its efforts, so their perspective and
problems span the organization and
multiple projects.
Repeatability
Operational predictability
Reproducibility
Analytics solutions are interdisciplinary
Team composition is best
when the skills and
backgrounds are mixed.
Domain knowledge is still
valuable – ignore the AI and
ML hype saying that it’s all
math and engineering.
Data management and
engineering are a necessary
part of much of this work.
Data scientists and engineers work from opposing directions
▪ Exploration: help people ask the right questions, frame them, define measurable goals
▪ Modeling: define models that run to determine answers or carry out actions
▪ Integration: deliver the results / product in production, at scale
▪ Applications: build data science models into applications and delivery systems
▪ Infrastructure: provide the systems and practices to build and run the desired models
Diagram concept: Paco Nathan
Using a matrix to plan the project team
Image: Paco Nathan
This is a team sport, not a solo act
Image: Paco Nathan
We already know the craft model doesn’t scale. How
do we industrialize like we did for BI?
There is an extensive list of requirements to support
Primary requirements needed by constituents S D E
Data catalog and ability to search it for datasets X X
Self-service access to curated data X
Self-service access to uncurated (unknown, new) data X X
Temporary storage for working with data X
Data integration, cleaning, transformation, preparation tools and environment X X
Persistent storage for source data used by production models X X
Persistent storage for training, testing, production data used by models X X
Storage and management of models X X
Deployment, monitoring, decommissioning models X
Lineage, traceability of changes made for data used by models X X
Lineage, traceability for model changes X X X
Managing baseline data / metrics for comparing model performance X X X
Managing ongoing data / metrics for tracking ongoing model performance X X X
S = stakeholder/user, D = data scientist/analyst, E = engineer/developer
Non-answer #1: “Innovation as Procurement”
Software vendors want to sell you
one thing: high margin software.
Most assume the data is there and
ready to use by their application –
just load it.
Most of the work lies in data
integration, cleaning and data
management.
Embedding analytics in a process
adds infrastructure that most
organizations don’t have and can’t
support. It takes new infrastructure.
Non-answer #2: Best Practices
“78% of high performing
companies have a centralized data
science team in place in their
organization” – follow their lead!
This is called survivorship bias. Flipping
a coin is often as effective as “Do
what they did.”
The problem: you have directions
to cross a minefield but no map of
where to start.
The enterprise focus needs to be on
repeatability - where it can be supported
Key focus for the organization:
Infrastructure vs Application
Infrastructure enables value,
applications deliver value.
Enable applications by pushing
the reusable elements down
into the platform.
The infrastructure is a hidden
combination of technology,
process and methods.
Data management is a key element of infrastructure
Multiple contexts of use, differing quality levels
You need to keep the original because just like baking,
you can’t unmake dough once it’s mixed.
Manage your data
(or it will manage you)
Data management is where
both analysts and
developers are weakest.
Modern engineering
practices are where data
management is weakest.
You need to bridge the
groups and practices in the
organization if you want to
make this work repeatable.
Conclusion: new stuff eventually becomes old stuff
About the Presenter
Mark Madsen is president of Third
Nature, an advisory firm focused on
analytics, data and technology strategy.
Mark is an award-winning author,
architect and CTO who has received
awards for his work from the American
Productivity & Quality Center,
Smithsonian Institution and industry
associations.
He is an international speaker, a
contributor to Forbes, and member of
the O’Reilly Artificial Intelligence and
Strata program committees. For more
information or to contact Mark, follow
@markmadsen on Twitter or visit
http://ThirdNature.net
About Third Nature
Third Nature is an advisory firm focused on practices and technology in
analytics, information strategy, business intelligence and data management.
Our goal is to help organizations solve problems using data. We offer
education, advisory and research services to support business and IT
organizations. We also provide product-related consulting to software
vendors in the data industry.
We specialize in strategy and architecture, so we look at emerging
technologies and markets, evaluating how technologies are applied to solve
problems rather than simply comparing product features. We fill the gap
between what industry analyst firms cover and what organizations need.

 
Don't follow the followers
Don't follow the followersDon't follow the followers
Don't follow the followersmark madsen
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousingmark madsen
 
Open Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing DataOpen Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing Datamark madsen
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousingmark madsen
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the datamark madsen
 
Big Data Wonderland: Two Views on the Big Data Revolution
Big Data Wonderland: Two Views on the Big Data RevolutionBig Data Wonderland: Two Views on the Big Data Revolution
Big Data Wonderland: Two Views on the Big Data Revolutionmark madsen
 
Using Data Virtualization to Integrate With Big Data
Using Data Virtualization to Integrate With Big DataUsing Data Virtualization to Integrate With Big Data
Using Data Virtualization to Integrate With Big Datamark madsen
 

More from mark madsen (20)

A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou RangeA Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehouse
 
A Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing CustomersA Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing Customers
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collection
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Briefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsBriefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analytics
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except us
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)
 
On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)
 
Crossing the chasm with a high performance dynamically scalable open source p...
Crossing the chasm with a high performance dynamically scalable open source p...Crossing the chasm with a high performance dynamically scalable open source p...
Crossing the chasm with a high performance dynamically scalable open source p...
 
Don't let data get in the way of a good story
Don't let data get in the way of a good storyDon't let data get in the way of a good story
Don't let data get in the way of a good story
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Don't follow the followers
Don't follow the followersDon't follow the followers
Don't follow the followers
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousing
 
Open Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing DataOpen Data: Free Data Isn't the Same as Freeing Data
Open Data: Free Data Isn't the Same as Freeing Data
 
Exploring cloud for data warehousing
Exploring cloud for data warehousingExploring cloud for data warehousing
Exploring cloud for data warehousing
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the data
 
Big Data Wonderland: Two Views on the Big Data Revolution
Big Data Wonderland: Two Views on the Big Data RevolutionBig Data Wonderland: Two Views on the Big Data Revolution
Big Data Wonderland: Two Views on the Big Data Revolution
 
Using Data Virtualization to Integrate With Big Data
Using Data Virtualization to Integrate With Big DataUsing Data Virtualization to Integrate With Big Data
Using Data Virtualization to Integrate With Big Data
 

Recently uploaded

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 

Recently uploaded (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 

Pay no attention to the man behind the curtain - the unseen work behind data science

  • 1. Pay no attention to the man behind the curtain… The unseen work behind data science and analytics. Accelerate Data Science conference, October 18, 2017. Mark Madsen, www.ThirdNature.net, @markmadsen
  • 2. Copyright Third Nature, Inc. INTRO: the problem we’re (really) trying to solve, and the current state.
  • 3. The focus today is largely on machine learning. [Figure label: “You are here”]
  • 4. The craft model of information delivery does not scale.
  • 5. So we shifted to data publishing: industrialized data delivery for self-service access.
  • 6. Increased data capture and BI maturity lead to more data-intensive practices and rising complexity. [Figure: Pareto analysis of the share of buyers who make up 80% of sales volume for products, in this case Coke. Data source: CMO Council]
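The Pareto figure reports the share of buyers that accounts for 80% of sales volume. Below is a minimal sketch of that calculation. It assumes sales are already aggregated per buyer into a dict. The function name and input shape are illustrative and are not taken from the CMO Council analysis.

```python
def pareto_buyer_share(sales_by_buyer, volume_share=0.8):
    """Return the fraction of buyers who together make up `volume_share` of total volume.

    sales_by_buyer: dict mapping a buyer id to that buyer's total sales volume.
    """
    volumes = sorted(sales_by_buyer.values(), reverse=True)  # biggest buyers first
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total sales volume must be positive")
    cumulative = 0.0
    for count, volume in enumerate(volumes, start=1):
        cumulative += volume
        if cumulative >= volume_share * total:
            return count / len(volumes)
    return 1.0  # only reached if volume_share > 1


print(pareto_buyer_share({"a": 60, "b": 25, "c": 10, "d": 5}))  # → 0.5
```

The calculation itself is simple. As the next slide argues, the hard part is explaining *why* those buyers differ.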
  • 7. What makes these customers different? How does this affect a new product launch or line extensions? These are not the kinds of questions you can answer with queries and reporting alone. Data source: CMO Council
  • 8. Compounding the problem: observations, not transactions. Event data doesn’t fit well with current methods of collection and storage, or with the technology to process and analyze it.
  • 9. The old problem was access; the new one is analysis.
  • 10. The applied view of data science. Five basic things you can do: ▪ Prediction – what is most likely to happen? ▪ Estimation – what’s the future value of a variable? ▪ Description – what relationships exist in the data? ▪ Simulation – what could happen? ▪ Prescription – what should you do?
  • 11. Applying analytics isn’t just putting them on a screen. There are different models of use at machine and human speed. [Figure: Decision-Action: human decision support, humans moderating machine decisions, machine decisions. Monitor-Alert: human monitoring, machine monitoring.]
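The middle mode, humans moderating machine decisions, is often built as a confidence band. The machine decides clear-cut cases on its own and sends ambiguous ones to a person. Here is a minimal sketch. The `Route` names and the 0.2 / 0.8 thresholds are hypothetical; real thresholds would come from the cost of errors in the specific process.

```python
from enum import Enum


class Route(Enum):
    MACHINE_APPROVE = "machine decision: approve"
    MACHINE_DECLINE = "machine decision: decline"
    HUMAN_REVIEW = "human moderates machine decision"


def route_decision(score, low=0.2, high=0.8):
    """Route a model score in [0, 1] to an automatic decision or to a human reviewer.

    Scores at or above `high` are approved automatically, scores at or below `low`
    are declined automatically, and everything in between goes to a person.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= high:
        return Route.MACHINE_APPROVE
    if score <= low:
        return Route.MACHINE_DECLINE
    return Route.HUMAN_REVIEW


print(route_decision(0.5))  # → Route.HUMAN_REVIEW
```

Widening the band moves work from machines to humans. Narrowing it moves toward fully automated machine decisions.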
  • 12. THE NATURE OF THE PROBLEM FOR ORGANIZATIONS. Implementing data science is a problem of multiple perspectives.
  • 13. We don’t have an analytics problem, just like we didn’t have a BI problem. The origin of analytics as “business intelligence” was stated well in 1958: “…the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.” ~ H. P. Luhn, “A Business Intelligence System” (http://altaplana.com/ibmrd0204H.pdf). Our goal is analytics as a capability, not a technology.
  • 14. Three constituencies: Stakeholder (aka the recipient), Analyst (aka the data scientist), Builder (aka the engineer).
  • 15. Starting points. Many organizations choose to start with the analysts: create a data science team and turn them loose to find a problem. Many more start with the builders: technology solutions looking for problems, e.g. 55% of the IT-driven Hadoop and Spark projects over the last five years. The right place to start? Stakeholders: the goal to achieve, the problem to solve.
  • 16. NATURE OF THE PROBLEM FROM THE STAKEHOLDER’S PERSPECTIVE. Each constituency has its own set of problems to deal with.
  • 17. The myth that still drives analytics – analytic gold: all we need is a fat pipe and pans working in parallel…
  • 18. Analytic insights that result in no action are expensive trivia. It’s not the insight, but what you do with it, that matters. As a manager: what would you do in this situation?
  • 19. Perennially difficult: What question do you address? What’s possible? How do you know what’s feasible and what isn’t, both technically and financially? You don’t, unless you know both the data science and the business (and even then maybe not: ML makes no guarantees). It takes domain expertise, analytic expertise and intuition; that’s why you need analysts.
  • 20. Important questions for managers: 1. What is the goal? 2. Is the goal worth achieving? 3. Do you have a clearly stated, measurable goal? 4. Do you have the data required? If managers don’t realize this is important, they complain about analysts asking them a bunch of (obvious*) questions. There are processes you can put in place to find problems to address, prioritize them, and determine how to deploy the solutions. *Not really
  • 21. Applying analytics is not an analytics problem. Applying analytics is not in the analyst’s control. It’s not in the engineer’s control. It’s in the control of the people involved in the process. Failures are often in execution, not in analytics development. For example, we saw unexpectedly poor performance in a number of geographies. Was it the new analytics we tried? Was it a data problem? No, it was a simple compliance problem.
  • 22. NATURE OF THE PROBLEM FROM THE ANALYST’S PERSPECTIVE
  • 23. The analytics process at a high level. [Diagram: Kate Matsudaira]
  • 24. The nature of analytics problems is researching the unknown rather than accessing the known. Repeat for each new problem. [Diagram: Kate Matsudaira]
  • 25. Important: no two analytics projects are entirely alike. Different goals mean different data, preparation and algorithms. Different algorithms have different resource consumption profiles and scaling ability. Each requires its own custom-engineered data features.
  • 26. Starting at the start: do you have a clearly stated, measurable goal?
  • 27. The main hurdle: just getting the data. Do you know where to find it? It’s unlikely to be in the data warehouse. Do you have access to it? Is access fast enough? DWs are built for QRD, not for moving huge piles of data. And ERP systems and SaaS apps are right out.
  • 28. Do you have the right data? Many machine learning techniques require labeled (known good) training data. Supervised learning: a person has to define the correct output for some portion of the data. Data is divided into training sets used for model building and test sets for validating the results. • What is spam and what isn’t? • What does a fraudulent transaction look like?
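The training/test division can be sketched in plain Python. The idea is to shuffle the labeled examples with a fixed seed so the split is reproducible, then hold out a fraction for validation. The function name and the sample data are hypothetical. scikit-learn offers a comparable `train_test_split` in `sklearn.model_selection`, but this sketch does not depend on it.

```python
import random


def train_test_split(labeled, test_fraction=0.25, seed=42):
    """Shuffle labeled examples and split them into (training set, test set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rows = list(labeled)
    random.Random(seed).shuffle(rows)  # fixed seed: the same split every run
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


# Hypothetical labeled data: (message text, is_spam) pairs, labeled by a person.
emails = [
    ("win a prize now", True),
    ("meeting at 10", False),
    ("cheap pills", True),
    ("lunch tomorrow?", False),
]
train, test = train_test_split(emails)
print(len(train), len(test))  # → 3 1
```

The model is fit only on `train`. `test` stays untouched until the results are assessed.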
  • 29. Do you have enough of the right data? ML needs a lot; you may be disappointed in your own efforts.
  • 30. What does an expert analyst really do? ▪ Define the business problem ▪ Translate the problem into an analytic context ▪ Select appropriate data ▪ Learn the data ▪ Create a model set ▪ Fix problems with data ▪ Transform data ▪ Build models ▪ Assess models ▪ Deploy models ▪ Assess results. Source: Michael Berry, Data Miners Inc.
  • 31. What does an expert analyst do? You can’t model data for this in advance.
  • 32. Where do analysts spend their time? Mostly on data work. ▪ Define the business problem ▪ Translate the problem into an analytic context ▪ Select appropriate data ▪ Learn the data ▪ Create a model set ▪ Fix problems with data ▪ Transform data ▪ Build models ▪ Assess models ▪ Deploy models ▪ Assess results. [Figure: % of time spent across these steps: 70% / 30%] Source: Michael Berry, Data Miners Inc.
  • 33. Feature engineering is the core of the process. Lots of data (as attributes) makes things harder; lots of data (as instances) makes things slow. Often, the raw data is not in a form that is amenable to learning, but you can construct features from it that are. Cleaning up data, choosing attributes and deriving features is not a technical problem as much as a creative one. The best way to enable data scientists is to remove data management obstacles.
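Here is one concrete case of constructing features from raw data. A log of individual purchases is not amenable to most learners, but per-customer summaries are. This sketch derives recency, frequency and monetary (RFM) features. RFM is a common customer-analytics choice used here only for illustration; the slides do not prescribe it. The tuple layout of `transactions` is an assumption.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Derive recency/frequency/monetary features per customer from raw transactions.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: the date from which recency is measured.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        last_seen[customer] = max(day, last_seen.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency_days": (as_of - last_seen[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_seen
    }


raw = [
    ("c1", date(2017, 10, 1), 20.0),
    ("c1", date(2017, 10, 15), 30.0),
    ("c2", date(2017, 9, 18), 5.0),
]
print(rfm_features(raw, as_of=date(2017, 10, 18))["c1"])
# → {'recency_days': 3, 'frequency': 2, 'monetary': 50.0}
```

Deciding which summaries matter (recency? basket size? time between visits?) is the creative part the slide describes. The code is the easy part.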
  • 34. Where do most of the analytics tools focus? ▪ Define the business problem ▪ Translate the problem into an analytic context ▪ Select appropriate data ▪ Learn the data ▪ Create a model set ▪ Fix problems with data ▪ Transform data ▪ Build models ▪ Assess models ▪ Deploy models ▪ Assess results. Source: Michael Berry, Data Miners Inc.
  • 35. Where do most of the analytics-as-a-service (aaS) offerings focus? ▪ Define the business problem ▪ Translate the problem into an analytic context ▪ Select appropriate data ▪ Learn the data ▪ Create a model set ▪ Fix problems with data ▪ Transform data ▪ Build models ▪ Assess models ▪ Deploy models ▪ Assess results. Source: Michael Berry, Data Miners Inc.
  • 36. The analyst’s workspace in BI is relatively spare.
  • 37. The analyst’s workspace needs to be more like a kitchen than a BI vending machine.
  • 38. NATURE OF THE PROBLEM FROM THE BUILDER’S PERSPECTIVE
  • 39. IT and Ops people want to know: what should we build? A giant data platform? Self-service tools?
  • 40. Analytics requires different processes and workloads. None of this analytics work is the same as what IT considered “analysis” to be, which is usually equated with BI or ad-hoc query. Ad-hoc analysis, exploratory data analysis, batch analytics and real-time analytics are each different workloads. [Figure: A real analytics production workflow. Source: Hatch, CIKM ’11]
  • 41. Embedding analytics: less voodoo, more engineering.
  • 42. Things engineering and operations worry about. Engineering time and effort: ▪ Introduction of new technology and complexity ▪ Integration: deploying models requires linking different types of environments and creating supportable workflows for the analysts ▪ Ability to develop and deploy at the required speed. Supportability: ▪ Automation ▪ The environment requires additional monitoring, other technology and processes, particularly for customer-facing work ▪ Support costs (time and money). SLAs: ▪ Availability: if analytics are tied to production operations, particularly customer-facing ones, this becomes important and difficult because it’s not standard application work ▪ Performance and scalability: you have to manage unpredictable workloads and resource conflicts between model development and model execution.
  • 43. The world changes; do the models? In BI you maintain ETL and schemas; in ML you maintain models. “Model decay” happens as the assumptions around which a model was built change, e.g. spam techniques change. When you adjust the model, you need to know it is better again: ▪ Better save the data used to build the model ▪ Better save the model ▪ Baseline and measurements
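"Baseline and measurements" can be made concrete as a simple decay check. Record the model's metric on the saved test data at deployment. Then compare recent measurements against that baseline. This is a minimal sketch that assumes a higher-is-better metric (such as accuracy or AUC) and an absolute `tolerance` of 0.05. Both the function and the tolerance are hypothetical choices, not a standard.

```python
def check_model_decay(baseline_metric, recent_metrics, tolerance=0.05):
    """Flag decay when the average recent metric falls more than `tolerance` below baseline.

    baseline_metric: metric measured on the saved test data when the model was deployed.
    recent_metrics: the same metric measured on recent, labeled production data.
    Returns (decayed, recent_average).
    """
    if not recent_metrics:
        raise ValueError("need at least one recent measurement")
    recent = sum(recent_metrics) / len(recent_metrics)
    return recent < baseline_metric - tolerance, recent


decayed, recent = check_model_decay(0.90, [0.82, 0.80])
print(decayed)  # → True: time to rebuild, then compare the new model to the same baseline
```

The check is only meaningful if the baseline data and the model it came from were saved. That is the point of the slide's first two bullets.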
  • 44. You need a system of record for analytics.
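A system of record for analytics has to answer two questions: which data and which parameters produced a given model version, and how that version performed at baseline. Below is a minimal in-memory sketch. It stores content fingerprints rather than copies. `ModelRegistry`, `ModelRecord` and `fingerprint` are hypothetical names. Production tools such as MLflow's Model Registry cover this ground with persistent storage.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def fingerprint(obj):
    """Stable SHA-256 of a JSON-serializable object (training data, parameters)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ModelRecord:
    name: str
    version: int
    training_data_hash: str
    params_hash: str
    baseline_metrics: dict
    created_at: str


class ModelRegistry:
    """Append-only registry: every version of every model stays queryable."""

    def __init__(self):
        self._records = []

    def register(self, name, training_data, params, baseline_metrics):
        version = 1 + sum(1 for r in self._records if r.name == name)
        record = ModelRecord(
            name=name,
            version=version,
            training_data_hash=fingerprint(training_data),
            params_hash=fingerprint(params),
            baseline_metrics=dict(baseline_metrics),
            created_at=datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(record)
        return record

    def history(self, name):
        """All versions of one model, oldest first (lineage for model changes)."""
        return [r for r in self._records if r.name == name]


registry = ModelRegistry()
registry.register("spam", [[1, 0], [0, 1]], {"C": 1.0}, {"auc": 0.91})
v2 = registry.register("spam", [[1, 0]], {"C": 1.0}, {"auc": 0.88})
print(v2.version)  # → 2
```

Because the registry is append-only, an older version and its baseline remain available for comparison after a model is retrained.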
  • 45. THREE PERSPECTIVES, ONE SOLUTION? There are requirements from all constituents. You need to put them together to have a complete picture of what’s needed.
  • 46. The missing stakeholder. There is another stakeholder: analytics management, i.e. the CAO, CDO or VP of analytics, aka “your boss” if you’re a data scientist. The perspective and problems of the person responsible for overseeing the team and its efforts span the whole organization and multiple projects.
  • 47. Repeatability
  • 48. Operational predictability
  • 49. Reproducibility
  • 50. Analytics solutions are interdisciplinary. Team composition is best when the skills and backgrounds are mixed. Domain knowledge is still valuable – ignore the AI and ML hype saying that it’s all math and engineering. Data management and engineering are a necessary part of much of this work.
  • 51. Data scientists and engineers work from opposing directions: ▪ Exploration: help people ask the right questions, frame them, define measurable goals ▪ Modeling: define models that run to determine answers or carry out actions ▪ Integration: deliver the results / product in production, at scale ▪ Applications: build data science models into applications and delivery systems ▪ Infrastructure: provide the systems and practices to build and run the desired models. Diagram concept: Paco Nathan
  • 52. Using a matrix to plan the project team. Image: Paco Nathan
  • 53. This is a team sport, not a solo act. Image: Paco Nathan
  • 54. We already know the craft model doesn’t scale. How do we industrialize like we did for BI?
  • 55. There is an extensive list of requirements to support. Primary requirements needed by constituents (columns S D E; S = stakeholder, user; D = data scientist, analyst; E = engineer, developer):
    ▪ Data catalog and ability to search it for datasets: X X
    ▪ Self-service access to curated data: X
    ▪ Self-service access to uncurated (unknown, new) data: X X
    ▪ Temporary storage for working with data: X
    ▪ Data integration, cleaning, transformation, preparation tools and environment: X X
    ▪ Persistent storage for source data used by production models: X X
    ▪ Persistent storage for training, testing, production data used by models: X X
    ▪ Storage and management of models: X X
    ▪ Deployment, monitoring, decommissioning models: X
    ▪ Lineage, traceability of changes made for data used by models: X X
    ▪ Lineage, traceability for model changes: X X X
    ▪ Managing baseline data / metrics for comparing model performance: X X X
    ▪ Managing ongoing data / metrics for tracking ongoing model performance: X X X
  • 56. Non-answer #1: “Innovation as Procurement”. Software vendors want to sell you one thing: high-margin software. Most assume the data is there and ready to use by their application: just load it. Most of the work lies in data integration, cleaning and data management. Embedding analytics in a process adds infrastructure that most organizations don’t have and can’t support. It takes new infrastructure.
  • 57. Non-answer #2: Best Practices. “78% of high performing companies have a centralized data science team in place in their organization” – follow their lead! This is called survivorship bias. Flipping a coin is often as effective as “Do what they did.” The problem: you have directions to cross a minefield but no map of where to start.
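Why the 78% statistic tells you nothing alone: if you look only at the winners, a practice that has no effect on success still shows up among them at its ordinary rate. This simulation makes success a pure coin flip, unrelated to having a central data science team. All rates are hypothetical except the 78% taken from the quote. Adoption comes out near 78% among both the successful and the failed companies.

```python
import random


def adoption_rates(n_companies=100_000, adoption_rate=0.78, success_rate=0.2, seed=7):
    """Simulate companies whose success is independent of adopting a practice.

    Returns (share of adopters among successful companies,
             share of adopters among unsuccessful companies).
    """
    rng = random.Random(seed)
    adopters = {True: 0, False: 0}  # keyed by whether the company succeeded
    totals = {True: 0, False: 0}
    for _ in range(n_companies):
        adopted = rng.random() < adoption_rate
        succeeded = rng.random() < success_rate  # coin flip, unrelated to adoption
        totals[succeeded] += 1
        adopters[succeeded] += adopted
    return adopters[True] / totals[True], adopters[False] / totals[False]


winners, losers = adoption_rates()
print(f"adopters among winners: {winners:.2f}, among losers: {losers:.2f}")
```

Only a comparison with the companies that did *not* perform well can show whether the practice matters. That is the "map" the slide says is missing.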
  • 58. The enterprise focus needs to be on repeatability, where it can be supported.
  • 59. Key focus for the organization: infrastructure vs. applications. Infrastructure enables value; applications deliver value. Enable applications by pushing the reusable elements down into the platform. The infrastructure is a hidden combination of technology, process and methods.
  • 60. Data management is a key element of infrastructure: multiple contexts of use, differing quality levels. You need to keep the original because, just like in baking, you can’t unmake dough once it’s mixed.
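"Keep the original" in code means cleaning derives new values and never overwrites the raw ones. Different contexts of use can then apply different rules to the same source, and rules can be rerun when they change. Below is a minimal sketch using an immutable raw record. `RawRecord` and `clean` are hypothetical names, and treating unparseable amounts as `None` is one assumed policy among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    customer_id: str
    amount: str  # kept exactly as received, e.g. "1,200.50" or "n/a"


def clean(record):
    """Derive a cleaned value without modifying the raw record; None when unparseable."""
    text = record.amount.replace(",", "").strip()
    try:
        amount = float(text)
    except ValueError:
        amount = None  # one possible policy; another context might reject the row instead
    return {"customer_id": record.customer_id, "amount": amount}


raw = (RawRecord("c1", "1,200.50"), RawRecord("c2", "n/a"))
cleaned = [clean(r) for r in raw]  # raw stays untouched, so cleaning can be re-run later
print(cleaned[0]["amount"], raw[0].amount)  # → 1200.5 1,200.50
```

Because `RawRecord` is frozen, an accidental in-place "fix" raises an error instead of silently destroying the original.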
  • 61. Manage your data (or it will manage you). Data management is where both analysts and developers are weakest. Modern engineering practices are where data management is weakest. You need to bridge the groups and practices in the organization if you want to make this work repeatable.
  • 62. Conclusion: new stuff eventually becomes old stuff.
  • 63. About the Presenter. Mark Madsen is president of Third Nature, an advisory firm focused on analytics, data and technology strategy. Mark is an award-winning author, architect and CTO who has received awards for his work from the American Productivity & Quality Center, the Smithsonian Institution and industry associations. He is an international speaker, a contributor to Forbes, and a member of the O’Reilly Artificial Intelligence and Strata program committees. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
  • 64. About Third Nature. Third Nature is an advisory firm focused on practices and technology in analytics, information strategy, business intelligence and data management. Our goal is to help organizations solve problems using data. We offer education, advisory and research services to support business and IT organizations. We also provide product-related consulting to software vendors in the data industry. We specialize in strategy and architecture, so we look at emerging technologies and markets, evaluating how technologies are applied to solve problems rather than simply comparing product features. We fill the gap between what industry analyst firms cover and what organizations need.