DATA SCIENTISTS AND ANALYSTS
ARE ALSO SOFTWARE ENGINEERS
W. Whipple Neely
Director of Data Science, EA
THIS TALK IS ABOUT …
Moving data science and analytics teams to a software development
model.
• The motivation is to create repeatable, verifiable processes.
• It also means that we can bring powerful but “personal” analysis
environments (such as R) into producing enterprise-level systems, to
create work that typical dashboarding systems cannot achieve.
• In many ways this is a story about one set of teams. It may not apply
to all groups, but it has helped ours.
THE TYPICAL VENN DIAGRAM: WHO IS A DATA SCIENTIST
[Venn diagram. Circles: Statistics; Some Version of Domain Expertise;
Computer Science / “hacker skills”. Intersection: Data Science.]
“What kind of person does all this? What abilities make a data scientist
successful? Think of him or her as a hybrid of data hacker, analyst,
communicator, and trusted adviser.”
Davenport and Patil, Data Scientist: The Sexiest Job of the 21st Century,
Harvard Business Review, 2012
“Hacker skills” is the wrong term
GOOGLE IMAGE SEARCH: “WHO DATA SCIENTIST VENN DIAGRAM”
WHAT WE DO INSTEAD OF WHO WE ARE
[Diagram: Engineering, Science and Collaboration, with Data Science at the center.]
• Engineering: data engineering, coding discipline, software
engineering, style guides, reproducibility, source code control,
regression tests
• Science: math, stats, computer science, machine learning, probability
models, economics, “substantive domain expertise”, vast quantities of
common sense
• Collaboration: rules of engagement, empathy, communication and
listening skills, flexibility, reliability, extreme social skills
THE PROBLEMS
We have a team of data scientists who are experts at probability modeling
and machine learning, and a few of them are pretty good at programming in R,
Matlab or Python on a laptop. However …
1. Most have no experience of team programming.
2. Many come without experience of creating software that others can use, or
that is robust enough to run.
3. Creating an enterprise-level repeatable process can’t be left to the kind of
programming that most of us do on our laptops.
4. There is no easy intermediate step between working on a laptop and
something that works on the enterprise platform.
WHERE WE STARTED
Write R or Python Script → Run Script Manually → Update Report
OR
Write R or Python Script → Run Script Manually → Update A Static Model Implementation
THE PROBLEMS WITH WHERE WE STARTED
• Code/methods/models got lost.
• Lots of manual work.
• No automated checks for correctness or robustness of
models or predictions.
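One form an automated check on predictions could take is sketched below in Python. This is an illustration only, not something from the talk: the function name `check_predictions`, the valid range and the drift tolerance are all assumptions. It compares a fresh batch of predictions against a stored baseline and reports anything non-finite, out of range, or shifted by more than the tolerance.

```python
import math


def check_predictions(predictions, baseline, lower=0.0, upper=1.0, tolerance=0.05):
    """Return a list of readable problems; an empty list means every check passed.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    problems = []
    if len(predictions) != len(baseline):
        problems.append(
            f"expected {len(baseline)} predictions, got {len(predictions)}"
        )
        return problems
    for i, (new, old) in enumerate(zip(predictions, baseline)):
        if not math.isfinite(new):
            problems.append(f"row {i}: prediction is not finite ({new})")
        elif not lower <= new <= upper:
            problems.append(f"row {i}: {new} is outside [{lower}, {upper}]")
        elif abs(new - old) > tolerance:
            problems.append(f"row {i}: drifted from baseline {old} to {new}")
    return problems


if __name__ == "__main__":
    baseline = [0.10, 0.50, 0.90]
    print(check_predictions([0.11, 0.52, 0.88], baseline))
    print(check_predictions([0.11, float("nan"), 1.40], baseline))
```

Run after every scheduled model refresh, a check like this turns a silent failure into a message someone can act on.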
WE TALKED TO THE TEAMS ABOUT WHAT WAS WRONG
“Our analysts are pretty good at writing scripts and generating
reports, but our team needs help with the bookends: scheduling
tasks and serving the reports automatically” – Colleen Chrisco,
Director of Analytics, PopCap Games
IN TERMS OF OUR DIAGRAM
[Diagram repeated from “WHAT WE DO INSTEAD OF WHO WE ARE”: Engineering,
Science and Collaboration, with Data Science at the center.]
THIS WAS A LITTLE SCARY FOR SOME OF OUR TEAMS …
“We’re not programmers.”
“I don’t even know where to start.”
“I’ve never scheduled a job before.”
SO, TO ANSWER THESE CONCERNS WE DID THE FOLLOWING…
[Diagram. Components: Perforce, R Server, R Script.]
1. Check in Code: P4V, R-Checkin
2. Submit Job: Schedule file, API, Web
3. Run Script: Reporting, Models, ETLs, Forecasting
Script Inputs: csv, DBs, URL, logs, RDS
Script Outputs: csv, DBs, email, doc, pdf, html, shiny, RDS
By “we did the following” I really mean that we hired a brilliant computer
scientist named Ben Weber who became part of the team. Ben learned
the workflows of the team members and created this system for us.
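The “run script” bookend can be pictured with a small Python sketch. It is not the system described in this talk, which ran R scripts checked in to Perforce. The `Job` and `JobResult` classes, the `run_job` function and the job names are hypothetical, and the commands are stand-ins for real report or model scripts. The point is only that a scheduler runs each submitted job unattended and records whether it succeeded.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Job:
    name: str            # hypothetical job label
    command: list        # program and arguments to run
    timeout_s: int = 60  # give up on jobs that hang


@dataclass
class JobResult:
    name: str
    ok: bool
    output: str


def run_job(job):
    """Run one job and capture its outcome instead of letting it fail silently."""
    try:
        done = subprocess.run(
            job.command, capture_output=True, text=True, timeout=job.timeout_s
        )
    except subprocess.TimeoutExpired:
        return JobResult(job.name, False, f"timed out after {job.timeout_s}s")
    except OSError as err:
        return JobResult(job.name, False, f"could not start: {err}")
    output = (done.stdout + done.stderr).strip()
    return JobResult(job.name, done.returncode == 0, output)


if __name__ == "__main__":
    # Stand-in commands; a real entry would point at a checked-in script.
    jobs = [
        Job("daily_report", [sys.executable, "-c", "print('report written')"]),
        Job("broken_model", [sys.executable, "-c", "raise SystemExit(3)"]),
    ]
    for result in map(run_job, jobs):
        print(result.name, "OK" if result.ok else "FAILED", result.output)
```

Keeping the result as data rather than only printing it leaves room to email failures or write them to a log, which are among the script outputs listed on the slide.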
WHERE IT LANDED US
• We’d automated.
• We’d gotten the “bookends” covered.
• Many analytics teams, including the data science team, are using the
system.
As a result …
• Teams started using the technology to improve their work.
• Teams became more efficient: “I no longer have to be a walking
dashboard.”
• Astonishingly, these teams now have their routine code in source
control.
BUT IT DIDN’T SOLVE EVERYTHING
• We had produced more tools and simplified tasks, but hadn’t really
created the culture of a software-producing organization.
• We had extended the laptop model … a little, by introducing VMs that
could run the code.
And giving teams more tools had introduced some issues …
• A proliferation of models/predictions being run without curating the
processes.
• People leave, and their work continues to be run automatically. This
is not always a bad thing, but it is often not a good thing either.
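One possible way to curate automatically running work is to keep an owner and a last-review date for each scheduled job and flag the ones nobody is watching. The Python sketch below is an assumption-laden illustration, not part of the system described in this talk: the `ScheduledJob` fields, the `jobs_needing_review` function, the 180-day threshold and the sample names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScheduledJob:
    name: str
    owner: str
    last_reviewed: date


def jobs_needing_review(jobs, current_staff, today, max_age_days=180):
    """Return (job name, reason) pairs for jobs that nobody is actively curating."""
    flagged = []
    for job in jobs:
        if job.owner not in current_staff:
            flagged.append((job.name, f"owner {job.owner} is no longer on the team"))
        elif today - job.last_reviewed > timedelta(days=max_age_days):
            flagged.append((job.name, f"last reviewed {job.last_reviewed.isoformat()}"))
    return flagged


if __name__ == "__main__":
    # Made-up registry entries for illustration.
    jobs = [
        ScheduledJob("churn_model", "alice", date(2016, 1, 10)),
        ScheduledJob("weekly_kpis", "bob", date(2015, 3, 1)),
        ScheduledJob("forecast", "carol", date(2016, 2, 1)),
    ]
    for name, reason in jobs_needing_review(jobs, {"bob", "carol"}, date(2016, 3, 1)):
        print(f"{name}: {reason}")
```

A flagged job is a prompt for a decision (hand it to a new owner, review it, or retire it), not an automatic shutdown, since work that keeps running after someone leaves is not always a bad thing.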
WHAT WE KNEW WE HAD TO DO NEXT
We needed to make a cultural change from what is essentially
“hacking” to engineering.
• We started hiring people with more software engineering
skills.
• We introduced a style guide for our R code.
• We started code and project reviews.
• We hired a very non-technical writer to help the team
produce documentation on our internal Confluence site.
• We began providing training in team programming, engineering,
and new languages and frameworks (Python, Spark).
• We designated some positions on the team as the
software/coding gurus.
WHAT’S NEXT
• Dev/Test/Prod environments.
• Upgrading our toolset to work with RStudio Server and Git.
• Pair programming: a team member whose primary background is
software pairs with a data scientist who has focused on
statistical modeling and machine learning.

The Role and Importance of Curiosity in Data ScienceThe Role and Importance of Curiosity in Data Science
The Role and Importance of Curiosity in Data ScienceDomino Data Lab
 
Fuzzy Matching to the Rescue
Fuzzy Matching to the RescueFuzzy Matching to the Rescue
Fuzzy Matching to the RescueDomino Data Lab
 

Data Scientists as Software Engineers

  • 1. DATA SCIENTISTS AND ANALYSTS ARE ALSO SOFTWARE ENGINEERS. W. Whipple Neely, Director of Data Science, EA
  • 2. THIS TALK IS ABOUT … Moving data science and analytics teams to a software development model. • The motivation is to create repeatable, verifiable processes. • It also lets us bring powerful but “personal” analysis environments (such as R) into enterprise-level systems, producing work that typical dashboarding systems cannot achieve. • In many ways this is the story of one set of teams; it may not apply to all groups, but it has helped ours.
  • 3. THE TYPICAL VENN DIAGRAM: WHO IS A DATA SCIENTIST [Venn diagram: Statistics; Some Version of Domain Expertise; Computer Science (“hacker skills”); intersection: Data Science] “What kind of person does all this? What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser.” Davenport and Patil, “Data Scientist: The Sexiest Job of the 21st Century”, Harvard Business Review, 2012. “Hacker skills” is the wrong term.
  • 4. GOOGLE IMAGE SEARCH: “WHO DATA SCIENTIST VENN DIAGRAM”
  • 5. WHAT WE DO INSTEAD OF WHO WE ARE [Venn diagram: Engineering; Science; Collaboration; intersection: Data Science] Engineering: data engineering, coding discipline, software engineering, style guides, reproducibility, source code control, regression tests. Science: math, stats, computer science, machine learning, probability models, economics, “substantive domain expertise”, vast quantities of common sense. Collaboration: rules of engagement, empathy, communication and listening skills, flexibility, reliability, extreme social skills.
  • 6. THE PROBLEMS We have a team of data scientists who are experts at probability modeling and machine learning, and a few of them are pretty good at programming in R, Matlab or Python on a laptop. However … 1. Most have no experience of team programming. 2. Many come without experience of creating software that others can use, or that is robust enough to run unattended. 3. Creating an enterprise-level repeatable process can’t be left to the kind of programming that most of us do on our laptops. 4. There is no easy intermediate step between working on a laptop and something that works on the enterprise platform.
  • 7. WHERE WE STARTED Write R or Python script → Run script → Manually update report, OR Write R or Python script → Run script → Manually update a static model implementation
  • 8. THE PROBLEMS WITH WHERE WE STARTED • Code/methods/models got lost. • Lots of manual work. • No automated checks for correctness or robustness of models or predictions.
  • 9. WE TALKED TO THE TEAMS ABOUT WHAT WAS WRONG “Our analysts are pretty good at writing scripts and generating reports, but our team needs help with the bookends: scheduling tasks and serving the reports automatically” – Colleen Chrisco, Director of Analytics, PopCap Games
  • 10. IN TERMS OF OUR DIAGRAM [Venn diagram: Engineering; Science; Collaboration; intersection: Data Science] Engineering: data engineering, coding discipline, software engineering, style guides, reproducibility, source code control, regression tests. Science: math, stats, computer science, machine learning, probability models, economics, “substantive domain expertise”, vast quantities of common sense. Collaboration: rules of engagement, empathy, communication and listening skills, flexibility, reliability, extreme social skills.
  • 11. THIS WAS A LITTLE SCARY FOR SOME OF OUR TEAMS … “We’re not programmers.” “I don’t even know where to start.” “I’ve never scheduled a job before.”
  • 12. SO, TO ANSWER THESE CONCERNS WE DID THE FOLLOWING … [Diagram: Perforce and an R Server running R scripts] 1. Check in code (P4V, R-Checkin). 2. Submit job (schedule file, API, web). 3. Run script (reporting, models, ETLs, forecasting). Script inputs: csv, DBs, URL, logs, RDS. Script outputs: csv, DBs, email, doc, pdf, html, shiny, RDS. By “we did the following” I really mean that we hired a brilliant computer scientist named Ben Weber, who became part of the team. Ben learned the workflows of the team members and created this system for us.
  • 13. WHERE IT LANDED US • We’d automated. • We’d gotten the “bookends” covered. • Many analytics teams, including the data science team, are using the system. As a result … • Teams started using the technology to improve their work. • Teams became more efficient: “I no longer have to be a walking dashboard.” • Astonishingly, these teams now have their routine code in source control.
  • 14. BUT IT DIDN’T SOLVE EVERYTHING • We had produced more tools and simplified tasks, but hadn’t really created the culture of a software-producing organization. • We had extended the laptop model … a little, by introducing VMs that could run the code. And giving teams more tools had introduced some issues … • A proliferation of models and predictions being run without anyone curating the processes. • People leave, and their work continues to be run automatically … This is not always a bad thing, but it is often not a good thing either.
  • 15. WHAT WE KNEW WE HAD TO DO NEXT We needed to make a cultural change from what is essentially “hacking” to engineering. • So, we did start hiring people with more software engineering skills. • Introduced a style guide for our R code. • We started code and project reviews. • Hired a very non-technical writer to start helping the team produce documentation on our internal Confluence site. • Start providing training in team programming, engineering, new languages (Spark, Python). • Assign some of the positions on the team to be the software/coding gurus.
  • 16. WHAT’S NEXT • Dev/Test/Prod environments. • Upgrading our toolset to work with RStudio Server and Git. • Pair programming: a team member whose primary background is software, programming together with a data scientist who has focused on statistical modeling and machine learning.
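
Slides 5 and 8 name regression tests and automated checks on model predictions as a gap. The talk shows no code, so what follows is only a minimal Python sketch of one way such a check could look: predictions are stored once as a baseline and compared on later runs. The `predict` function, the JSON baseline file, and the tolerance are all hypothetical stand-ins, not part of the system described in the talk.

```python
import json
import math
import tempfile
from pathlib import Path


def predict(x):
    """Hypothetical stand-in for a team model: a fixed linear fit."""
    return 2.0 * x + 1.0


def save_baseline(path, inputs):
    """Record the current predictions so later runs can be compared to them."""
    baseline = {str(x): predict(x) for x in inputs}
    Path(path).write_text(json.dumps(baseline))


def check_against_baseline(path, tolerance=1e-9):
    """Return the inputs whose prediction differs from the stored baseline."""
    baseline = json.loads(Path(path).read_text())
    drifted = []
    for key, expected in baseline.items():
        actual = predict(float(key))
        if not math.isclose(actual, expected, rel_tol=0.0, abs_tol=tolerance):
            drifted.append(key)
    return drifted


# Example run: write a baseline, then confirm the model still reproduces it.
with tempfile.TemporaryDirectory() as tmp:
    baseline_file = Path(tmp) / "baseline.json"
    save_baseline(baseline_file, [0.0, 1.5, 10.0])
    drifted_inputs = check_against_baseline(baseline_file)
```

In practice the baseline file would be kept in source control next to the model code, so a change in predictions shows up in review rather than in a report.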

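Slide 14 observes that scheduled work keeps running after its author leaves and that models proliferate without curation. One possible safeguard, sketched below in Python under stated assumptions, is to record an owner and a review date for each scheduled job and flag jobs that have neither a current owner nor a recent review. The `ScheduledJob` fields, the script paths, the names, and the 180-day threshold are hypothetical and are not taken from the schedule file format used in the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScheduledJob:
    script: str          # path of the checked-in script (hypothetical)
    owner: str           # person responsible for the job
    last_reviewed: date  # when someone last confirmed it should still run


def jobs_needing_review(jobs, current_team, today, max_age_days=180):
    """Return jobs whose owner has left or whose last review is too old."""
    flagged = []
    for job in jobs:
        orphaned = job.owner not in current_team
        stale = (today - job.last_reviewed).days > max_age_days
        if orphaned or stale:
            flagged.append(job)
    return flagged


# Example data: one healthy job, one orphaned job, one stale job.
jobs = [
    ScheduledJob("reports/weekly_kpis.R", "alice", date(2024, 3, 1)),
    ScheduledJob("models/churn_forecast.R", "bob", date(2024, 2, 15)),
    ScheduledJob("etl/load_events.py", "carol", date(2023, 1, 10)),
]
flagged = jobs_needing_review(
    jobs, current_team={"alice", "carol"}, today=date(2024, 4, 1)
)
flagged_scripts = [job.script for job in flagged]
```

A check like this only reports candidates for review; whether a flagged job should be retired or reassigned remains a team decision.
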
Editor's Notes

  1. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century