Bridging the Gap Between Data Science & Engineering: Building High-Performance Teams

Bridging the Gap Between Data Science & Engineering: Building High-Performing Teams

How do I hire a data scientist?

Continuum of Skills: Software Engineer, Data Engineer, Data Scientist, Applied Scientist, Research Scientist

Data Science Venn diagram
Circles: Math & Stats, Computer Science, Domain Expertise
Overlaps: Machine Learning, Software Engineering, Research
Center: Data Science (the "Unicorn")

Many companies try to find all of these skills in a single person.

Which leads to job requirements like this…
• MSc/PhD in Computer Science, Electrical Engineering, Math or Statistics
• At least 5 years of experience in solving real-world practical problems using Machine Learning
• At least 5 years of experience on mining and modeling large-scale data (hundreds of terabytes)
• Extensive in-depth knowledge of Data Mining, Machine Learning, Algorithms
• Knowledge of at least one high-level programming language (C++, Java)
• Knowledge of at least one scripting language (Perl, Python, Ruby)
• Knowledge of SQL and experience with large relational databases
• Knowledge of at least one ML toolset (R, Weka, KNIME, Octave, Mahout, scikit-learn)
• Strong ability to formalize and provide practical solutions to research problems
• Strong communication skills and ability to work independently to get an idea from inception to implementation.
• Knowledge of the state of the art in at least one of Bayesian Optimization, Recommendation Systems, Social Network Analysis, Information Retrieval
• At least 5 years of experience with storing, sampling, querying large-scale data (hundreds of terabytes) and experimentation frameworks
• At least 5 years of experience with Hadoop, Spark, Mahout or Giraph

These people do exist, but they are often already
well-compensated, and only want to work on
interesting problems.

What can you do?
Build a team instead.

Look for T-shaped people
Horizontal bar of the T: broad-range generalist
Vertical bar of the T: deep expertise

Look for T-shaped people
Skill areas shown:
Machine Learning, Statistics, Domain Knowledge
Software Engineering
Business Acumen
Distributed Computing
Communication

• Compose teams of individuals who have overlapping skill sets and deep expertise in one area (machine learning, statistics, engineering, business, etc.)
• The overlap allows them to speak the same language and work collaboratively on solving problems

How do I structure my data science team within
my organization?

Data Science Team Structures
Centralized | Embedded | Hub & Spoke

Centralized
Data scientists sit on a single team that acts as a group of highly skilled internal consultants, fielding and answering questions from multiple teams within the organization and defining tools for the organization.

Embedded
• Data scientists are almost wholly embedded within one particular team and focus on solving problems for that team.
• Teams are assigned to one particular product or function within the company and define and answer questions for that product or function.

Hub & Spoke
• The data science team sits together physically and works collaboratively to solve problems.
• However, each data scientist (or a combination of them) gets deployed to work on problems within the organization.
• This structure tends to suit companies that have a lot of users.

Data Science Team Structure
Centralized > Embedded > Hub & Spoke

How do I get my data scientists to work with
engineering?

Data Science: Python, R (modeling & prototyping)
Software Engineering: Java/C++, RoR/JavaScript (production)

• Data scientists learn to write prototypes in production languages
• Engineers learn the basics of data science so they can understand how the models work
• The goal is to have both teams speak the same language and engender trust through communication

Data Science | Data Engineering
Common Core
Data Science Curriculum | Data Engineering Curriculum
Projects (Data Science and Data Engineering)

Data Science and Engineering: from Initial Planning to Production

Summary
• Don’t look for unicorns; build collaborative teams of T-shaped people
• Pay attention to how your data science team is structured within your organization
• Get your data science and engineering teams to speak the same language, allowing them to build trust and work collaboratively

We believe an opportunity belongs to anyone with aptitude and ambition.

Galvanize 2015
NODES ON THE NETWORK
COLORADO (BOULDER, DENVER, FORT COLLINS)
SEATTLE, WA
SAN FRANCISCO, CA
AUSTIN, TX (OPENING Q1 2016)
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Data Engineering Immersive, Masters of Science in Data Science, Entrepreneurship
Entrepreneurship

PLACEMENT STATS
                          FULL STACK IMMERSIVE    DATA SCIENCE IMMERSIVE
Pre-program Salary        $43K                    $72K
Average Starting Salary   $77K                    $114K
Placement Rate*           97%                     94%
*Galvanize is a founding member of NESTA (New Economy Skills Training Association), a trade organization founded to regulate the new “bootcamp” market.
This placement rate is more rigorous than that requested by state licensure agencies. The placement rate is calculated 6 months after graduation.

5 PROGRAMS
• Full Stack Immersive
• Data Science Immersive
• Data Engineering Immersive
• Master of Science in Data Science (University of New Haven)
• Startup Membership
Projected: over 500 student member graduates in 2015
Currently over 1,500 members

FULL STACK IMMERSIVE
• 97% Placement Rate within 6 months
• $77K Average Starting Salary
• 6 Month Program

DATA SCIENCE IMMERSIVE
• 94% Placement Rate within 6 months
• $114K Average Starting Salary
• 3 Month Program

Week 1 - Exploratory Data Analysis and Software Engineering Best Practices
Week 2 - Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit
Week 3 - Regression, Regularization, Gradient Descent
Week 4 - Supervised Machine Learning: Classification, Validation, Ensemble Methods
Week 5 - Clustering, Topic Modeling (NMF, LDA), NLP
Week 6 - Network Analysis, Matrix Factorization, and Time Series
Week 7 - Hadoop, Hive, and MapReduce
Week 8 - Data Visualization with D3.js, Data Products, and Fraud Detection Case Study
Weeks 9-10 - Capstone Projects
Week 12 - Onsite Interviews

DATA ENGINEERING IMMERSIVE
• Launched Oct. 2015
• Built in partnership with Nvent and Concurrent
• 3 Month Program

THANK YOU
RYAN ORBAN | EVP OF PRODUCT & STRATEGY
ryan.orban@galvanize.com
@ryanorban
www.galvanize.com
