Data scientists, data engineers, and data businesspeople are critical to leveraging data in any organization. A common complaint from data science managers is that data scientists invest time prototyping algorithms and throw them over the proverbial fence to engineers, only to find that the algorithms must be rebuilt from scratch to scale. This is a symptom of a broader ailment: data teams are often designed as functional silos without proper communication and planning.
This talk outlines a framework to build and organize a data team that produces better results, minimizes wasted effort among team members, and ships great data products.
7. Which leads to job requirements like this…
• MSc/PhD in Computer Science, Electrical Engineering, Math or Statistics
• At least 5 years of experience in solving real-world practical problems using Machine Learning
• At least 5 years of experience on mining and modeling large-scale data (hundreds of terabytes)
• Extensive in-depth knowledge of Data Mining, Machine Learning, Algorithms
• Knowledge of at least one high-level programming language (C++, Java)
• Knowledge of at least one scripting language (Perl, Python, Ruby)
• Knowledge of SQL and experience with large relational databases
• Knowledge of at least one ML toolset (R, Weka, KNIME, Octave, Mahout, scikit-learn)
• Strong ability to formalize and provide practical solutions to research problems
• Strong communication skills and ability to work independently to get an idea from inception to implementation.
• Knowledge of the state of the art in at least one of Bayesian Optimization, Recommendation Systems, Social Network Analysis, Information Retrieval
• At least 5 years of experience with storing, sampling, querying large-scale data (hundreds of terabytes) and experimentation frameworks
• At least 5 years of experience with Hadoop, Spark, Mahout or Giraph
13. Look for T-shaped people
[Figure: T-shaped skill profile. Labels: Machine Learning, Statistics, Domain Knowledge; Software Engineering; Business Acumen; Distributed Computing; Communication]
14. • Compose teams of individuals who have overlapping skill sets and deep expertise in one area (machine learning, statistics, engineering, business, etc.)
• The overlap allows them to speak the same language and work collaboratively on solving problems
15. How do I structure my data science team within my organization?
17. Centralized
Data scientists sit on a single team that acts as a group of highly powered internal consultants, fielding and answering questions from multiple teams within the organization and defining tools for the organization.
18. Embedded
• Data scientists are almost wholly embedded within one particular team and focus on solving problems for that team.
• Teams are assigned to one particular product or function within the company and define and answer questions for that product or function.
19. Hub & Spoke
• The data science team sits together physically and works collaboratively to solve problems.
• However, each data scientist (or a combination of them) gets deployed to work on problems within the organization.
• Tends to apply to companies that have a lot of users.
23. Data Science: Python, R (modeling & prototyping)
Software Engineering: Java/C++, RoR/JavaScript (production)
24. Data scientists learn to write prototypes in production languages
Engineers learn the basics of data science so they can understand how the models work
Goal is to have both teams speak the same language and engender trust through communication
25. [Figure: Data Science and Data Engineering tracks. Labels: Common Core; Data Science Curriculum; Data Engineering Curriculum; Data Science and Data Engineering Projects]
27. Summary
• Don’t look for unicorns, build collaborative teams of T-shaped people
• Pay attention to how your data science team is structured within your organization
• Get your data science and engineering teams to speak the same language, allowing them to build trust and work collaboratively
28. We believe an opportunity belongs to anyone with aptitude and ambition.
29. Galvanize 2015
NODES ON THE NETWORK
COLORADO (BOULDER, DENVER, FORT COLLINS)
SEATTLE, WA
SAN FRANCISCO, CA
AUSTIN, TX (OPENING Q1 2016)
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Data Engineering Immersive, Master of Science in Data Science, Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
30. PLACEMENT STATS
FULL STACK IMMERSIVE
• $43K Pre-program Salary
• $77K Average Starting Salary
• 97% Placement Rate*
DATA SCIENCE IMMERSIVE
• $72K Pre-program Salary
• $114K Average Starting Salary
• 94% Placement Rate*
*Galvanize is a founding member of NESTA (New Economy Skills Training Association), a trade organization founded to regulate the new “bootcamp” market. This placement rate is more rigorous than that requested by state licensure agencies. The placement rate is calculated 6 months after graduation.
31. 5 PROGRAMS
• Full Stack Immersive
• Data Science Immersive
• Data Engineering Immersive
• Master of Science in Data Science (University of New Haven)
• Startup Membership
Projected over 500 Student Member Graduates in 2015
Currently over 1500 Members
32. FULL STACK IMMERSIVE
• 97% Placement Rate within 6 months
• $77K Average Starting Salary
• 6 Month Program