This document provides an overview of data science and how to get started in the field. It defines a data scientist as a jack-of-all-trades who frames questions, collects and processes data, explores patterns in the data, and communicates findings. The field has emerged due to big data and a need to make sense of large datasets. Learning data science involves skills like SQL, data visualization, and machine learning algorithms. Thinkful offers 1-on-1 mentorship and career preparation programs to help people transition into data science careers.
9. Example: LinkedIn 2006
“[LinkedIn] was like arriving at a conference
reception and realizing you don’t know
anyone. So you just stand in the corner
sipping your drink—and you probably leave
early.”
-LinkedIn Manager, June 2006
10. Enter: Data Scientist
Jonathan Goldman
Joined LinkedIn in 2006, when it had only 8M users (450M in 2016)
Started experiments to predict people’s networks
Engineers were dismissive: “you can already import your address book”
12. Other Examples
Uber: where drivers should hang out
Netflix: $1M prize for better recommendations
Tala: microfinance loan approval
13. Why now?
Big Data: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze
14. Brief history of ‘big data’
Trend “started” in 2005
Web 2.0: the majority of content is created by users
Mobile accelerates this: data per person skyrockets
15. Big Data
“90% of the data in the world today has been created in the last two years alone”
- IBM, May 2013
19. Data science is just the beginning
“The United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.”
- McKinsey
20. The Process - LinkedIn Example
Frame the question
Collect the raw data
Process the data
Explore the data
Communicate results
21. Case: Frame the Question
What questions do we want to answer?
22. Case: Frame the Question
What connections (type and number) lead to higher user engagement?
Which connections do people want to make but are currently limited from making?
How might we predict these types of connections with limited data from the user?
23. Case: Collect the Data
What data do we need to answer these questions?
24. Case: Collect the Data
Connection data (who is connected to whom?)
Demographic data (what is the profile of each connection?)
Engagement data (how do they use the site?)
25. Case: Process the Data
How is the data “dirty” and how can we clean it?
26. Case: Process the Data
User input
Redundancies
Feature changes
Data model changes
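To make the first two sources of dirty data concrete, here is a minimal sketch of cleaning free-text user input and removing the redundancies it creates. The records, the `normalize_company` and `deduplicate` helpers, and the list of legal suffixes are all hypothetical illustrations, not LinkedIn's actual data or pipeline.

```python
def normalize_company(raw: str) -> str:
    """Lowercase, strip punctuation and extra spaces, drop a trailing legal suffix."""
    name = raw.strip().lower().replace(",", " ").replace(".", "")
    name = " ".join(name.split())  # collapse runs of whitespace
    for suffix in (" inc", " llc", " corp"):  # assumed suffix list, for illustration
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def deduplicate(records):
    """Keep the first record for each (user_id, normalized company) pair."""
    seen = set()
    cleaned = []
    for user_id, company in records:
        key = (user_id, normalize_company(company))
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


# Hypothetical user-entered employer names: same company, typed differently.
raw_records = [
    (1, "Acme Inc."),
    (1, "  ACME, Inc "),
    (2, "Globex Corp"),
    (2, "globex"),
    (3, "Initech LLC"),
]
cleaned_records = deduplicate(raw_records)
print(cleaned_records)
```

Five raw rows collapse to three once the spellings are normalized. Real cleaning rules would be derived from the data itself rather than from a fixed suffix list.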
27. Case: Explore the Data
What are the meaningful patterns in the data?
28. Case: Explore the Data
Triangle closing
Time overlaps
Geographic overlaps
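Triangle closing can be sketched in a few lines: if two people share connections but are not connected to each other, suggest that they connect, ranking candidates by the number of mutual connections. The graph below and the `suggest_connections` function are hypothetical and only illustrate the idea; they are not the method LinkedIn actually used.

```python
from collections import Counter


def suggest_connections(graph, user, top_n=3):
    """Rank people `user` is not connected to by their number of mutual connections."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone already connected to them.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically for a stable order.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical connection data: each person maps to the set of people they know.
graph = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dev", "eli"},
    "cho": {"ana", "dev"},
    "dev": {"ben", "cho"},
    "eli": {"ben"},
}
suggestions = suggest_connections(graph, "ana")
print(suggestions)
```

Here "dev" ranks first because connecting ana and dev would close two open triangles (through ben and through cho), while "eli" would close only one.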
36. #3: Machine Learning Algorithms
Machine learning algorithms provide computers with the ability to learn without being explicitly programmed: “programming by example”
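As a rough illustration of “programming by example”, the sketch below uses a one-nearest-neighbor classifier: no rule for what makes a good suggestion is written by hand, and the prediction is simply copied from the most similar labeled example. The features (mutual connections, years of overlap at the same employer), the labels, and the `predict_nearest` function are assumptions made up for this example.

```python
import math


def predict_nearest(examples, features):
    """Return the label of the labeled example closest to `features`."""
    _, label = min(examples, key=lambda example: math.dist(example[0], features))
    return label


# Hypothetical labeled examples:
# (mutual connections, years of overlap at the same employer) -> outcome
examples = [
    ((0, 0), "ignored"),
    ((1, 0), "ignored"),
    ((6, 1), "accepted"),
    ((8, 2), "accepted"),
]

print(predict_nearest(examples, (7, 1)))
print(predict_nearest(examples, (1, 1)))
```

Changing the behavior means changing the examples, not the code, which is the sense in which the computer learns rather than being explicitly programmed.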
42. But if you’re interested…
Knowledge of statistics, algorithms, & software
Comfort with languages & tools (Python, SQL, Tableau)
Inquisitiveness and intellectual curiosity
Strong communication skills
It’s all teachable!
47. Try us out!
• Initial 3-week prep course includes six mentor sessions for $250
• Learn Python, Python’s data science toolkit, and introductory statistics
• Option to continue on to the Data Science bootcamp
• Talk to me (or email jas@thinkful.com) if you’re interested