Getting started in data science (4:3)


  1. 1. Getting Started in Data Science April 2017 http://bit.ly/tf-data-science
  2. 2. About me • Jasjit Singh • Worked in finance & tech • Co-Founder Hotspot • Thinkful General Manager
  3. 3. About us Thinkful prepares students for web development & data science jobs with 1-on-1 mentorship programs
  4. 4. About you •I already have a career in data •I’m serious about switching into a career in data •I’m curious about switching into a career in data •Ugh I just want to see what all the fuss is about •Data is my favorite character in Star Trek
  5. 5. Today’s goals •What is a data scientist and what do they do? •How and why has the field emerged? •How can one become a data scientist?
  6. 6. Agenda for tonight • What is the market landscape for dev jobs? • What programming language should I learn? • What are the best ways to learn to code? • What are the first jobs / trajectories? • How do I break into the field?
  7. 7. Why do we care? “The United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.” - McKinsey
  8. 8. Which means… …average salaries are $115,000 a year
  9. 9. Definition #1
  10. 10. Definition #2 Nate Silver FiveThirtyEight.com “I think data-scientist is a sexed up term for a statistician”
  11. 11. My favorite definition
  12. 12. Case study: LinkedIn (2006) “[LinkedIn] was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink—and you probably leave early.” -LinkedIn Manager, June 2006
  13. 13. The new guy Jonathan Goldman •Joined LinkedIn in 2006, when it had only 8M users (450M in 2016) •Started experiments to predict people’s networks •Engineers were dismissive: “you can already import your address book”
  14. 14. The result
  15. 15. Other examples •Uber — Where drivers should hang out •Netflix — $1M movie recommendations contest •Ebola — Mobile mapping in Senegal to fight disease
  16. 16. “Big Data” changed the game Big Data: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze
  17. 17. Brief history of “Big Data” •Trend “started” in 2005 (Hadoop!) •Web 2.0 - Majority of content is created by users •Mobile accelerates this — data/person skyrockets
  18. 18. Explosion across 3V’s
  19. 19. Big data, TL;DR: 90% of the data in the world today has been created in the last two years alone (IBM, May 2013)
  20. 20. We’re drowning in data
  21. 21. Data scientists are the solution
  22. 22. A jack of all trades
  23. 23. Data science process •Frame the question •Collect the raw data •Process the data •Explore the data •Communicate results
  24. 24. Frame the question What questions do we want to answer?
  25. 25. Frame the question •What connections (type and number) lead to higher user engagement? •Which connections do people want to make but are currently limited from making? •How might we predict these types of connections with limited data from the user?
  26. 26. Collect the data What data do we need to answer these questions?
  27. 27. Collect the data •Connection data (who is connected to whom?) •Demographic data (what is the profile of each connection?) •Retention data (do people stay or leave, and how?) •Engagement data (how do they use the site?)
  28. 28. Process the data How is the data “dirty” and how can we clean it?
  29. 29. Process the data •User input •Redundancies •Feature changes •Data model changes
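As a rough illustration of cleaning messy user input and removing redundancies, here is a minimal Python sketch. The field names (`name`, `email`), the sample records, and the rule "same email means same user" are all assumptions made for this example; they are not taken from the slides.

```python
def clean_users(rows):
    """Normalize free-text user input and drop redundant rows.

    Assumption for this sketch: two rows with the same email
    (ignoring case and surrounding spaces) describe the same user.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse repeated whitespace and standardize capitalization.
        name = " ".join(row["name"].split()).title()
        email = row["email"].strip().lower()
        if email in seen:
            continue  # redundant record, keep only the first one
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical raw records, as a user might have typed them.
raw = [
    {"name": "  ada   lovelace ", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "alan turing", "email": "alan@example.com"},
]
cleaned = clean_users(raw)
print(cleaned)
```

Real cleaning work usually also has to handle feature and data model changes over time, which this sketch does not attempt.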
  30. 30. Explore the data What are the meaningful patterns in the data?
  31. 31. Explore the data •Triangle closing •Time overlaps •Geographic clustering
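"Triangle closing" can be read as: if two people share connections but are not connected themselves, suggest that they connect. The sketch below shows one simple way to rank such suggestions by the number of shared connections. The toy graph and the function name `suggest_connections` are made up for illustration, and this is not a description of how LinkedIn actually implemented the idea.

```python
from collections import Counter


def suggest_connections(graph, user):
    """Rank people `user` is not yet connected to by shared connections."""
    direct = graph[user]
    counts = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            # Skip the user themselves and anyone already connected.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    return counts.most_common()


# Hypothetical undirected network: each person maps to their connections.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
suggestions = suggest_connections(graph, "ann")
print(suggestions)  # [('dan', 2), ('eve', 1)]
```

Here "dan" ranks first because connecting him to "ann" would close two triangles (through "bob" and through "cat").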
  32. 32. Communicating the findings How do we communicate this? To whom?
  33. 33. Communicating the findings •Tell the story at the right technical level for each audience •Make sure to focus on What’s In It For You (WIIFY!) •Be objective, don’t lie with statistics •Be visual! Show, don’t just tell
  34. 34. Tools to explore “big data” •SQL Queries •Business Analytics Software •Machine Learning Algorithms
  35. 35. Tool #1: SQL queries SQL is the standard querying language to access and manipulate databases
  36. 36. SQL example Table friends (id, full_name, age): (1, Dan Friedman, 24); (2, Jared Jones, 27); (3, Paul Gu, 22); (4, Jasjit Singh, 73). Query: SELECT full_name FROM friends WHERE age = 73;
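To try the query from the slide, one option is Python's built-in `sqlite3` module with an in-memory database. The table contents come from the slide; the column types and the choice of SQLite are assumptions made for this sketch.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO friends (id, full_name, age) VALUES (?, ?, ?)",
    [
        (1, "Dan Friedman", 24),
        (2, "Jared Jones", 27),
        (3, "Paul Gu", 22),
        (4, "Jasjit Singh", 73),
    ],
)

# Same query as on the slide, with the age passed as a parameter.
rows = conn.execute(
    "SELECT full_name FROM friends WHERE age = ?", (73,)
).fetchall()
names = [row[0] for row in rows]
print(names)  # ['Jasjit Singh']
conn.close()
```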
  37. 37. Tool #2: Analytics software Business analytics software for your database enabling you to easily find and communicate insights visually
  38. 38. Analytics software example
  39. 39. Tool #3: Machine learning algorithms Machine learning algorithms provide computers with the ability to learn without being explicitly programmed — “programming by example”
  40. 40. Iris data set
  41. 41. Iris data set
  42. 42. Use cases for machine learning •Classification — Predict categories •Regression — Predict values •Anomaly Detection — Find unusual occurrences •Clustering — Discover structure
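To make "programming by example" concrete, here is a minimal classification sketch: a one-nearest-neighbour classifier written with the standard library only. The six training rows are a small sample in the style of the Iris data set (sepal length, sepal width, petal length, petal width, in cm) and should be treated as illustrative. The choice of nearest neighbour is an assumption for this example; the slides do not name a specific algorithm.

```python
import math

# (sepal length, sepal width, petal length, petal width) -> species
TRAINING = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def classify(sample):
    """Return the label of the training example closest to `sample`."""
    nearest = min(TRAINING, key=lambda example: math.dist(example[0], sample))
    return nearest[1]


# No rule such as "short petals mean setosa" is written anywhere;
# the prediction comes only from the labelled examples.
print(classify((5.0, 3.4, 1.5, 0.2)))
print(classify((6.0, 2.9, 4.5, 1.5)))
```

With a realistic amount of data one would normally hold out a test set and use a library implementation rather than this hand-rolled version.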
  43. 43. If this excites you…
  44. 44. This is what you’ll need •Knowledge of statistics, algorithms, & software •Comfort with languages & tools (Python, SQL, Tableau) •Inquisitiveness and intellectual curiosity •Strong communication skills
  45. 45. Data science bootcamp Syllabus: Python Toolkit, Statistics & Probability, Experimentation, Machine Learning, Communicating Data, Algorithms and Big Data
  46. 46. More about Thinkful • Anyone who’s committed can learn to code • 1-on-1 mentorship is the best way to learn • Flexibility matters — learn anywhere, anytime • We only make money when you get a job
  47. 47. Our Program You’ll learn concepts, practice with drills, and build capstone projects for your own portfolio — all guided by a personal mentor
  48. 48. Our Mentors Mentors have, on average, 10+ years of experience
  49. 49. Our Results Job Titles after Graduation; Months until Employed
  50. 50. Special Prep Course Offer • Three-week program, includes six mentor sessions for $250 • Overview of Python, Python’s data science toolkit, stats • Option to continue into full data science bootcamp • Talk to me (or email me) if you’re interested
  51. 51. October 2015 Questions? jas@thinkful.com schedule a call through thinkful.com