Getting Started in Data Science

  1. Data Science: How did we get here and where are we going? June 2017 http://bit.ly/data-la Wi-Fi: CrossCamp.us Events
  2. About us: We train developers and data scientists through 1-on-1 mentorship and career prep.
  3. About us • Noel Duarte • Los Angeles Area General Manager • UC Berkeley ’15; worked primarily with R for population genetics analysis; at Thinkful since January 2016 • Kyle Polich • Data science mentor at Thinkful • Host of Data Skeptic, a podcast devoted to all things data science and advancements in the industry
  4. About you: Why are you here? • I already have a career in data • I’m curious about switching to a career in data • I want to learn what data science is and why it’s important
  5. Today’s goals • Why is data science important? • What is a data scientist and what do they do? • How and why has the field emerged? • How can one become a data scientist? (And why would you want to?)
  6. Why is data science important? “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” - McKinsey Global Institute (MGI)
  7. Data Scientist:
  8. Case study: LinkedIn (2006) “[LinkedIn] was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink—and you probably leave early.” - LinkedIn Manager, June 2006
  9. The new guy • Joined LinkedIn in 2006, when the site had only 8M users (450M by 2016) • Started experiments to predict people’s networks • Engineers were dismissive: “you can already import your address book”
  10. The result
  11. Data, data everywhere 🚀 • Uber: where drivers should position themselves • Netflix: movie recommendations • Ebola epidemic: mobile-phone mapping in Senegal to fight the disease
  12. Data, data everywhere 🚀
  13. Big Data: what exactly does it mean? Big Data: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze
  14. Big Data: brief history • Trend “started” in 2005 (Hadoop!) • Web 2.0: most content is created by users • Mobile accelerates this: data per person skyrockets
  15. Big Data: the 3 Vs (volume, velocity, variety)
  16. Big Data: tl;dr “90% of the data in the world today has been created in the last two years alone.” - IBM, May 2013
  17. In come data scientists!
  18. Intersection of engineering, statistics, & communication
  19. The data science process: Let’s come back to LinkedIn’s evolution in 2006 and examine it using a typical* data science approach. • Frame the question • Collect the raw data • Process the data • Explore the data • Communicate results
  20. Case: Frame the question. What questions do we want to answer?
  21. Case: Frame the question • What connections (type and number) lead to higher user engagement? • Which connections do people want to make but are currently prevented from making? • How might we predict these connections with limited data from the user?
  22. Case: Collect the data. What data do we need to answer these questions?
  23. Case: Collect the data • Connection data (who is connected to whom?) • Demographic data (what is the profile of each connection?) • Retention data (do people stay or leave?) • Engagement data (how do they use the site?)
  24. Case: Process the data. How is the data “dirty” and how can we clean it?
  25. Case: Process the data • User input • Redundancies • Feature changes • Data model changes
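
Two of the cleaning problems on the slide, messy user input and redundant records, can be illustrated with a small sketch. It assumes a hypothetical raw export of connections as (user, user) tuples; the function name `clean_connections` and its rules (trim and lowercase IDs, drop self-connections, treat A-B and B-A as one connection) are illustrative choices, not LinkedIn's actual pipeline.

```python
def clean_connections(raw_pairs):
    """Normalize raw (user, user) connection pairs and drop redundant rows.

    Hypothetical cleaning rules, for illustration only:
    - trim whitespace and lowercase user IDs (messy user input)
    - drop blank IDs and self-connections
    - treat (a, b) and (b, a) as the same undirected connection (redundancy)
    """
    cleaned = set()
    for a, b in raw_pairs:
        a, b = a.strip().lower(), b.strip().lower()
        if not a or not b or a == b:
            continue  # skip blank IDs and self-connections
        # Store each undirected connection in one canonical (sorted) order
        cleaned.add(tuple(sorted((a, b))))
    return sorted(cleaned)


raw = [(" Ana ", "ben"), ("BEN", "ana"), ("cam", "cam"), ("ana", "cam")]
print(clean_connections(raw))  # → [('ana', 'ben'), ('ana', 'cam')]
```

Using a set of sorted tuples makes duplicate detection order-independent, which is why the two differently typed "Ana/Ben" rows collapse into one.
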
  26. Case: Explore the data. What are the meaningful patterns in the data?
  27. Case: Explore the data • Triangle closing • Time overlaps • Geographic clustering
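
Triangle closing is the observation that if A knows B and B knows C, A probably knows C too. A minimal sketch, assuming connections are stored as an undirected edge list with hypothetical user IDs: rank every friend-of-a-friend who is not already connected to the user by the number of mutual connections. This is a simple heuristic for illustration, not LinkedIn's production recommendation model.

```python
from collections import Counter, defaultdict


def triangle_closing_candidates(edges, user):
    """Suggest new connections for `user` by counting mutual connections.

    Each friend-of-a-friend not yet connected to `user` would "close a
    triangle"; more mutual connections means a stronger candidate.
    """
    # Build an undirected adjacency map: user -> set of connections
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    mutual_counts = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user themself and people they already know
            if candidate != user and candidate not in graph[user]:
                mutual_counts[candidate] += 1

    # Highest number of mutual connections first
    return mutual_counts.most_common()


edges = [("ana", "ben"), ("ana", "cam"), ("ben", "dee"), ("cam", "dee"), ("ben", "eli")]
print(triangle_closing_candidates(edges, "ana"))  # → [('dee', 2), ('eli', 1)]
```
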
  28. Case: Communicate results. How do we communicate this? To whom?
  29. Case: Communicate results • Tell the story at the right technical level for each audience • Focus on What’s In It For You (WIIFY!) • Be objective; don’t lie with statistics • Be visual! Show, don’t just tell
  30. Tools to explore “big data” • SQL queries • Business analytics software • Machine learning algorithms
  31. Tool #1: SQL queries. SQL is the standard query language for accessing and manipulating relational databases.
  32. SQL example. Table `friends`:

      | id | full_name    | age |
      |----|--------------|-----|
      | 1  | Dan Friedman | 24  |
      | 2  | Jared Jones  | 27  |
      | 3  | Paul Gu      | 22  |
      | 4  | Noel Duarte  | 73  |

      ```sql
      SELECT full_name FROM friends WHERE age = 73;
      ```
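
The SQL query on the slide can be tried without installing a database server. This sketch uses Python's built-in `sqlite3` module with an in-memory database holding the slide's `friends` rows. The `?` placeholder passes the age as a bound parameter rather than pasting it into the SQL string, the usual way to avoid SQL injection.

```python
import sqlite3

# In-memory SQLite database holding the `friends` table from the slide
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO friends (id, full_name, age) VALUES (?, ?, ?)",
    [
        (1, "Dan Friedman", 24),
        (2, "Jared Jones", 27),
        (3, "Paul Gu", 22),
        (4, "Noel Duarte", 73),
    ],
)

# Same query as the slide, with the age passed as a bound parameter
rows = conn.execute("SELECT full_name FROM friends WHERE age = ?", (73,)).fetchall()
print(rows)  # → [('Noel Duarte',)]
conn.close()
```
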
  33. Tool #2: Analytics software. Business analytics software connects to your database and makes it easy to find insights and communicate them visually.
  34. Tableau example
  35. Tool #3: Machine learning algorithms. Machine learning algorithms give computers the ability to learn without being explicitly programmed: “programming by example.”
  36. Iris data set example
  37. Iris data set example
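
As an illustration of “programming by example” on the Iris data set, here is a minimal k-nearest-neighbors classifier in plain Python: instead of hand-written rules, it labels a new flower by majority vote among the most similar labeled examples. The nine training rows are the first three samples of each species in the classic Iris data (sepal length, sepal width, petal length, petal width, in cm). The choice of k-NN with k = 3 and the name `knn_predict` are assumptions for this sketch, not necessarily what the slides showed.

```python
import math
from collections import Counter

# First three samples of each species from the Iris data set:
# (sepal length, sepal width, petal length, petal width) in cm, species
TRAINING = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def knn_predict(features, training=TRAINING, k=3):
    """Predict a species by majority vote among the k nearest examples."""
    # Sort labeled examples by Euclidean distance to the new flower
    nearest = sorted(training, key=lambda row: math.dist(features, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((5.0, 3.4, 1.5, 0.2)))  # → setosa
print(knn_predict((6.7, 3.0, 5.2, 2.3)))  # → virginica
```

No rule about petal sizes is written anywhere; the “program” is the labeled examples themselves. `math.dist` requires Python 3.8 or later.
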
  38. Use cases for machine learning • Classification: predict categories • Regression: predict values • Anomaly detection: find unusual occurrences • Clustering: discover structure
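
Of these use cases, anomaly detection is the simplest to sketch without a library. One basic approach (our choice for illustration) is the z-score: flag any value more than a chosen number of standard deviations from the mean. The daily-login counts and the threshold of 2 below are made-up values, and `zscore_anomalies` is a hypothetical helper name.

```python
import statistics


def zscore_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


daily_logins = [10, 12, 11, 13, 12, 95]  # hypothetical counts
print(zscore_anomalies(daily_logins))  # → [5]
```

Because the outlier itself inflates both the mean and the standard deviation, this method can miss anomalies in small samples; robust variants based on the median are often preferred in practice.
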
  39. I’m in! Where do I start? • Knowledge of statistics, algorithms, & software • Comfort with languages & tools (Python, SQL, Tableau) • Inquisitiveness and intellectual curiosity • Strong communication skills
  40. Ways to keep learning (chart comparing options from less to more structure and from less to more support)
  41. 1-on-1 mentorship enables flexibility: 325+ mentors with an average of 10 years of experience in the field
  42. Support ’round the clock: you, your mentor, Q&A sessions, in-person workshops, career coach, Slack, program manager
  43. Want to try us (and data science) out? Talk to us now or be on the lookout for our email 📬 Thinkful’s Data Science Prep Course covers: - Python fundamentals - Statistics - Data science concepts - Capstone project. $250 for 3 weeks