
Sharing about my data science journey and what I do at Lazada


Was invited to share with the SMU Masters of IT in Business students on (i) how I got to my current position as a data scientist and (ii) what I do in my current position.

Includes suggested areas to focus on (e.g., distributed systems and processing) and how to gain more experience (e.g., volunteering). I also go through the problems that we solve at Lazada using machine learning and a high-level architecture of how we do it.


Sharing about my data science journey and what I do at Lazada

  1. Hi, I’m Eugene. I’m here to share about my data science journey and what I do at Lazada. 4th April 2016, SMU Masters of IT in Business
  2. Before I begin, any questions you would like addressed? I’ll answer throughout my sharing.
  3. An introduction about myself
  4. Studied Psychology and Business at Singapore Management University (SMU); wanted to use data to create positive impact
  5. Did economic and political analysis at Ministry of Trade & Industry (MTI)
  6. Joined IBM to pursue passion in working with data
  7. First step into data science as a data analyst, where I…
  8. Developed dashboards and analytics for end-to-end supply chain optimization
  9. Worked on an anti-money laundering and entity resolution system for a global bank
  10. Collected and analyzed tweets to provide insight on tweet share and sentiment for an electronics conglomerate
  11. Then was transferred to the workforce analytics team, working on data from IBM’s 450k employees to build…
  12. Forecast models for global job demand to optimize recruitment and workforce allocation
  13. Job recommendation engine to increase internal transfers, skill renewal, and satisfaction, and to reduce attrition
  14. Currently at Lazada’s Data Science team; more later
  15. My data science journey
  16. Skill sets needed to be a data analyst and how I acquired them
  17. Probability, statistics and experimental design from education in Psychology
  18. Technical skills in SPSS Statistics and R from undergraduate education in Psychology
  19. Written and verbal communication from essays and presentations (SMU), and briefs and stakeholder engagement with industry leaders (MTI)
  20. Teamwork from projects in SMU and MTI
  21. Skill sets needed to be a data scientist and how I acquired them. Skills so far: Statistics, Experimental Design, SPSS & R, Communication, Teamwork
  22. More R via MOOCs: Data Analysis and Statistical Inference (Duke); Computing for Data Analysis (Johns Hopkins)
  23. Python via MOOCs: Computer Science and Programming in Python (MIT); Interactive Programming in Python (Rice)
  24. SQL via any site with in-browser query engine
  25. Machine Learning via MOOCs: Machine Learning (Stanford); Statistical Learning (Stanford); Social and Economic Networks (Stanford); Text Mining and Analytics (Urbana-Champaign)
  26. Distributed storage and processing via MOOCs: Mining Massive Datasets (Stanford); Big Data with Apache Spark (UC Berkeley); Scalable Machine Learning with Apache Spark (UC Berkeley)
  27. Learning alone is insufficient; I also had to practice (a lot)
  28. Volunteer for things people don’t want to do: volunteered for a project on Twitter tracking with $0 budget
  29. Twitter project: connect to API, download tweets 24/7 over 2 weeks, analyze tweets. Learnt how to: work with APIs; recover from failure automatically; work with data that can’t fit in memory; do text analytics and sentiment analysis
  30. Volunteer with DataKind SG, helping NGOs tackle problems through data science
  31. Volunteer to facilitate Johns Hopkins Data Science Specialization (Statistical Inference)
  32. Kaggle meaningfully, on competitions with real-world applications; competitions I’ve tried include…
  33. Otto Group Product Classification: Classify products into 9 main product categories
  34. Springleaf Marketing Response: Predict if customers will respond to direct mail
  35. Telstra Network Disruptions: Predict severity of service disruption
  36. Skill sets to be a better data scientist (what I’m focusing on now). Skills so far: Statistics, Experimental Design, SPSS & R, Communication, Teamwork, Python, SQL, Machine Learning, Distributed Storage & Processing
  37. Finding problems and opportunities people overlook
  38. Proper software engineering
  39. Designing and building data products end-to-end
  40. Building data products using Spark (Scala)
  41. My journey so far: Statistics, Experimental Design, SPSS & R, Communication, Teamwork, Python, SQL, Machine Learning, Distributed Storage & Processing, Finding use cases, Software Engineering, Designing data products, Spark & Scala
  42. So what can you do? Get very good at basic SQL; get very good at either R or Python; understand basic machine learning techniques; understand distributed systems and processing; improve communication by writing and sharing; get experience by doing projects on machine learning and distributed processing (e.g., open data, volunteering, Kaggle)
  43. What I do at Lazada
  44. Lazada Data Science: Data Engineers, Scientists, Tool Developers
  45. A rough guide to each role. Engineers: collect, store, maintain. Scientists: explore, prepare, model. Tool Developers: expose, integrate, platform-ize. Lines may blur between roles
  46. Problems we work on…
  47. Product-related: Product Categorization; Attribute Extraction; Spam Detection; Image Quality Checking
  48. Consumer-related: Recommendations; Product Ranking; Consumer Segmentation; Customer Lifetime Value
  49. Seller-related: Price Elasticity; Detecting Counterfeits
  50. Operation-related: Delivery time forecasting
  51. What I’m working on
  52. Product categorization (flow diagram). Input: product title & description. Categorization methods: machine learning, rules-based, crowd. Branches: "sufficient confidence" and "if insufficient confidence". Quality checking and validation. Output: product category. Production: API for self-service; scheduled batch jobs
  53. Product ranking for onsite display (flow diagram). Inputs: product data, purchase data, behavioral data (e.g., clickstream), other data (e.g., ratings). Stages shown: merging datasets, feature engineering, model product rankings, data cleaning, rule-based modifiers, measurement & A/B testing
  54. Recommendations for newsletter subscribers (flow diagram). Inputs: product data, purchase data, behavioral data (e.g., clickstream), other data (e.g., ratings). Stages shown: merging datasets, feature engineering, data cleaning, customer segmentation, forecasted top sellers, recommendations, newsletter creation, measurement & A/B testing, rule-based modifiers
  55. How is my time spent
  56. Majority of time spent coding (thankfully): Coding 55%, Engagement 30%, Others 15%. Coding breakdown: Data Preparation 50%, Modeling 20%, Productionizing 30%
  57. Data Preparation: merging data; imputing nulls; removing duplicates; handling outliers; fixing formats; etc.
  58. Building the model: feature engineering; machine learning; validation; iterate, iterate, iterate
  59. Deploying to production: proof-of-concept; developing API; scheduling jobs; continuous integration; fixing bugs
  60. Engagement (with stakeholders): roadmap planning (quarterly); aligning solution with problem; explaining and getting buy-in
  61. Other tasks: providing assistance; research and brainstorming; team sharing
  62. Any further questions? eugeneyanziyou@gmail.com, eugene.yan@lazada.com
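
Slide 52 mentions machine learning, rules-based, and crowd categorization alongside "sufficient confidence" and "insufficient confidence" branches. One plausible reading is a fallback cascade, sketched below in Python. This is an illustrative sketch, not Lazada's actual pipeline: the threshold value, the toy classifier, the toy rule, and all function names are hypothetical, and the slides do not state the order in which the methods are tried.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical threshold; the slides do not state a value.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Categorization:
    category: Optional[str]  # None means "still needs a human label"
    source: str              # "ml", "rules", or "crowd"


def ml_categorize(title: str) -> Tuple[str, float]:
    """Toy stand-in for a trained classifier: returns (category, confidence)."""
    keywords = {"phone": "Mobiles", "shirt": "Fashion"}
    for word, category in keywords.items():
        if word in title.lower():
            return category, 0.95
    return "Unknown", 0.10


def rules_categorize(title: str) -> Optional[str]:
    """Toy stand-in for hand-written rules; None means no rule matched."""
    if title.lower().endswith("charger"):
        return "Mobile Accessories"
    return None


def categorize(title: str, threshold: float = CONFIDENCE_THRESHOLD) -> Categorization:
    """Assumed cascade: ML first, then rules, then hand off to the crowd."""
    category, confidence = ml_categorize(title)
    if confidence >= threshold:
        return Categorization(category, "ml")
    rule_category = rules_categorize(title)
    if rule_category is not None:
        return Categorization(rule_category, "rules")
    # Neither automated step was confident enough: queue for crowd labeling.
    return Categorization(None, "crowd")


if __name__ == "__main__":
    for title in ["Samsung Phone 32GB", "USB wall charger", "Mystery item"]:
        print(title, "->", categorize(title))
```

Recording the `source` next to each category would let a later quality-checking step sample each path separately, which matches the "Quality Checking and Validation" box on the slide.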

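The data preparation steps listed on slide 57 (merging data, imputing nulls, removing duplicates, handling outliers, fixing formats) can be illustrated with a small standard-library Python sketch. The records, field names, price cap, and median imputation are all assumptions made for illustration; the slides do not describe the actual data or the techniques used.

```python
from statistics import median

# Hypothetical product rows and ratings, invented for this example.
products = [
    {"sku": "A1", "title": "Phone", "price": "199.00"},
    {"sku": "A1", "title": "Phone", "price": "199.00"},  # duplicate row
    {"sku": "B2", "title": "Shirt", "price": None},       # missing price
    {"sku": "C3", "title": "Cable", "price": "5"},
    {"sku": "D4", "title": "Case", "price": "99999"},     # implausible price
]
ratings = {"A1": 4.5, "C3": 3.9}


def prepare(products, ratings, price_cap=10_000.0):
    # Removing duplicates: keep the first row seen for each SKU.
    seen, rows = set(), []
    for row in products:
        if row["sku"] not in seen:
            seen.add(row["sku"])
            rows.append(dict(row))

    # Fixing formats: prices arrive as strings; convert them to floats.
    for row in rows:
        row["price"] = float(row["price"]) if row["price"] is not None else None

    # Handling outliers: treat prices above the (assumed) cap as missing.
    for row in rows:
        if row["price"] is not None and row["price"] > price_cap:
            row["price"] = None

    # Imputing nulls: fill missing prices with the median of known prices.
    # Assumes at least one known price remains.
    fill = median(r["price"] for r in rows if r["price"] is not None)
    for row in rows:
        if row["price"] is None:
            row["price"] = fill

    # Merging data: left-join ratings onto products by SKU.
    for row in rows:
        row["rating"] = ratings.get(row["sku"])
    return rows


clean = prepare(products, ratings)
```

The order of the steps matters here: outliers are nulled before imputation so that an implausible price does not distort the median used to fill the gaps.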