Data Science Challenges and Impact @Lazada
Big Data & Analytics Innovation Summit Singapore 2018
#1 Shopping Site in SEA
145,000 sellers
3,000 brands
Lazada Data
Data App Devs: expose, integrate, platform-ize
Data Scientists: explore, prepare, model
Data Engineers: collect, store, maintain
Start from the bottom up
Considerations and Challenges
How much business input/overriding?
Trade-off: Manual human input vs. automated algorithms
Necessary to some extent, but harmful if overdone
Technically, manual input and rules are difficult to maintain
Example: Manual override of product ranking on the site
Allows category managers to incorporate their domain
knowledge (e.g., new product releases, trending, etc.)
Nonetheless, too much manual overriding reduced metrics
Conducted A/B tests to find the optimal level of manual overriding
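As a rough illustration of how override levels might be compared in an A/B test, the sketch below runs a two-proportion z-test on conversion counts. The variant names and all counts are hypothetical placeholders, not figures from the talk, and the deck does not say which statistical test was actually used.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_* are conversion counts, n_* are session counts.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: (conversions, sessions) per level of manual overriding.
variants = {
    "no_override": (500, 10_000),
    "override_top_5_slots": (560, 10_000),
    "override_top_20_slots": (480, 10_000),
}

control_conv, control_n = variants["no_override"]
for name, (conv, n) in variants.items():
    if name == "no_override":
        continue
    z, p = two_proportion_ztest(control_conv, control_n, conv, n)
    print(f"{name}: z={z:.2f}, p={p:.4f}")
```

When several override levels are tested against one control, a multiple-comparison correction would normally be applied before picking a winner.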
How fast is “too fast”?
Trade-off: Development speed vs. production stability
You can move faster by skipping tooling/abstractions, code reviews,
automated testing, documentation, and repaying technical debt
But in the long run, these investments save time and effort
FB: “Move fast and break things” -> “Move fast with stable infra”
[Figure: Development effort over the long run. Axes: project size, effort. Labels: Quick POC; Moar features!; Dev, dev, dev; Automation, testing, tooling, clear tech debt; Environment in place; Production; Dev speed; Stability; Less effort and faster =); More effort and slower =(]
Example: 8-person team, 10 problems; mostly focused on delivery
In the first two years, the team achieved a lot and proved our worth
Nonetheless, as we matured and had to maintain more production
code, investing in iteration speed and code quality had high ROI
How to set priorities with business?
Trade-off: Short-term vs. long-term
The business best understands what is needed, but can be overly
focused on day-to-day ops and near-term goals
Data science is aware of the latest research and can innovate, but
risks being detached from business needs
Example: Timeboxed skunkworks resulting in POCs
Data leadership sponsored some POCs that were hacked together
in 2 – 4 weeks; some eventually made it into production
Nonetheless, the focus is on research and innovation that can be
applied to improve the online shopping experience
Development and Impact
Automated Review QC
[Diagram labels: Product Review API; Input and post-processing; Data sources; Model-based (Spam Classification, General Classification); Rule-based (Keywords, Spam Characteristics); Review API; Manual QC; Audit]
Overall results
Significant manpower cost savings (five figures monthly)
Existing workforce can be diverted to difficult-to-automate tasks
Reduced lead-time before reviews are live on site
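The review QC slide names rule-based checks (keywords, spam characteristics), model-based classification, and manual QC. A minimal sketch of how such pieces could be combined is shown below. The keyword list, the digit-ratio rule, the thresholds, and the `spam_score_fn` callable are all hypothetical stand-ins; the deck does not describe the actual rules or models.

```python
from typing import Callable, List

# Hypothetical keyword list; real rules would be maintained by the QC team.
SPAM_KEYWORDS = {"whatsapp", "click here", "discount code"}


def rule_flags(text: str) -> List[str]:
    """Return the names of rule-based checks that the review text triggers."""
    lowered = text.lower()
    flags = [kw for kw in sorted(SPAM_KEYWORDS) if kw in lowered]
    # Hypothetical "spam characteristic": text dominated by digits.
    if text and sum(ch.isdigit() for ch in text) / len(text) > 0.3:
        flags.append("mostly_digits")
    return flags


def route_review(
    text: str,
    spam_score_fn: Callable[[str], float],
    approve_below: float = 0.2,
    reject_above: float = 0.8,
) -> str:
    """Route a review to 'approve', 'reject', or 'manual_qc'.

    spam_score_fn stands in for a trained classifier that returns a spam
    probability in [0, 1].
    """
    if rule_flags(text):
        return "reject"
    score = spam_score_fn(text)
    if score >= reject_above:
        return "reject"
    if score <= approve_below:
        return "approve"
    # Uncertain cases go to human reviewers.
    return "manual_qc"


if __name__ == "__main__":
    toy_model = lambda text: 0.05  # placeholder for a real model
    print(route_review("Great phone, fast delivery", toy_model))
```

Routing only the uncertain middle band to people is one way existing reviewers could be freed up for harder cases, in line with the results listed above.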
Product Ranking
Ranking affects what appears on top
Ranking is different from recommendation
[Diagram labels: Web Tracker (JavaScript); Mobile Tracker (Adjust); 3rd Party (e.g., ZenDesk, SurveyGizmo); Kafka Queues; Bulk Loaders (Spark); Hadoop; Product, Seller, Transaction; Data Exploration + Data Preparation + Feature Engineering + Modelling (Spark); Local Validation; Manual Boosting (Django); Product rankings; A/B Testing: split traffic and measure outcomes; (Category Managers); (User devices)]
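The pipeline above lists a manual boosting step alongside model-produced rankings. One simple way to combine the two, sketched here under assumptions, is to multiply model scores by manually set boost factors and to cap how many boosts are honored. The function name, the multiplicative scheme, and the cap are hypothetical; the deck does not specify how boosts are applied.

```python
from typing import Dict, List


def rank_products(
    model_scores: Dict[str, float],
    manual_boosts: Dict[str, float],
    max_boosted: int = 2,
) -> List[str]:
    """Return product ids ordered by boosted score, highest first.

    model_scores: product id -> score from the ranking model.
    manual_boosts: product id -> multiplier chosen by a category manager.
    max_boosted: cap on the number of boosts honored, limiting overriding.
    """
    # Keep only the strongest boosts, breaking ties by product id.
    honored = dict(
        sorted(manual_boosts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_boosted]
    )
    final = {
        pid: score * honored.get(pid, 1.0) for pid, score in model_scores.items()
    }
    return sorted(final, key=lambda pid: (-final[pid], pid))


if __name__ == "__main__":
    scores = {"phone_a": 0.9, "phone_b": 0.5, "new_release": 0.4}
    boosts = {"new_release": 3.0}  # hypothetical boost for a new product
    print(rank_products(scores, boosts))
```

Varying `max_boosted` across experiment arms is one way the level of manual overriding could be tuned through A/B tests.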
Overall results
Better ranking improved conversion (3 – 8%) and revenue per
session (5 – 20%)
Introducing new products improved new product engagement
(CTR increased 30 – 80%; add-to-cart increased 20 – 90%)
Emphasizing product quality had neutral to positive outcomes
(reduced return rate; increased product net promoter score)
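Results like these come from splitting traffic and comparing outcomes between groups. The sketch below shows one common way to do both: deterministic hash-based assignment of users to variants, and relative lift between two rates. The experiment name, the user ids, and the example rates are hypothetical and are not taken from the talk.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to 'treatment' or 'control'.

    Hashing the experiment name with the user id keeps assignments stable
    across sessions and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative change of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


if __name__ == "__main__":
    print(assign_variant("user-123", "ranking-v2"))
    # Hypothetical rates: 2.0% vs 2.1% conversion is a 5% relative lift.
    print(f"{relative_lift(0.020, 0.021):.1%}")
```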
Key takeaways
There is no single best answer to the challenges raised—it
depends on the maturity stage of the team and organization
Data science > Coding + Machine Learning—many other
activities contribute greatly to the final impact
Thank you!
eugene.yan@lazada.com
Our culture: http://bit.ly/datascienceculture
How we rank products: http://bit.ly/how-lazada-ranks-products

Semantic Shed - Squashing and Squeezing.pptx
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 

Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovation Summit Singapore 2018)

  • 1. Data Science Challenges and Impact @Lazada Big Data & Analytics Innovation Summit Singapore 2018
  • 2. #1 shopping site in SEA: 145,000 sellers, 3,000 brands.
  • 3. Lazada Data. Data App Devs: expose, integrate, platform-ize. Data Scientists: explore, prepare, model. Data Engineers: collect, store, maintain. Start from bottom up.
  • 5. How much business input/overriding? Trade-off: manual human input vs. automated algorithms. Manual input is necessary to some extent, but harmful if overdone. Technically, manual input and rules are difficult to maintain.
  • 6. How much business input/overriding? Example: manual override of product ranking on the site. This allows category managers to incorporate their domain knowledge (e.g., new product releases, trending products). Nonetheless, too much manual overriding reduced metrics. We conducted A/B tests to find the optimal level of manual overriding.
  • 7. How fast is “too fast”? Trade-off: development speed vs. production stability. You can move faster without building tooling/abstractions, code reviews, automated testing, repaying technical debt, and documentation. But in the long run, they save time and effort. FB: “Move fast and break things” -> “Move fast with stable infra”.
  • 8. How fast is “too fast”? [Figure: development effort over the long run. Axes: project size, effort. Labels: Dev speed; Stability; Quick POC; Moar features!; Dev, dev, dev; Automation, testing, tooling, clear tech debt; Environment in place; Production; Less effort and faster =); More effort and slower =(.]
  • 9. How fast is “too fast”? Example: 8-man team, 10 problems, mostly focused on delivery. In the first two years, the team achieved a lot and proved our worth. Nonetheless, as we matured and had to maintain more production code, investing in iteration speed and code quality had high ROI.
  • 10. How to set priorities with business? Trade-off: short-term vs. long-term. Business understands best what is needed, though it can be overly focused on day-to-day ops and near-term goals. Data science is aware of the latest research and can innovate, but risks being detached from business needs.
  • 11. How to set priorities with business? Example: timeboxed skunkworks resulting in POCs. Data leadership sponsored some POCs that were hacked together in 2 – 4 weeks; some eventually made it into production. Nonetheless, the focus is on research and innovation that can be applied to improve the online shopping experience.
  • 15. Overall results: Significant manpower cost savings (5-figures monthly). Existing workforce can be diverted to difficult-to-automate tasks. Reduced lead time before reviews are live on site.
  • 19. [Diagram: product ranking pipeline. Components, in the order shown: Web Tracker (JavaScript); Mobile Tracker (Adjust); 3rd Party (e.g., ZenDesk, SurveyGizmo); Kafka Queues; Bulk Loaders (Spark); Hadoop; Hadoop; Data Exploration + Data Preparation + Feature Engineering + Modelling (Spark); Manual Boosting (Django); Local Validation; A/B Testing. Data labels: Product, Seller, Transaction, Product rankings. Annotations: Split traffic and measure outcomes; (Category Managers); (User devices).]
  • 20. Overall results: Better ranking improved conversion (3 – 8%) and revenue per session (5 – 20%). Introducing new products improved new product engagement (CTR increased 30 – 80%; add-to-cart increased 20 – 90%). Emphasizing product quality had neutral to positive outcomes (reduced return rate; increased product net promoter score).
  • 21. Key takeaways: There is no single best answer to the challenges raised; it depends on the maturity stage of the team and organization. Data science > Coding + Machine Learning: many other activities contribute greatly to the final impact.
  • 22. Thank you! eugene.yan@lazada.com. Our culture: http://bit.ly/datascienceculture. How we rank products: http://bit.ly/how-lazada-ranks-products.
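
Slides 6 and 19 mention A/B tests that split traffic and measure outcomes, but the deck does not describe how this was implemented. The following is only a minimal sketch of the general technique, not Lazada's method: it assumes a hash-based traffic split and a two-proportion z-test on conversion rate. The function names, the salt string, and the example counts are all hypothetical.

```python
import hashlib
import math
from typing import Sequence, Tuple


def assign_variant(user_id: str, variants: Sequence[str],
                   salt: str = "ranking-override-test") -> str:
    """Deterministically map a user to one variant (hypothetical helper).

    Hashing the salted user id keeps each user in the same variant
    across sessions without storing any assignment state.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Uses the pooled standard error, i.e. the usual test under the null
    hypothesis that both variants share the same conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the talk.
    variant = assign_variant("user-123", ["low_override", "high_override"])
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(variant, round(z, 2), p)
```

Changing the salt per experiment reshuffles users, so assignments in one test are not tied to assignments in another.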