Lessons learned by making data science products at Tokopedia. This includes a recommendation engine, marketing automation and challenges with data science and AI products.
2. WHO AM I
I started working as a PM straight out of college with Paytm.
I was focussed on building Paytm consumer products for the next 2
years, during which I launched -
Paytm QR Code
Paytm PassBook
Paytm Merchant SDK
Paytm Payments Bank
Paytm Offline Pay etc.
Then I worked with Tokopedia in Indonesia as Growth and Data
Product Lead. Products -
Referral and its Fraud Detection
User Clustering and Product Clustering
Marketing Optimization
NLP and Image Classification for Product Catalogue
Currently, I am working with Branch Metrics, which
is based in SF, where I am responsible for their Core
Platform and Deep Linking product.
3. SOME MORE INFO
Tokopedia -
Top Indonesian everyday app with a valuation of $7B
Founded: 6 February 2009
Tokopedia has raised a total of $2.4B in funding over 9 rounds
Top App in Indonesia
Me -
Interested in data science and got my first job accidentally
because of it.
Courses - DataCamp, AI Product Manager (Udacity), machine
learning and deep learning courses online.
I spend my time reading Life 3.0, AI Superpowers, The Fourth Age
etc.
4. WHAT IS AI-ML-DL
Image Source - NVIDIA
Artificial Intelligence - Human intelligence exhibited by machines
Machine Learning - An approach to achieve artificial intelligence
Deep Learning - A technique for implementing machine learning
6. Products worked on
Fraud Detection for Referral
Recommendation Engine
NLP based sort for Product Feed
Image Recognition based product cleaning
Marketing automation
Purchase Predictor
Chatbot with Microsoft
Products we will discuss
Recommendation Engine
Marketing automation
7. RECOMMENDATION
What -
Recommend products that users are likely to be interested in, with the goal of
increasing the conversion rate (View -> Purchase).
How -
User Based Clustering
Product Based Clustering
Cluster based Collaborative Filtering
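As a rough sketch of the target metric on this slide, conversion rate can be computed as purchases divided by product views, and a relative lift compares a treated group against a control group. The function names and all numbers below are illustrative assumptions, not Tokopedia data.

```python
def conversion_rate(views: int, purchases: int) -> float:
    """Share of product views that ended in a purchase (View -> Purchase)."""
    if views == 0:
        return 0.0
    return purchases / views


def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative change of the treatment rate compared with the control rate."""
    if control_rate == 0:
        raise ValueError("control rate must be non-zero")
    return (treatment_rate - control_rate) / control_rate


# Made-up numbers, for illustration only.
control = conversion_rate(views=10_000, purchases=200)    # without recommendations
treatment = conversion_rate(views=10_000, purchases=250)  # with recommendations
lift = relative_lift(control, treatment)
print(f"relative lift: {lift:.0%}")
```

Agreeing on which of these two numbers (absolute rate or relative lift) is being reported avoids confusion when results are shared.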
8. RECOMMENDATION
User Based Clustering
Infer user properties from signals such as purchases, views,
product price, category, address, device, payment method, shops, type
of product etc.
Determine - Religion, SES (socio-economic status), Occupation, Age, Gender, Interests, Marital
Status, Kids
Take explicit consent from the user about their data
Give the user an option to opt out of recommendations and data
processing
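One way user clustering could look in practice is plain k-means over a few behavioural features. This is a minimal sketch, not the approach used at Tokopedia: the two features, the user IDs and the values are hypothetical, and a real pipeline would only use data the user has consented to.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Hypothetical, pre-scaled features per user:
# (average order value, share of views in electronics)
users = {
    "u1": (0.90, 0.80),
    "u2": (0.85, 0.90),
    "u3": (0.10, 0.20),
    "u4": (0.15, 0.10),
}
assignments, centroids = kmeans(list(users.values()), k=2)
clusters = dict(zip(users, assignments))
print(clusters)
```

The cluster index itself carries no meaning. What matters is which users end up together.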
9. Product Clustering
Tokopedia is a C2C platform, so anyone can upload anything and write their own
description and categories.
Use various techniques -
Session Based clusters
Image Recognition
NLP on Product Title, Description, Category and
Sub Category
Why were merchants uploading bad images?
Why were the products incorrectly labeled?
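To make the "NLP on Product Title" idea concrete, here is a deliberately simple stand-in: group listings whose titles share most of their tokens (Jaccard similarity). It is a sketch under assumptions, not the production technique. The titles and the 0.5 threshold are invented, and a real system would likely use stronger text and image models.

```python
import re
from typing import List, Set


def tokens(title: str) -> Set[str]:
    """Lower-case a title and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the token overlap divided by the size of the token union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_titles(titles: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy grouping: join the first group whose first title is similar enough."""
    groups: List[List[str]] = []
    for title in titles:
        for group in groups:
            if jaccard(tokens(title), tokens(group[0])) >= threshold:
                group.append(title)
                break
        else:
            groups.append([title])  # no similar group found, start a new one
    return groups


# Hypothetical seller-written titles for the same two products.
titles = [
    "iPhone 11 Case Black",
    "Case iPhone 11 black NEW",
    "Batik Shirt Men",
    "men batik shirt long sleeve",
]
groups = group_titles(titles)
for group in groups:
    print(group)
```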
10. Cluster Based Collaborative Filtering
Things with similar attributes are likely to be of similar interest to a user.
Recommend something based on past data from similar users.
The recommended item can be a product (Amazon), content (Netflix), or users (Facebook)
E.g. people who are viewing an iPhone will be interested in an iPhone cover too
once they PURCHASE the iPhone, because that's what other SIMILAR users did.
Session Based Recommendation
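The iPhone example can be sketched as item-to-item co-occurrence counting, a very simple form of collaborative filtering. This is an assumption-laden illustration: the baskets are invented, and the sketch skips the clustering step, which in a cluster-based setup would mean building one such table per user cluster.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, item, top_n=2):
    """Items most often bought together with `item` (ties broken by name)."""
    neighbours = co.get(item, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase baskets from users in one cluster.
baskets = [
    ["iphone", "iphone cover", "charger"],
    ["iphone", "iphone cover"],
    ["iphone", "screen guard"],
    ["batik shirt", "sarong"],
]
co = build_cooccurrence(baskets)
suggestions = recommend(co, "iphone")
print(suggestions)
```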
11. MARKETING AUTOMATION
What -
Targeting the right user with the right product at the right time through the right
channel
Using the recommender system above and some more data -
Most active platform (App, Mobile Web, Desktop)
Most active channel (Email, SMS, Push Notification, Banner)
Most active time of day, week and month
Then integrate the product recommendations into these marketing channels
using APIs
RESULT - ~26% increase in Conversion Rate and ~40% in CTR
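A minimal sketch of the "most active channel" and "most active time" signals: count past engagements per user and pick the most frequent channel and hour. The event log format, user IDs and values are hypothetical. Real data would come from the marketing channels' own tracking.

```python
from collections import Counter, defaultdict

# Hypothetical engagement log: (user_id, channel, hour_of_day)
events = [
    ("u1", "push", 20), ("u1", "push", 21), ("u1", "email", 9), ("u1", "push", 20),
    ("u2", "email", 8), ("u2", "email", 8), ("u2", "sms", 13),
]


def preferred_slots(events):
    """Most frequent channel and hour of day per user."""
    channels = defaultdict(Counter)
    hours = defaultdict(Counter)
    for user, channel, hour in events:
        channels[user][channel] += 1
        hours[user][hour] += 1
    return {
        user: {
            "channel": channels[user].most_common(1)[0][0],
            "hour": hours[user].most_common(1)[0][0],
        }
        for user in channels
    }


slots = preferred_slots(events)
print(slots)
```

The output of a step like this would then be combined with the recommended products when scheduling a message for each user.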
12. CHALLENGES
Scale of data - 60M DAU and petabytes of data, so cost is a
factor
Edge cases - We blocked the sensitive categories during Ramadan
but ended up showing a magazine with nudity, which led to
social media bashing
Seasonality - Festivals, back to school etc.
Measurement - Have the goal, and how you will measure it,
in mind and discuss it with the team.
Variance - If you don't include new data then you will get
into a spiral. Always have at least 30% variance.
C2C Marketplace - There is no standardization here.
Cultural aspects - Language, trends, lifestyle, spending
power
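One possible reading of the "30% variance" point is to reserve a share of recommendation slots for items the model did not rank at the top, so the system keeps collecting data on new products instead of reinforcing its own past choices. The sketch below assumes that reading. The function name, slot count and item names are hypothetical.

```python
import random


def mix_recommendations(ranked, catalogue, slots=10, explore_share=0.3, seed=None):
    """Fill most slots with top-ranked items and the rest with random other items."""
    rng = random.Random(seed)
    n_explore = max(1, round(slots * explore_share))
    n_exploit = slots - n_explore
    exploit = ranked[:n_exploit]
    # Exploration pool: anything in the catalogue not already shown as a top pick.
    pool = [item for item in catalogue if item not in exploit]
    explore = rng.sample(pool, min(n_explore, len(pool)))
    return exploit + explore


# Hypothetical model output and catalogue.
ranked = [f"top{i}" for i in range(10)]
catalogue = ranked + [f"new{i}" for i in range(20)]
feed = mix_recommendations(ranked, catalogue, slots=10, explore_share=0.3, seed=42)
print(feed)
```

With 10 slots and a 0.3 share, 7 slots come from the model's ranking and 3 are sampled from the rest of the catalogue.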
14. EVALUATE THE DATA
Five Vs of Data -
Volume - Broadly speaking, the amount of data that is being produced over any
given unit of time
Variety - The level of deviation within your data, which can have both positive and
negative effects depending on what it is you’re hoping to achieve
Velocity - A term referring to how quickly new data is produced. Velocity can also
allude to the concept of drift, or, how quickly data underlying a model can change
over time
Veracity - The accuracy of data that is being collected, a trait which can be affected
by faulty inputs, poor organization, or a variety of other factors
Value - A holistic measure based on all other underlying characteristics of data and
rooted in how likely the data is to help you reach your desired end state