2. About me
B.S. degree in Management Science and Ph.D. in
Statistics
Data scientist at Twitter, ads ranking
HP Labs : pricing & portfolio management, marketing
USDA : yield forecasting with satellite & survey data
Instructor at Colorado State University
Innovations at the intersection of statistics, computer
science and business
Applications in online advertising and e-commerce.
3. Interview experiences:
Google, LinkedIn, Apple, FB, Twitter;
startup positions;
Phone screening (whiteboard coding, experiences,
technical problems at high level)
Business insights
Algorithm design
Analytics questions
Statistics : concepts and methods
Technical skills : SQL/R/Hadoop
4. Business insight questions
How would you monetize WhatsApp?
Propose a new Yelp feature.
How would you grow Twitter monthly active users (MAU)?
Recent news about the company & competitors
Sign up and use their core products
Stay informed about news in the industry you are interested in
5. Analytical questions
Analyze and reason through a hypothetical problem
Define metrics
Hypothesize possible reasons
Estimate effects of various factors
Suppose Pinterest wants to attach a price tag to
each pin. How would you evaluate whether this works?
6. Analytical : think like a data scientist
What could cause average time spent per user
to drop after this launch?
How do you find the root cause(s)?
Finding patterns in data
Tease out signals from noise
Hypothesis testing
Time series analysis
Estimate the effects of various factors
Regression
A/B testing
Computer simulation
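As one hedged illustration of how hypothesis testing and computer simulation combine, the sketch below runs a permutation test on the difference in mean time spent before and after a launch. The function name `permutation_test`, the two-sided test, and the sample values are assumptions made for illustration; none of them come from the slides.

```python
import random
from statistics import mean

def permutation_test(before, after, n_perm=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_diff, p_value) with observed_diff = mean(after) - mean(before).
    """
    rng = random.Random(seed)
    observed = mean(after) - mean(before)
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_perm):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[n_before:]) - mean(pooled[:n_before])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical minutes spent per user, made up for illustration only.
before_launch = [31.0, 28.5, 35.2, 30.1, 29.8, 33.4, 32.0, 27.9]
after_launch = [26.4, 25.1, 29.0, 24.8, 27.3, 26.9, 28.2, 23.5]

diff, p_value = permutation_test(before_launch, after_launch)
print(f"difference in means: {diff:.2f} minutes, p-value: {p_value:.4f}")
```

A small p-value only says the drop is unlikely to be noise; finding the root cause still requires slicing by segment, checking the time series, or a controlled A/B test.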
7. Statistics : concepts
Mean, median, mode, quantile.
Variance, range, IQR, covariance, correlation.
p-value, likelihood function.
How do you explain a confidence interval to an engineer?
What’s Bayes' theorem? Examples?
Gender-disease example.
Microsoft Bing example: a new algorithm is truly better 0.1% of the time; the test is
significant 5% of the time under H0 and 20% of the time under the alternative.
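The Bing example can be worked through with Bayes' theorem. The sketch below assumes one particular reading of the slide: 0.1% is the prior probability that a new algorithm is truly better, 5% is the false-positive rate under H0, and 20% is the power under the alternative. The function name `posterior_better` is hypothetical.

```python
def posterior_better(prior, power, alpha):
    """P(new algorithm is truly better | test is significant), by Bayes' theorem."""
    true_positive = prior * power          # better AND significant
    false_positive = (1 - prior) * alpha   # not better AND significant
    return true_positive / (true_positive + false_positive)

# Assumed reading of the slide: prior = 0.1%, power = 20%, alpha = 5%.
p = posterior_better(prior=0.001, power=0.20, alpha=0.05)
print(f"P(better | significant) = {p:.4f}")  # about 0.0040
```

Under these assumptions, a significant result still leaves only about a 0.4% chance that the new algorithm is truly better, because true improvements are so rare to begin with.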
8. Statistics : methods
The same Twitter ad is shown to 2,000 users and 15 of them
click. What is the CTR?
What is multicollinearity?
What’s the difference between stratified & cluster
sampling?
Randomized complete block design? Give an example.
Signup completion on FB.
Book : The Statistical Sleuth.
Coursera statistics classes.
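For the CTR question above, the point estimate is simply clicks divided by impressions, but an interviewer will often ask how uncertain it is. The sketch below adds a Wilson score interval; the choice of interval, the 95% level (z = 1.96), and the function name are assumptions for illustration, not something the slides prescribe.

```python
import math

def ctr_with_wilson_interval(clicks, impressions, z=1.96):
    """Return (ctr, lower, upper) using the Wilson score interval."""
    p_hat = clicks / impressions
    denom = 1 + z**2 / impressions
    center = (p_hat + z**2 / (2 * impressions)) / denom
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / impressions + z**2 / (4 * impressions**2)
        )
        / denom
    )
    return p_hat, center - half_width, center + half_width

ctr, low, high = ctr_with_wilson_interval(clicks=15, impressions=2000)
print(f"CTR = {ctr:.2%}, approx. 95% interval = [{low:.2%}, {high:.2%}]")
```

The Wilson interval is used here instead of the plain normal approximation because it behaves better when the proportion is close to 0, which is typical for click-through rates.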
9. Brain teaser
Suppose you are given an unfair coin. How can you use it
to generate an event with probability ½?
Put 50 black balls and 50 white balls into 2 buckets.
Randomly select a bucket and then a ball in the
bucket. How do you maximize the probability of choosing black?
Given a fair die with 6 faces, how do you randomly
generate numbers between 1 and 12?
Programming: Rand6() -> Rand12().
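The three teasers above have well-known answers that are easy to check in code. The sketch below is one possible set of solutions, not the only one: `fair_flip` uses von Neumann's trick of flipping twice and keeping only unequal pairs, `rand12` combines two independent die rolls, and `best_split` brute-forces the bucket problem. All function names, and the bias of 0.8 in the demo, are hypothetical.

```python
import random

def fair_flip(biased_flip):
    """Von Neumann's trick: flip twice and keep only HT or TH."""
    while True:
        first, second = biased_flip(), biased_flip()
        if first != second:
            # P(HT) == P(TH) == p * (1 - p), so the first flip is now fair.
            return first

def rand12(rand6):
    """Uniform integer in 1..12 built from a uniform 1..6 source."""
    high = (rand6() - 1) % 2   # 0 or 1, each with probability 1/2
    low = rand6()              # 1..6 from an independent second roll
    return 6 * high + low

def prob_black(black_in_first, white_in_first, per_color=50):
    """P(black) when a bucket is chosen at random, then a ball from it."""
    black_in_second = per_color - black_in_first
    white_in_second = per_color - white_in_first
    p_first = black_in_first / (black_in_first + white_in_first)
    p_second = black_in_second / (black_in_second + white_in_second)
    return 0.5 * p_first + 0.5 * p_second

def best_split(per_color=50):
    """Brute-force the (black, white) count for the first bucket."""
    candidates = [
        (b, w)
        for b in range(per_color + 1)
        for w in range(per_color + 1)
        if 0 < b + w < 2 * per_color   # both buckets must be non-empty
    ]
    return max(candidates, key=lambda bw: prob_black(bw[0], bw[1], per_color))

rng = random.Random(42)
biased = lambda: rng.random() < 0.8    # heads with probability 0.8 (assumed)
n = 10_000
print("share of heads after correction:", sum(fair_flip(biased) for _ in range(n)) / n)
print("best first bucket (black, white):", best_split())
```

The bucket answer is to put a single black ball alone in one bucket and everything else in the other, which gives 0.5 + 0.5 × 49/99, roughly 0.747.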
10. Whiteboard coding (DS)
Suppose the amount we charge on ad clicks comes in a
stream. Write a program to calculate cumulative cost.
Correlation : (per ad cost, size of display)?
Write a program to simulate M:F ratio assuming each
family stops giving birth after the first boy (no twins).
CS coding & algorithm complexity: given an array of
numbers, find the two whose sum is closest to 0.
LeetCode & Cracking the Coding Interview
Practice with paper & pencil!
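The whiteboard problems above can be sketched in a few lines each. The versions below are one reasonable set of answers under stated assumptions: the charge stream is any Python iterable, each birth is an independent 50/50 boy or girl, and the input array has at least two numbers. All function names are hypothetical.

```python
import random

def cumulative_cost(charges):
    """Yield the running total as each click charge arrives from the stream."""
    total = 0.0
    for charge in charges:
        total += charge
        yield total

def simulate_boy_girl_ratio(n_families, seed=0):
    """Each family has children until its first boy; return boys / girls."""
    rng = random.Random(seed)
    boys = girls = 0
    for _ in range(n_families):
        while rng.random() >= 0.5:   # a girl, with probability 1/2 (assumed)
            girls += 1
        boys += 1                    # the first boy ends the family
    return boys / girls              # assumes at least one girl was born

def closest_to_zero_pair(nums):
    """Sort, then walk two pointers inward: O(n log n) time overall."""
    s = sorted(nums)
    lo, hi = 0, len(s) - 1
    best = (s[lo], s[hi])
    while lo < hi:
        total = s[lo] + s[hi]
        if abs(total) < abs(best[0] + best[1]):
            best = (s[lo], s[hi])
        if total < 0:
            lo += 1      # need a larger sum
        elif total > 0:
            hi -= 1      # need a smaller sum
        else:
            break        # cannot do better than exactly 0
    return best

print(list(cumulative_cost([1.0, 2.5, 0.5])))            # [1.0, 3.5, 4.0]
print(round(simulate_boy_girl_ratio(100_000), 3))        # close to 1
print(closest_to_zero_pair([1, 60, -10, 70, -80, 85]))   # (-80, 85)
```

The stopping rule does not change the M:F ratio: every birth is still a fair coin flip, so the simulated ratio stays near 1. For the array problem, the brute-force check of all pairs is O(n²); sorting first brings it down to O(n log n).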
11. Data manipulations
Excel
SQL
A scripting language such as R or Python.
Dataset with (gender, age, time spent on FB per day)
Create age buckets: (0, 10], (10, 20], ...
Average time spent for each (gender, age bucket).
Table joins: inner join, left/right join, outer join (SQL & R)
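The age-bucket exercise above can be sketched with the standard library alone. The rows below are made-up values for illustration, and the helper names `age_bucket` and `mean_time_by_group` are hypothetical; in practice the same result is usually obtained with SQL `GROUP BY` or pandas (`cut` plus `groupby`).

```python
import math
from collections import defaultdict

# Hypothetical rows: (gender, age, minutes spent per day). Values are made up.
rows = [
    ("F", 23, 40.0),
    ("F", 27, 50.0),
    ("M", 30, 20.0),
    ("M", 31, 35.0),
    ("F", 10, 15.0),
]

def age_bucket(age, width=10):
    """Return the half-open bucket (lo, hi] containing age, e.g. 30 -> (20, 30]."""
    hi = math.ceil(age / width) * width
    return (hi - width, hi)

def mean_time_by_group(rows):
    """Average time spent for each (gender, age bucket) pair."""
    sums = defaultdict(lambda: [0.0, 0])   # key -> [total minutes, row count]
    for gender, age, minutes in rows:
        acc = sums[(gender, age_bucket(age))]
        acc[0] += minutes
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

for (gender, (lo, hi)), avg in sorted(mean_time_by_group(rows).items()):
    print(f"{gender} ({lo}, {hi}]: {avg:.1f}")
```

Note the boundary handling: because the buckets are closed on the right, age 30 falls in (20, 30] and age 31 in (30, 40].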
12. Resources
Books
The Statistical Sleuth
Big Data Governance (quality, privacy, application in various verticals)
Data Just Right (DS)
The Start-up of You
The 7 Habits of Highly Effective People
Quora post : how to become a data scientist
Coursera classes
Intro to statistics
R programming
Machine learning
Intro to data science
Web intelligence and big data (DS)
Glassdoor, CareerCup,...
13. Crawl Twitter data in R (or Python)
user info
user tweets
user network
search results
Text analytics (+ optional : predictive task)
Frequency of n-grams
Remove trivial words
Find associated entities
Visualization
Associative words
Sentiment
Volume
Interactive app : specify time window or specify entity, display raw
tweets, show volume by time
Twitter data analytics
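The text-analytics steps listed above (n-gram frequency after removing trivial words) can be sketched without any Twitter API access. The tweets below are invented strings, the stop-word list is a deliberately tiny placeholder, and the function names are hypothetical; crawling itself is left out because it depends on whichever client library and credentials are used.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "for", "on", "rt"}

def tokenize(text):
    """Lowercase, drop URLs, keep word-like tokens including #hashtags and @mentions."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [t for t in re.findall(r"[#@]?\w+", text) if t not in STOP_WORDS]

def ngram_counts(texts, n=2):
    """Count n-grams over tokens that remain after stop-word removal."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Invented example tweets, for illustration only.
tweets = [
    "Data science is the future of ads ranking",
    "The future of data science is bright http://example.com",
]

for ngram, count in ngram_counts(tweets, n=2).most_common(3):
    print(" ".join(ngram), count)
```

Because stop words are removed before the n-grams are formed, a bigram here means two words that are adjacent after filtering, which is a design choice rather than the only option.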