The Anatomy of a Data Science Project

Irmak Sirer
@frrmack
irmaksirer.com

AGE 7
Oh cool.
Pretty good. Space and stuff.

AGE 14
Omigod Omigod Omigod.
Epic masterpiece is epic!!!!1!
I'm in love with Leia.

AGE 30
When you think about it, it's not that good.
Ah, who am I kidding? It's amazing.
I'm still in love with Leia.

What determines
how much I like a movie?

A personal question
on something
I am passionate about

How do I boost my sales?
A business question

Can I identify experts in each division
of my company and bring them
together to collaborate?
Another business question

What do customers out there
think and say
about my products?
Another business question

The Anatomy
of a
Data Science Project

Finding the right questions
Right metrics
Knowing what’s been done

Obsessed
with
Movies
Irmak
Sirer

Start with question, not data
Iterative design process
Moving targets

Find Data
Clean Data
Manage Data
BIG DATA

Machine Learning
Statistics
Applied Math

Open source tools
Python
Pandas, scikit-learn
SQL, Mongo
JavaScript, d3, Flask
Hadoop, Spark, Hive, Mahout

Interactive Dashboards
Easy to read graphs
Explaining well, adapting to audience
Ultimate Goal & Product: Insights

Is my reaction to a
movie / book / song
predictable?

How much will I like
The Book of Eli?

2006
Cinematch
1 billion
user ratings
55,000
movies

Cinematch
I have a soulmate in taste
Irmak

Cinematch
Irmak Frrmack
Watched the same movies
Gave the exact same ratings
Except The Book of Eli
Frrmack watched The Book of Eli
Oh man, it was…
FANTASTIC!
Predict

No perfect soulmates in real life
Irmak

Irmak
Almost soulmate 1, Almost soulmate 2, Almost soulmate 3, Almost soulmate 4
87% soulmate, 74% soulmate, 95% soulmate, 82% soulmate
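The "almost soulmate" idea can be sketched as a similarity-weighted average: each neighbor's rating of the unseen movie counts in proportion to how closely their taste matches mine. This is only an illustration of the intuition, not Cinematch's actual algorithm. The soulmate percentages come from the slide; the neighbors' star ratings for The Book of Eli are made-up example values.

```python
def predict_rating(neighbors):
    """Similarity-weighted average of the neighbors' ratings.

    neighbors: list of (similarity, rating) pairs, similarity in (0, 1].
    """
    total_weight = sum(similarity for similarity, _ in neighbors)
    if total_weight == 0:
        raise ValueError("need at least one neighbor with non-zero similarity")
    weighted_sum = sum(similarity * rating for similarity, rating in neighbors)
    return weighted_sum / total_weight


# Similarities from the slide; the star ratings are hypothetical.
almost_soulmates = [(0.87, 4), (0.74, 3), (0.95, 5), (0.82, 4)]
prediction = predict_rating(almost_soulmates)
print(f"Predicted rating: {prediction:.2f} stars")
```

A closer soulmate pulls the prediction harder toward their own rating, so the 95% soulmate matters more than the 74% one.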

Cinematch
Works well for movies that everybody rates
Quite bad with movies that only a few people rate
Some movies are especially difficult to predict
Biggest error source: popular but weird
15% of all errors from ONE movie

Trivial: Mean score of everyone
Error (RMSE): 1.0540 stars
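For reference, RMSE (root mean squared error) and the trivial "predict the mean" baseline can be sketched as follows. The ratings here are hypothetical toy values, not Netflix data, so the resulting error differs from the 1.0540 on the slide.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy ratings (1 to 5 stars).
train_ratings = [5, 3, 4, 1, 2, 4, 5, 3]
test_ratings = [4, 2, 5, 3]

# Trivial predictor: always guess the mean of the ratings seen so far.
mean_rating = sum(train_ratings) / len(train_ratings)
baseline_error = rmse([mean_rating] * len(test_ratings), test_ratings)
print(f"Mean rating: {mean_rating:.3f}, baseline RMSE: {baseline_error:.3f} stars")
```

Because errors are squared before averaging, one badly missed rating costs more than several small misses.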

Cinematch
9.6% lower error than the trivial baseline
Better rankings → Better recommendations
+ 8.6% → + 1200% people watch top recommendation
BigChaos Netflix Prize Repo

Cinematch
Error: 0.9525 stars
Bring it down to:
Error: 0.8563 stars
$1,000,000
for a 10% improvement
2006

How did they do it?
Before:
Solid assumptions
You have a certain taste.
Your taste dictates a hidden rating for Book of Eli.
When you watch it, this rating is revealed to you.
WRONG

How did they do it?
After:
Your rating changes with time.
It depends on...
how many movies you rated that day
your average rating for the day
which movies you rated on this day
the Netflix prediction you were shown

Y. Koren, The BellKor Solution to the Netflix Grand Prize. 2009
Trivial
Error: 1.0540 stars
Cinematch
Error: 0.9525 stars
Your time-dependent rating tendencies
Error: 0.9278 stars
12.0% better than trivial
without looking at which movies you like/hate!
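The percentages on these slides are relative reductions in RMSE compared with the trivial mean predictor. A quick check using only the error values quoted above (the function name is illustrative):

```python
def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error undercuts baseline_error."""
    return (baseline_error - new_error) / baseline_error


TRIVIAL_RMSE = 1.0540       # mean score of everyone
CINEMATCH_RMSE = 0.9525     # Cinematch
TIME_EFFECTS_RMSE = 0.9278  # time-dependent rating tendencies only

cinematch_gain = relative_improvement(TRIVIAL_RMSE, CINEMATCH_RMSE)
time_effects_gain = relative_improvement(TRIVIAL_RMSE, TIME_EFFECTS_RMSE)
print(f"Cinematch vs. trivial: {cinematch_gain:.1%}")
print(f"Time effects vs. trivial: {time_effects_gain:.1%}")
```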

What does this suggest?
We cannot compare a movie with all others we've seen.
We compare it to a limited set.
Liking (real time & remembered) depends on time and mood.
Other people's opinions affect our own (followers / hipsters)

An experiment
Music Lab: A website for downloading and rating music
M.J. Salganik, P.S. Dodds, D.J. Watts. Science, 311:854-856, 2006
Alternative A: Other people's ratings invisible
More or less equal ratings
Alternative B: All ratings visible
Several songs snowball in popularity
It's different songs for each trial

Social influence plays a big part in determining hits and misses

Problems with rating movies
Liking depends on time and mood.
Other people's opinions affect our own.
Degree of liking is sensitive and vague
"Amazing!" on Tuesday 3am, "Total garbage" on Sunday 12pm
Dependent on many other environmental factors besides our taste
Difficult to describe accurately and consistently with a number

Predicting aside,
can I even reliably rate & rank movies
I’ve seen in terms of enjoyment?

Irmak Frrmack
What are your top twenty movies?
Well…
Ummm…
I like Star Wars.

Degree of liking is
sensitive and vague
Can’t we do
something
about this?

“Enjoyment” from a movie is very
high dimensional information
Rating means projecting this onto a single
dimension

But sometimes you just want to do the
best projection you can
What is my top twenty?

Trying to rate Star Wars
1. Map enjoyment to a specific scale
2. Choose corresponding rating for this degree of liking

But we cannot keep
this entire history of
enjoyment in mind
We fuzzily remember
a small subset
We map based on this subset

We can certainly handle
single comparisons
less vague
little information

I can manually compare it with all others

And find exactly where it belongs
right after Indiana Jones
right before The Princess Bride

Full ranking: Compare all pairs

That’s a bit
too much effort
for me
1,000,000 comparisons?
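A full ranking by comparing all pairs needs n(n-1)/2 comparisons for n movies, which grows quadratically. A quick check of how fast that gets out of hand (the collection sizes are illustrative; the slide does not say how many movies are involved):

```python
import math


def pairs_needed(n_movies):
    """Number of distinct pairs among n_movies, i.e. n * (n - 1) / 2."""
    return math.comb(n_movies, 2)


for n in (100, 1000, 1415):
    print(f"{n} movies -> {pairs_needed(n):,} pairwise comparisons")
```

Roughly 1,400 movies are enough to push the count past one million.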

We don’t need all of them
If …, …,
I have some information about …

Compare a random sample of pairs

Use a ranking algorithm that utilizes
all the information
Good idea!

Elo rating system
“hotness”: 7.00 8.00
“hotness” range: +1.50 -1.50 (each)
7.12 7.68
36% to win 64% to win
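The 36% / 64% split is what the standard logistic Elo formula gives for a one-point gap if chess's usual constant of 400 is scaled down to 4.00. That scaling is my assumption for matching the slide's 0 to 10 scale; the slides do not state the formula they use.

```python
def win_probability(rating_a, rating_b, scale=4.0):
    """Probability that A beats B under the logistic Elo model.

    scale=4.0 is the chess constant 400 divided by 100 (an assumption
    chosen to fit the slide's 0-10 scale).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


p_lower = win_probability(7.00, 8.00)
p_higher = win_probability(8.00, 7.00)
print(f"7.00 beats 8.00 with probability {p_lower:.0%}")
print(f"8.00 beats 7.00 with probability {p_higher:.0%}")
```

Only the rating difference matters: 5.00 against 6.00 gives the same odds as 7.00 against 8.00.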

Elo rating system
How do we find out what these ranges are?

Elo rating system
Start with the same guess for every contender
5.00 5.00 5.00 5.00 5.00 5.00

Elo rating system
5.12 4.88
Update the best guesses accordingly

We don’t need all comparisons
If …, …,
I have some information about …

Elo rating system
7.61 4.02
89% to win 11% to win
If the favorite wins: +.02 -.02
If the underdog wins: -.53 +.53
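The update rule behind these numbers moves each score in proportion to how surprising the result was: an expected win changes little, an upset changes a lot. A minimal sketch, assuming the standard Elo update with the 400 constant scaled to 4.00 and a step size k = 0.24 (both are my assumptions; k is chosen so that two 5.00 contenders move to 5.12 and 4.88). The exact step sizes depend on k, so the output need not match the slide's ±.02 and ±.53.

```python
def win_probability(rating_a, rating_b, scale=4.0):
    """Probability that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(winner, loser, k=0.24, scale=4.0):
    """Return the new (winner, loser) ratings after one comparison."""
    expected = win_probability(winner, loser, scale)
    delta = k * (1.0 - expected)  # small if the win was expected
    return winner + delta, loser - delta


print("Evenly matched:", update(5.00, 5.00))
print("Favorite wins: ", update(7.61, 4.02))
print("Underdog wins: ", update(4.02, 7.61))
```

What one contender gains the other loses, so the total of all scores stays constant.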

Elo rating system
We now have scores on a single scale
(estimates of people’s appreciation levels)
and a ranking
1 2 3 4 5 6
9.07 8.42 6.40 4.88 4.20 3.03
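Putting the pieces together: start everyone at 5.00, feed in a random sample of pairwise outcomes, and sort by the resulting scores. This is a sketch under the same assumptions as before (logistic Elo, scale 4.00, k = 0.24). The six contenders and their hidden preference order are hypothetical, and the comparisons are simulated rather than collected from a person.

```python
import itertools
import random


def win_probability(rating_a, rating_b, scale=4.0):
    """Probability that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_scores(items, outcomes, k=0.24, start=5.0):
    """Run Elo over (winner, loser) outcomes and return a score per item."""
    ratings = {item: start for item in items}
    for winner, loser in outcomes:
        expected = win_probability(ratings[winner], ratings[loser])
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


# Hypothetical hidden preference order: "A" is liked most, "F" least.
hidden_order = ["A", "B", "C", "D", "E", "F"]
all_pairs = list(itertools.combinations(hidden_order, 2))

# combinations() keeps list order, so the first item of each pair is the
# preferred one and each sampled pair already reads as (winner, loser).
rng = random.Random(0)
sampled_outcomes = [rng.choice(all_pairs) for _ in range(200)]

ratings = elo_scores(hidden_order, sampled_outcomes)
ranking = sorted(ratings, key=ratings.get, reverse=True)
for rank, item in enumerate(ranking, start=1):
    print(rank, item, f"{ratings[item]:.2f}")
```

Real answers are noisier than this simulation, which is one reason to repeat comparisons rather than trust any single one.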

Degree of liking is
sensitive and vague
Can we somehow apply
this to movies, then?

We can do better
Bayesian ranking algorithms
Glicko (The Elo Killer), 1999
TrueSkill™, 2007

Bayesian ranking
4.46 ± … 4.01 ± …
82% to win
15% to win
3% to draw
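With a Gaussian belief about each score, win, draw and loss probabilities follow from the distribution of the performance difference. The sketch below uses the TrueSkill-style model with a draw margin. The uncertainties, `beta` (performance noise) and `draw_margin` are hypothetical values, since the slide does not preserve them, so the output will not reproduce 82% / 15% / 3%.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def outcome_probabilities(mu_a, sigma_a, mu_b, sigma_b, beta=0.5, draw_margin=0.1):
    """Return (P(A wins), P(draw), P(B wins)).

    The performance difference is modelled as Normal(mu_a - mu_b, c^2);
    a difference within +/- draw_margin counts as a draw.
    """
    c = math.sqrt(2 * beta**2 + sigma_a**2 + sigma_b**2)
    diff = mu_a - mu_b
    p_b_wins = STD_NORMAL.cdf((-draw_margin - diff) / c)
    p_a_wins = 1.0 - STD_NORMAL.cdf((draw_margin - diff) / c)
    p_draw = 1.0 - p_a_wins - p_b_wins
    return p_a_wins, p_draw, p_b_wins


# Means from the slide; the sigmas are assumed for illustration.
p_win, p_draw, p_lose = outcome_probabilities(4.46, 0.3, 4.01, 0.3)
print(f"win {p_win:.0%}, draw {p_draw:.0%}, lose {p_lose:.0%}")
```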

Bayesian ranking
Elo: Best guess for the center: 4.3
Bayesian: It could be centered around 4.3
It could also be centered around 4.2
or centered around 4.4
Less likely, but even around 4.5
[Plot: probability vs. score, 3.5 to 5; the width of the curve is the uncertainty]

Few comparisons: Lots of uncertainty
(anything from 2.3 to 4.5 is quite possible)
[Plot: probability vs. score, 2.0 to 5]

After many comparisons: Quite sure
(pretty much between 4.11 and 4.18)
[Plot: probability vs. score, 2.0 to 5]
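How the curve narrows can be sketched with the TrueSkill-style update for a two-contender comparison without draws. Each belief is a (mean, standard deviation) pair; `beta` and the starting values are hypothetical. This is a simplified stand-alone version for illustration, not the code used in the project.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def update_1vs1(winner, loser, beta=0.5):
    """Update two (mu, sigma) beliefs after `winner` beat `loser`."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    c = math.sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)  # how far to shift the means
    w = v * (v + t)                            # how much to shrink the variances
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * math.sqrt(1 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * math.sqrt(1 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


first_winner, first_loser = update_1vs1((3.0, 1.0), (3.0, 1.0))

movie = (3.0, 1.0)  # vague starting belief
for round_number in range(1, 6):
    opponent = (3.0, 1.0)  # a fresh, equally uncertain opponent each time
    if round_number % 2:
        movie, _ = update_1vs1(movie, opponent)
    else:
        _, movie = update_1vs1(opponent, movie)
    print(f"after {round_number} comparisons: mu={movie[0]:.2f}, sigma={movie[1]:.2f}")
```

Every comparison shrinks sigma, whichever way it goes; surprising outcomes move the mean the most.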

Bayesian ranking
Star Wars, Lord of the Rings
[Plot: score distributions for both movies, axis 2.0 to 5.0]

How did they do it?
After:
A small, constant increase in uncertainty before each comparison
[Plot: probability vs. score, 3.5 to 5, with the uncertainty marked]
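One common way to implement this, used in spirit by both Glicko and TrueSkill, is to add a small fixed variance to the belief before each comparison so that it never becomes completely rigid. A minimal sketch; the value of `tau` is a hypothetical choice.

```python
import math


def add_dynamics(sigma, tau=0.05):
    """Widen a belief slightly before a comparison (variances add)."""
    return math.sqrt(sigma**2 + tau**2)


sigma = 0.10  # a fairly confident belief
for comparison in range(1, 4):
    sigma = add_dynamics(sigma)
    print(f"sigma before comparison {comparison}: {sigma:.4f}")
```

This keeps the score able to drift when taste changes over time, which fits the earlier finding that ratings change with time.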

Degree of liking is
sensitive and vague
Great! We have a system!

I don’t want to
spend too much
time on this
How many is too many?

Minimum Effort
Maximum Information
1 3 1 3 1 3 1 3 1 3
Not reliable by itself
Still carries a lot of information
1 3 5
1 3 5 1 3 5

I don’t want to
spend too much
time on this
What else can we do?

Minimum Effort
Maximum Information
I can calculate the expected amount
of information from a comparison!
Certain about both movies: Won’t learn a lot
Don’t know much about either: Will learn a lot regardless of outcome
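One way to put a number on "expected amount of information" is the expected drop in total variance after a comparison, averaged over both possible outcomes. This is my own proxy, built on the TrueSkill-style update without draws; the slides do not say which measure the project used. The beliefs and `beta` are hypothetical.

```python
import itertools
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def shrink_factor(t):
    """Variance-shrink weight for a win with standardized mean gap t."""
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)
    return v * (v + t)


def expected_variance_reduction(a, b, beta=0.5):
    """Expected total variance removed from beliefs a and b by one comparison."""
    (mu_a, sigma_a), (mu_b, sigma_b) = a, b
    c_squared = 2 * beta**2 + sigma_a**2 + sigma_b**2
    t = (mu_a - mu_b) / math.sqrt(c_squared)
    p_a_wins = STD_NORMAL.cdf(t)
    expected_shrink = (p_a_wins * shrink_factor(t)
                       + (1 - p_a_wins) * shrink_factor(-t))
    return (sigma_a**4 + sigma_b**4) / c_squared * expected_shrink


def most_informative_pair(beliefs):
    """Pick the pair of titles whose comparison is expected to teach the most."""
    return max(
        itertools.combinations(beliefs, 2),
        key=lambda pair: expected_variance_reduction(beliefs[pair[0]], beliefs[pair[1]]),
    )


# Hypothetical beliefs as (mu, sigma).
beliefs = {
    "Star Wars": (4.5, 0.1),
    "Lord of the Rings": (4.4, 0.1),
    "The Book of Eli": (3.0, 1.0),
    "The Princess Bride": (3.2, 1.0),
}
print("Ask next:", most_informative_pair(beliefs))
```

Two well-known movies score low on this measure, while two uncertain ones with similar means score high, matching the two cases on the slide.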

Python
TrueSkill
Django
JavaScript
MySQL

Quantifying human reactions is hard
books
songs
food
politicians
products
celebrities
tv shows
importance of issues
what to spend ‘fun’ budget on
teams in different sports

Start with a rating,
pose the correct comparisons
Every decision gets us closer

Many comparisons for a movie
over different days
averages out mood and other factors
