Building better products through Experimentation - SDForum Business Intelligence SIG
1. Building better products through Experimentation
Deepak Nadig, eBay Principal Architect
SDForum Business Intelligence SIG
March 27, 2008
2. What we’re up against
• eBay manages …
– Over 276,000,000 registered users
– Over 1 Billion photos
– eBay users worldwide trade more than $2,039 worth of goods every second
– eBay averages well over 1 billion page views
per day
– At any given time, there are over 113 million
items for sale on the site
– eBay stores over 2 Petabytes of data – over
200 times the size of the Library of Congress!
– eBay analytics processes over 25 Petabytes
of data on any day
– The eBay platform handles 4.4 billion API calls
per month
An SUV sells every 5 minutes
A sporting good is sold every 2 seconds
Over ½ million pounds of Kimchi are sold every year!
• In a dynamic environment
– 300+ features per quarter
– We roll 100,000+ lines of code every two weeks
• In 39 countries, in seven languages, 24x7
>44 Billion SQL executions/day!
3. Site Statistics: in a typical day…
Metric | June 1999 | Q1 2007 | Growth
Outbound Emails | 1 M | 41 M | 41x
Total Page Views | 54 M | >1 B | 19x
Peak Network Utilization | 268 Mbps | 16 Gbps | 59x
API Calls | 0 | 150 M | N/A
Availability (downtime) | ~97% (43 mins/day) | 99.94% (50 sec/day) | 50x
4. Velocity of eBay -- Software Development Process
276M users | 300+ features per quarter | 99.94% availability | 100K LOC/wk | 6M LOC
• Our site is our product. We change it incrementally through implementing new features.
• Very predictable development process – trains leave on time at regular intervals (weekly).
• Parallel development process with significant output -- 100,000 LOC per release.
• Always on – over 99.94% available.
All while supporting a 24x7 environment
5. James Lind and the cure for scurvy
• In his 1747 shipboard trial, Lind compared treatments given to sailors with scurvy:
– cider
– elixir of vitriol
– sea water
– vinegar
– a mixture including garlic, mustard, and horseradish
– oranges and lemons
6. Reminder for data/analytics driven decisions
• Auction vs. Stores
• Combined search results
– Return a broader mix of inventory
– Listings of core + stores were combined
– More exposure to store listings
• Results
– Business metrics were down – bids, average sales price, etc.
– Latency in discovering this
• Analysis
– Overall cost of a store listing is less than that of an auction listing
– Sellers shifted inventory to save on fees
• Rolled back in 03/2006
– Higher fees for store listings
7. Many Insights Methods (By Data Source vs. Approach)
[Figure: a landscape of insight methods plotted by data source and approach]
• Approach: Qualitative (direct) ↔ Quantitative (indirect)
• Data source: Self-reported (stated) ↔ mixture ↔ Observed behavior
• Key – context of data collection with respect to product use: de-contextualized / not using product; scripted or lab-based use of product; natural use of product; combination / hybrid
• Methods plotted include: Focus groups / “Voices”, desirability studies, exit surveys, phone interviews, cardsorting, Product Tracker, diary/camera studies, message board mining, “visits” / ethnographic field studies (onsite interviews, extended observation), intent discovery, usability lab studies (task-based), quantitative user experience assessments, usability benchmarking (in lab), data mining, eyetracking, experimentation, clickstreams
8. Concepts
• Unit (of experimentation, analysis)
– The entity on which the experiment or analysis is performed
– e.g. user, seller, buyer, item
• Factor (or variable)
– Something that can have multiple values
– Independent or controlled (cause), Dependent or response (effect)
• Treatment (or experience)
– A variation of the experience or information (e.g. page flow, page, module) served to the unit; the variation is characterized by a change in one or more factors or variables
• Sample
– A group of users who are served the same treatment.
• Evaluation Metric
– A metric used to compare the response to different treatments
• Experimentation
– A method of comparing two or more treatments based on a measurable metric. One variant, the status quo, is referred to as the ‘control’. (A minimal sketch of these concepts follows below.)
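To make these terms concrete, here is a minimal Python sketch, assuming a toy single-factor experiment on the unit “user”; the class and field names are illustrative, not eBay’s actual framework:

```python
from dataclasses import dataclass, field

@dataclass
class Treatment:
    """A variation of the experience served to the unit (e.g. a page or module)."""
    name: str                                     # e.g. "control" or "variant"
    factors: dict = field(default_factory=dict)   # factor -> value, e.g. {"ad": "yes"}

@dataclass
class Experiment:
    """Compares two or more treatments on a measurable evaluation metric."""
    unit: str              # entity being experimented on, e.g. "user"
    treatments: list       # by convention, the first entry is the control (status quo)
    evaluation_metric: str # e.g. "conversion_rate"

# A single-factor experiment: does showing an ad change the conversion rate?
exp = Experiment(
    unit="user",
    treatments=[Treatment("control", {"ad": "no"}),
                Treatment("variant", {"ad": "yes"})],
    evaluation_metric="conversion_rate",
)
```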
9. Treatment (or experience)
• Module
– Strict subset of the page
– User is treated to changes to a module
– E.g. zebra vs. integrated vs. distinct ads
• Page
– User is treated to different variations of the page
– E.g. 2L1R (Left column is twice as wide as right) vs. 1L2R
• Page Flow or Use Case
– User is treated to different variations of a use case
– E.g. different flows for listing an item for sale
10. Sampling
• Population
– The group you want to generalize to (people, time, place, setting)
• Sample
– The units selected from the population
• Sampling
– The process of selecting units from a population of interest
– By studying the sample you can fairly generalize the results to the population
• External validity (generalizability)
• Mechanisms (see the sketch after this list)
– Random
– Stratified random
– …
• What matters is the number of samples
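A minimal sketch of the two sampling mechanisms named above, using only the Python standard library; the population, strata, and sample sizes are invented for illustration:

```python
import random
from collections import defaultdict

# A made-up population of 600 users spread across three countries
population = [{"user_id": i, "country": c}
              for i, c in enumerate(["US", "US", "UK", "DE", "US", "UK"] * 100)]

def simple_random_sample(units, n):
    """Every unit has the same chance of being selected."""
    return random.sample(units, n)

def stratified_random_sample(units, n_per_stratum, key="country"):
    """Sample separately within each stratum so every stratum is represented."""
    strata = defaultdict(list)
    for unit in units:
        strata[unit[key]].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, min(n_per_stratum, len(members))))
    return sample

print(len(simple_random_sample(population, 60)))      # 60 units, any mix of countries
print(len(stratified_random_sample(population, 20)))  # 20 units per country (60 total)
```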
11. Experiments
• A/B testing
– A form of testing in which two treatments, a control (‘A’) and a variant (‘B’), are compared.
– No emphasis on cause (factor)
• Single-factor testing
– A form of testing in which treatments corresponding to the values of a single factor are compared
– E.g. Ad – Yes/No
• Multi-factorial testing (DOE)
– A method of testing in which treatments corresponding to combinations of values of multiple factors are compared (see the sketch after this list)
– E.g. Ad – Yes/No, Location – Top/Bottom
– Manual vs. Automated
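As a rough illustration of the difference between single-factor and multi-factorial designs, the sketch below enumerates the treatments for each; the factor names follow the Ad/Location example above and are not from an actual eBay experiment:

```python
from itertools import product

# Single-factor test: the treatments are just the values of one factor
single_factor = {"ad": ["no", "yes"]}

# Multi-factorial test (DOE): the treatments are combinations of factor values
multi_factor = {"ad": ["no", "yes"], "location": ["top", "bottom"]}

def enumerate_treatments(factors):
    """Build one treatment per combination of factor values (full factorial)."""
    names, values = zip(*factors.items())
    return [dict(zip(names, combo)) for combo in product(*values)]

print(enumerate_treatments(single_factor))  # 2 treatments
print(enumerate_treatments(multi_factor))   # 4 treatments: ad x location
```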
12. Objective
• To explore relationships between factors
• Relationships
– None
– Correlational / synchronized
• Positive vs. Negative
• Third-variable problem
– Causal relationship
• Establishing causal relationship
– If X, then Y
– If not X, then not Y
• Distinguish significant factors and interactions
• Measure impact on the metric
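Distinguishing significant factors in practice comes down to a statistical test on the evaluation metric; below is a minimal sketch of a two-proportion z-test on conversion rates, with invented counts for illustration:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Test whether the conversion rates of control (A) and variant (B) differ."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))             # two-sided p-value
    return z, p_value

# Example: 1,200 of 50,000 control users converted vs. 1,320 of 50,000 variant users
z, p = two_proportion_z_test(1200, 50000, 1320, 50000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value suggests a real effect
```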
14. Reduce Email Guessing
• Purpose
– Measure the decline in registrations from introducing a blocking message
– Users cannot create a username that equals their email address
– E.g. Username: cooky1 Email: cooky1@gmail.com
• Metrics
– Number of registrations
– Reduction in phishing
• Samples
– 3% US
• Treatments
– Classic, Blocked
• Outcome
– No difference in registrations
– Improved security
15. Text Ads on SRP
• Purpose
– Determine how text ads on search result pages affect overall revenue
• Metrics
– Overall revenue
• Samples
– 1% US, International
• Treatments
– Ad, No-ad
• Outcome
– Overall revenue increased in
certain markets
16. Home Page
• Purpose
– Optimal construction of page
– Per user segment?
• Metrics
– Overall revenue
• Samples
– Varied per treatment
• Treatments
– 100s of variations
– Ads, Merchandising, P13N,
Navigation, Layout
• Outcome
– Page structures different for
different user segments
17. What we think about
Fidelity of Experiments
The quality of the model and its testing conditions in representing
the final feature or product under actual use conditions
Cost of Experiments
The total cost of designing, building, running, and analyzing an
experiment
Iteration time
The time from planning experiments to when the analyzed results
are available and used for planning another iteration
Concurrency
The number of experiments that can be run at the same time
Signal/Noise Ratio
The extent to which the signal (response) of interest is obscured
by noise
Type/Level of Experiment
Types and Levels of experiment that can be carried out
19. Implementation Considerations
• User identification
• User → Sample (see the sketch after this list)
– No bias towards any experiment or treatment
– Sticky-ness between activities (and sessions)
– No interaction between experiments
– Enabling a user to try out a specific treatment
– Ramping-up to understand generalization effects
• Sample → Treatment
– No bias
• Splitting traffic
– Inline
– Application server
– Load balancer
– Browser
• Factor-driven development
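One common way to get unbiased, sticky, non-interacting assignment is a deterministic hash of the user ID salted with the experiment name; this is a minimal sketch under that assumption, not eBay’s actual implementation:

```python
import hashlib

def assign_treatment(user_id: str, experiment: str, treatments: list, ramp_pct: int = 100):
    """Deterministically map a user to a treatment.

    Sticky: the same (user, experiment) pair always hashes to the same value.
    Independent: salting with the experiment name decorrelates experiments.
    Ramp-up: only ramp_pct percent of users enter the experiment at all.
    """
    value = int(hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest(), 16)
    if value % 100 >= ramp_pct:                           # ramp-up gate
        return None                                       # user keeps the default experience
    return treatments[(value // 100) % len(treatments)]   # even split among treatments

# The same user always lands in the same treatment for a given experiment
print(assign_treatment("user-42", "srp_text_ads", ["control", "text_ad"], ramp_pct=10))
```

The same idea can be applied at any of the traffic-splitting points listed above (inline, application server, load balancer), as long as every layer hashes the same inputs.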
20. Measurement – A case of traveling shoppers
Page | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Unique users (week)
Page-1 | Alice | Alice | Bob | Charlie | Bob | Charlie | Alice | 3
Page-2 | Bob | Alice | Bob | Alice | Bob | Bob | Alice | 2
Unique users per day | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 3
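One reading of the table, which matches the counts on the slide: summing daily unique visitors (10) over-counts shoppers who return on several days, while only 3 distinct users actually visited during the week. The sketch below reproduces those numbers from the visit data:

```python
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
visits = {
    "Page-1": ["Alice", "Alice", "Bob", "Charlie", "Bob", "Charlie", "Alice"],
    "Page-2": ["Bob",   "Alice", "Bob", "Alice",   "Bob", "Bob",     "Alice"],
}

# Unique users per page over the whole week
for page, users in visits.items():
    print(page, len(set(users)))                             # Page-1: 3, Page-2: 2

# Unique users per day (across both pages) vs. unique users for the whole week
daily = [len({visits[p][i] for p in visits}) for i in range(len(days))]
print(daily, sum(daily))                                     # [2, 1, 1, 2, 1, 2, 1] -> 10
print(len({u for users in visits.values() for u in users}))  # 3 distinct users all week
```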
21. Limitations and ways to overcome them …
• Sticky-ness to user
– Session-level analysis
• What, not why
– When qualitative research complements
• Short-term vs. Long-term effects
– Think about the duration of the experiment
• Newness effect
– Consider burn-in periods
• Minor vs. major differences
– Think about amount of effort being committed
• Anonymity of tests
– When qualitative research spills the beans
22. Key takeaways
• Experimentation is one of the most effective approaches for gaining
quantitative insights
• Enables businesses to quickly understand and establish relationships
between product changes and their impact on business metrics
• Different types and levels of experiments can be used to gain different degrees of insight
• Experimentation has limitations, but they can be overcome
• Think about “experiment-ability” as another “-ability” in product design