This talk was given at i-know 2013 and the IEEE TMC Chapter CE Meeting in November 2013. The authors are Andreas Oertl (first author), Michael Heiss, Bettina Laenger, and Barbara Kavsek. This part gives a more detailed presentation of a split test for the urgent request notification within Siemens TechnoWeb, including its statistical significance analysis.
16. Techno Web Split Analysis:
Old versus new Template for Urgent Requests
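Approach for sample selection:
[Diagram: for Urgent Request 1, Urgent Request t1 and Urgent Request t2, a bar from 0% to 100% shows the share of recipients who get the NEW versus the OLD template. Within NEW, recipients who received NEW before (in UR 1 .. t1 or UR 1 .. t2) are distinguished from first-time recipients.]
Receivers of the NEW template will always receive NEW further on.
If >50% have already received the new template: more "NEW" than "OLD".
Page 16
September 2013
Siemens CT TIM CEE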
17. Statistical Questions
• Statistical Question to be answered by the analysis:
• Is there a difference in the number of responses
(views, comments) of the old versus new template?
• Do first-time users of the new template behave differently from
users that received the new template before?
• Requested for future analyses:
• Is there one representative number for the extent of this
difference, considering all urgent requests?
18. Sample Characteristics
Dependency within 1 observation?
• Are we considering paired or unpaired samples?
- Paired sample means that 2 characteristics of one observation are dependent
- We want to compare responses (views, comments) to the same urgent
request for old versus new template.
- Thus, we have to consider pairs of responses and investigate the difference
between response ratios for each urgent request.
- Example:
                    click-through ratio old   click-through ratio new
Urgent request 1    0.01                      0.03
Urgent request 2    0.03                      0.05
Urgent request 3    0.07                      0.01
Urgent request 4    0.05                      0.07
Assuming independent samples → we would assume equal means in old and new template (both columns average 0.04).
BUT in reality: ctr_old < ctr_new in ¾ of requests!
→ We assume dependent samples → paired test
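The difference between the unpaired and the paired view can be checked on the four example requests. The following is a minimal sketch using only the example values from the table; the variable names are chosen for illustration.

```python
from statistics import mean

# Example click-through ratios per urgent request (values from the example table).
ctr_old = [0.01, 0.03, 0.07, 0.05]
ctr_new = [0.03, 0.05, 0.01, 0.07]

# Unpaired view: only the two column means are compared.
mean_old = mean(ctr_old)
mean_new = mean(ctr_new)

# Paired view: the difference is taken within each urgent request.
# Rounding removes floating-point noise such as 0.020000000000000004.
diffs = [round(new - old, 6) for old, new in zip(ctr_old, ctr_new)]
n_new_higher = sum(d > 0 for d in diffs)

print(f"mean old = {mean_old:.2f}, mean new = {mean_new:.2f}")
print(f"paired differences = {diffs}")
print(f"new > old in {n_new_higher} of {len(diffs)} requests")
```

The column means hide the effect, while the per-request differences show that the new template scores higher in three of the four requests.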
19. Sample Characteristics
Independence between observations?
The problem is that for most statistical tests, values between observations of the
sample (i.e. different urgent requests) have to be independent.
We know that the same person gets several urgent requests, however, it is
assumed that the response behavior (to click on the notification link) is
independent for different topics.
Thus we can assume independence of the different urgent requests.
                    click-through ratio old   click-through ratio new
Urgent request 1    0.01                      0.03
Urgent request 2    0.03                      0.05
Urgent request 3    0.07                      0.01
Urgent request 4    0.05                      0.07
Within a row (old and new ratio of the same urgent request): dependent
Between rows (different urgent requests): independent
20. Selection of Test Method
Comparison of means:
Is the mean response significantly different in the new template compared to the
old template?
• t-Test for paired samples
Premises:
- 2 paired samples $(x_i, y_i)$ with expectation values $\mu_1$ and $\mu_2$
- Differences $d_i = x_i - y_i$ normally distributed with expectation value $\mu_d$.
Hypothesis: $H_0: \mu_d = 0$
• Wilcoxon test for paired samples
Premises:
- 2 paired samples $(x_i, y_i)$ with expectation values $\mu_1$ and $\mu_2$
- Differences $d_i = x_i - y_i$ symmetrically distributed → fulfilled if $x_i$ and $y_i$ have the same distribution shape.
Hypothesis: $H_0: \mu_1 = \mu_2$
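For the paired t-test, the test statistic is the mean of the differences divided by its standard error, with n - 1 degrees of freedom. A minimal sketch follows; the function name and the sample values are hypothetical and are not taken from the TechnoWeb data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for H0: mean of the differences is 0."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    # Standard error of the mean difference uses the sample standard deviation.
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical paired ratios for five urgent requests (illustration only).
x = [0.02, 0.05, 0.04, 0.07, 0.03]
y = [0.01, 0.03, 0.05, 0.04, 0.02]
t_value, dof = paired_t_statistic(x, y)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

The value of t is then compared with the critical value of the t distribution for the chosen significance level, which is only valid if the differences are normally distributed.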
21. Check of premises
Before applying a hypothesis test, the differences (v0-v1 and c0-c1) have to be
tested for normal distribution.
Using the Kolmogorov-Smirnov test, we obtain the following result:
H0: Variable has a normal distribution.
variable   p-value
v1-v0      0.04558
c1-c0      0.002431
v1: click-through ratio new
v0: click-through ratio old
c1: conversion rate new
c0: conversion rate old
$\alpha = 5\%$ → no normal distribution in both cases (views, comments)
Therefore we have to use a test which does not require normal distribution
→ Wilcoxon signed-rank test (Wilcoxon test for paired samples).
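The normality check can be sketched in a few lines. This is an illustrative implementation, not the tool used for the slides: it assumes that mean and standard deviation are estimated from the sample (which makes the classical Kolmogorov-Smirnov p-value only approximate), it uses the asymptotic series for the p-value, and the sample values are hypothetical.

```python
from math import erf, exp, sqrt
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def ks_normal(sample):
    """Return (D, approximate p-value) for H0: sample is normally distributed."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)  # assumption: parameters estimated from the data
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # Largest gap between the empirical step function and the normal CDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series below is numerically 1 in this range
    p = 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical differences (not the TechnoWeb data): many zeros plus two outliers.
skewed = [0.0] * 18 + [1.0, 5.0]
d_stat, p_value = ks_normal(skewed)
ALPHA = 0.05
print(f"D = {d_stat:.3f}, p = {p_value:.2g}, reject normality: {p_value < ALPHA}")
```

A p-value below the significance level rejects H0, so a test that does not require normally distributed differences is needed.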
22. Check of premises
Symmetrical Distribution of differences v0-v1 and c0-c1:
23. Hypothesis Test:
Principle of the Wilcoxon signed-rank test
Wilcoxon signed-rank test (Wilcoxon test for paired samples):
Example for n=8
UR    v0     v1     dv=v1-v0   rank for dv>0   rank for dv<0
1     0.02   0.02    0         -               -
2     0.01   0      -0.01                      1.5
3     0.01   0.10    0.09      7
4     0.06   0.13    0.07      6
5     0.03   0.04    0.01      1.5
6     0.11   0.15    0.04      5
7     0.06   0.08    0.02      3
8     0.03   0.06    0.03      4
Sum                            R+ = 26.5       R- = 1.5
R = min(R+, R-) = 1.5
Critical value for n=7 (UR1 excluded), $\alpha = 5\%$: Rcritical = 2
R < Rcritical → $H_0: \mu_1 = \mu_2$ is rejected
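The rank sums of the n=8 example can be recomputed with a short script. This is a minimal sketch: the function name is hypothetical, zero differences are dropped, tied absolute differences get the average of their ranks, and the critical value is the one quoted for n=7.

```python
def signed_rank_sums(old, new, ndigits=6):
    """Return (R_plus, R_minus, n_used) for paired samples."""
    # Rounding removes floating-point noise so that equal differences tie.
    diffs = [round(b - a, ndigits) for a, b in zip(old, new)]
    diffs = [d for d in diffs if d != 0]  # zero differences are excluded
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next absolute difference is tied.
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus, len(diffs)


# Values of the n=8 example.
v0 = [0.02, 0.01, 0.01, 0.06, 0.03, 0.11, 0.06, 0.03]
v1 = [0.02, 0.00, 0.10, 0.13, 0.04, 0.15, 0.08, 0.06]

r_plus, r_minus, n_used = signed_rank_sums(v0, v1)
r = min(r_plus, r_minus)
R_CRITICAL_N7 = 2  # critical value quoted for n=7 at the 5% level
print(f"R+ = {r_plus}, R- = {r_minus}, n = {n_used}, reject H0: {r < R_CRITICAL_N7}")
```

For larger samples the critical value is usually taken from a table or replaced by a normal approximation, which is why the critical R depends on the sample size.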
24. Test Results
Results of Wilcoxon signed-rank test:
            H0         p-value
Views       v0 = v1    1.815e-06 *
            v0 > v1    1
            v0 < v1    9.076e-07 *
Comments    c0 = c1    0.4616
            c0 > c1    0.7718
            c0 < c1    0.2308
Test result:
* (red on the slide): p<0.05 → significant, i.e. H0 is rejected.
v0 > v1: More views using old template.
c0 = c1: No significant change in number of comments.
Possible explanations why there are more views of the old template:
- Link to urgent request better visible.
- Users used to old template.
- Already enough information in e-mail → no need to view details.
- Subjective impression of full information in new template.
25. Plots: response for old versus new template
[Plot: views in old (black) versus new (red) template]
[Plot: comments in old (black) versus new (red) template]
26. Variable for comparison of old and new template
Click-through ratio and conversion ratio: Problem of exclusion of zero values.
[Histogram: click-through rate ratio ctr_new/ctr_old. x-axis: click-through rate ratio (0.1 to 2); y-axis: count of urgent requests with corresponding ratio]
[Histogram: conversion rate ratio conv_new/conv_old. x-axis: conversion rate ratio (0.1 to >2); y-axis: count of urgent requests with corresponding ratio]
27. Variable for comparison of old and new template
Using differences v1-v0 and c1-c0 instead of quotients v1/v0 and c1/c0
→ zero values do not have to be excluded.
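The effect of zero values on both measures can be illustrated as follows. This is a sketch with hypothetical conversion rates; it treats a quotient as unusable whenever the old rate is zero, which is one plausible reading of the exclusion problem.

```python
def ratio(new, old):
    """Quotient new/old, or None when the old value is zero (undefined)."""
    return new / old if old != 0 else None


# Hypothetical (c0, c1) pairs for four urgent requests (illustration only).
pairs = [(0.00, 0.02), (0.03, 0.00), (0.00, 0.00), (0.04, 0.06)]

ratios = [ratio(c1, c0) for c0, c1 in pairs]
usable_ratios = [r for r in ratios if r is not None]

# Differences are defined for every pair, including those with zeros.
differences = [round(c1 - c0, 6) for c0, c1 in pairs]

print(f"usable ratios: {len(usable_ratios)} of {len(pairs)}")
print(f"usable differences: {len(differences)} of {len(pairs)}")
```

With differences, every urgent request stays in the sample, so the test is not biased towards requests that happened to receive at least one response.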
28. New First-Timers
• Considering only subgroup receiving the new template:
Is there a correlation between the number of “new first-timers” (NFT) and the
number of
(a) views (V1)?
(b) comments (C1)?
(a) H0: r(V1,NFT) = 0
(b) H0: r(C1,NFT) = 0
The Kolmogorov-Smirnov test yields that the number of new first-timers NFT is not
normally distributed (p = 7.936e-10) → using Spearman's or Kendall's
correlation coefficient.
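Spearman's coefficient is the Pearson correlation of the ranks, so it does not require normally distributed values. A minimal sketch follows; the helper names and the NFT and V1 values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts per urgent request (illustration only).
nft = [12, 40, 7, 55, 23, 31]
v1 = [3, 9, 4, 15, 5, 6]
print(f"Spearman r(V1, NFT) = {spearman(v1, nft):.4f}")
```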
29. New First-Timers
• Considering only subgroup receiving the new template:
Is there a correlation between the number of “new first-timers” (NFT) and the
number of
(a) views (V1)?
(b) comments (C1)?
Test results:
case   variables   method     r        p-value
(a)    V1, NFT     Spearman   0.3939   0.0017
(a)    V1, NFT     Kendall    0.2919   0.0015
(b)    C1, NFT     Spearman   0.3976   0.0015
(b)    C1, NFT     Kendall    0.3012   0.0019
H0 is rejected in every case (p<0.05).
→ significant positive correlation
Number of new first-timers is related to number of views and comments:
The more new first-timers, the more views and comments.
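Kendall's coefficient counts concordant and discordant pairs instead of ranking the values. The sketch below implements the tau-b variant, which corrects for ties; whether the slides used tau-b is an assumption, and the sample values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted nowhere
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + only_x_tied) * (untied + only_y_tied))


# Hypothetical counts per urgent request (illustration only).
nft = [12, 40, 7, 55, 23, 31]
c1 = [1, 2, 0, 4, 2, 1]
print(f"Kendall tau-b(C1, NFT) = {kendall_tau_b(c1, nft):.4f}")
```

Kendall's coefficient is typically smaller in magnitude than Spearman's for the same data, which matches the pattern in the results table.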
30. Sample Characteristics
Improvement suggestion for novel split test
Proposition of sample selection for next split test:
• Existing TechnoWeb users are randomly split into two equally sized groups A
and B.
• Every new TechnoWeb user is assigned group A or group B randomly with a
probability of 50% for each group.
• Group A always receives the old, group B always receives the new template.
• First-time views don't have to be investigated separately with this
approach, because the two groups are clearly distinguished from the beginning.
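The proposed assignment can be sketched as follows. All function names, the user identifiers and the fixed seed are hypothetical; the sketch only illustrates a fixed 50/50 split with sticky group membership.

```python
import random


def split_existing_users(user_ids, seed=2013):
    """Randomly split existing users into two equally sized groups A and B."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    groups = {user: "A" for user in shuffled[:half]}
    groups.update({user: "B" for user in shuffled[half:]})
    return groups, rng


def assign_new_user(groups, user_id, rng):
    """Assign a new user to A or B with probability 50% each; never reassign."""
    if user_id not in groups:
        groups[user_id] = "A" if rng.random() < 0.5 else "B"
    return groups[user_id]


def template_for(groups, user_id):
    """Group A always receives the old template, group B always the new one."""
    return "OLD" if groups[user_id] == "A" else "NEW"


groups, rng = split_existing_users(f"user{i}" for i in range(100))
first = assign_new_user(groups, "newcomer", rng)
again = assign_new_user(groups, "newcomer", rng)
print(f"newcomer is in group {first} and receives the {template_for(groups, 'newcomer')} template")
```

Because membership is stored and never changed, every user sees the same template for all urgent requests.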
31. Results and Recommendations
The following results were obtained:
• No significant change in number of comments in new versus old template.
• More views in old than in new template.
• The more users receiving the new template for the first time, the more views
and comments.
• Statistically relevant number for comparison of old and new template:
R=min(R+, R-). Critical R varies according to sample size.
Suggestions:
• Use v1-v0 and c1-c0, respectively, instead of v1/v0 and c1/c0, in order not to
exclude zero-answers.
• Sample selection: randomly choose 50% that always receive old template, 50%
that always receive new template and stick to that selection.