DMAW Breakfast Seminar: What to Test
3. ROUND 1: DID COLOR MAKE A DIFFERENCE?
D M A W B R E A K F A S T S E M I N A R 3
4. ROUND 2: DID BUTTON COPY, LOCATION, COLOR AND ROLLOVER EFFECT MAKE A DIFFERENCE IN CLICK-THROUGH?
5. ROUND 3: IS IT WORTH TESTING TWO DIFFERENT CALLING SCRIPTS?
6. TODAY
• What makes a “Good Test”? (Testing 101)
• And here are some things not to test
• NOW - Pay attention to Tom Gaffny
7. WHAT MAKES A “GOOD” TEST?
• Have a goal/objective: What is your hypothesis?
• e.g. “Asking for more money will yield a higher average gift”
• Ensure that results can be repeated!
• Create statistically measurable segments
What does that mean?
8. HAVE YOU HEARD THE ONE ABOUT 300 RESPONSES?
Smaller samples can be reliable. I've never been able to find out where this rule of 300 responses comes from; I have never found it in any statistics book, and I have searched. I first heard of it when I started at Adams Hussey.
(Before Adams Hussey, I worked at an epidemiological association. We never used a 300 minimum there. In fact, extremely small sample sizes were quite common: in a unit with 8 patients, 3 became infected with the same strain of organism; was this infection rate within normal limits, or due to an external factor?)
After asking lots of questions about the rule of 300, I finally got an explanation from Greg Adams. He said that he and Hal Malchow had settled on needing 300 responses because they found that when they rolled out winning tests with smaller samples, the tests often did not perform as expected on rollout.
I started thinking about this, and about the way that Adams Hussey and most agencies evaluate tests. I realized that Adams Hussey evaluated tests based solely on response rate. They did not factor in gift amounts or, more importantly, the distribution and variance of gift amounts. That made some sense: at the time, it was quite a chore to get individual gift data from clients, and without individual gift data we couldn't check whether the test and control had the same distribution of gifts. Having at least 300 gifts ensured some stability in the gift data and allowed us to evaluate a test accurately without examining the distribution of the gift amounts.
The number that I use is 25, and that number you will find in statistics books. It comes from the Central Limit Theorem, which says that once you have about 25 samples or readings, the distribution of their mean is approximately normal. (I've also sometimes seen 30 used instead of 25; I'm comfortable with either.) Once we know that a statistic has a normal distribution, we can perform all sorts of calculations on it using the rules of normal distributions. (If you want to know more, there is a great, 10-minute Khan Academy presentation on the Central Limit Theorem here:
https://www.khanacademy.org/math/probability/statistics-inferential/sampling_distribution/v/central-limit-theorem)
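To make the Central Limit Theorem concrete, here is a small simulation sketch. The numbers are illustrative assumptions (a $25 mean gift and an exponential shape for individual gifts), not data from this deck:

```python
import random
import statistics

random.seed(42)

def skewness(xs):
    """Sample skewness: mean cubed deviation divided by stdev cubed.
    Near zero for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# Individual gifts: heavily right-skewed (many small gifts, a few
# large ones), modeled here as an exponential with a $25 mean.
gifts = [random.expovariate(1 / 25) for _ in range(10_000)]

# Means of samples of 25 gifts each: per the CLT, these cluster
# around the true mean and are far closer to normally distributed.
means = [statistics.fmean(random.expovariate(1 / 25) for _ in range(25))
         for _ in range(10_000)]

print(f"skewness of individual gifts: {skewness(gifts):.2f}")  # strongly skewed
print(f"skewness of 25-gift means:    {skewness(means):.2f}")  # much nearer 0
```

The point is not the particular numbers but the shape: the raw gifts are badly skewed, while averages of 25 gifts already look close enough to normal that normal-theory calculations apply.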
So I like to use 25 as my minimum number of responses for evaluating a test. Keep in mind that if you are evaluating a small test, you should look at the distribution of gift amounts to see whether it is similar between the test and the control.
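One simple way to check whether test and control gift distributions look similar is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. This is a sketch with made-up gift samples (in practice a library routine such as scipy's `ks_2samp` also gives you a p-value):

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    gap between the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

random.seed(7)
# Hypothetical gift amounts: a control, a test drawn from the same
# distribution, and a test whose gifts run noticeably larger.
control = [random.expovariate(1 / 25) for _ in range(300)]
similar = [random.expovariate(1 / 25) for _ in range(300)]
shifted = [random.expovariate(1 / 75) for _ in range(300)]

print(f"KS gap, control vs similar: {ks_statistic(control, similar):.2f}")
print(f"KS gap, control vs shifted: {ks_statistic(control, shifted):.2f}")
```

A small gap says the two panels' gift amounts are plausibly drawn from the same distribution; a large gap says a response-rate comparison alone could mislead you.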
(In case you're curious about my epidemiology example: you can evaluate samples smaller than 25, you just can't use the rules of normal distributions to evaluate them. Generally, we use something called Fisher's exact test for these small samples. While I worked at the epidemiology association, I built a Fisher's exact calculator in Excel for members to use on their own data.)
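Fisher's exact test can be sketched in a few lines using hypergeometric probabilities. The 3-of-8 infected unit is from the example above; the comparison unit's counts (2 of 40) are hypothetical, added only to complete the 2x2 table:

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x):
        # Hypergeometric probability of the table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs * (1 + 1e-9))

# The 8-patient unit: 3 of 8 infected, versus a hypothetical
# comparison unit where 2 of 40 patients were infected.
p = fisher_exact_p(3, 5, 2, 38)
print(f"two-sided p-value: {p:.3f}")
```

Because the test enumerates every possible table exactly, it stays valid at sample sizes where normal approximations break down, which is exactly the small-unit situation described above.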
The way I would present the 300-response best practice is that it is a good general rule. It is particularly useful when you have limited data available or need a quick answer, and it is easy for clients to understand, as it is a single rule that relies on a single metric: response rate. Looking at both response rate and the distribution of gift amounts is more complex.
I would mention this as a best practice, while noting that it is entirely reasonable to evaluate smaller tests; doing so just requires more comfort with statistics.
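For reference, the response-rate comparison itself can be made rigorous with a standard two-proportion z-test. The mailing counts below are invented for illustration:

```python
from math import erf, sqrt

def two_proportion_z(resp_a, n_a, resp_b, n_b):
    """Two-sided z-test for a difference between two response rates.
    Returns (z, p_value) using the pooled-proportion standard error."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical panels: test 120/10,000 responses vs control 95/10,000.
z, p = two_proportion_z(120, 10_000, 95, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

This is the normal-approximation counterpart to Fisher's exact test: fine once responses are plentiful, but it is exactly the calculation that needs the larger samples (or the gift-distribution check) discussed above.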
10. DON’T TEST THAT! … ON THE PHONE
• Don’t TEST every phone campaign
• Testing TM and TM firms: does it even work for your organization?
• Testing scripts in the same phone room on the same night: it can be done, but it is challenging
• Testing a low-volume program against a high-volume, short-calling-time program, or the reverse
11. DON’T TEST THAT … SCRIPT TESTING
IT’S A PEOPLE BUSINESS!
• A script is NOT DIRECT MAIL COPY
• You can do A/B testing (separate nights, same callers…) but it is very challenging
• What can you test in scripts? Hold your horses… I’ll tell you soon.
12. DON’T TEST THAT! … DIGITAL
• Small sample size
• Low-volume pages
16. DON’T NOT TEST THIS! … MAIL ASK STRATEGIES
17. TEST THIS! … “YOUR BEST GIFT” MADE A DIFFERENCE
• Test had a higher response rate and average gift
• Gross per thousand was 20% higher!
18. DON’T TEST ACROSS CHANNELS
“Your Best Gift” is now being tested in all channels:
• On the donation form
• On the landing page for e-appeals
• In the script for calling
19. TEST THIS! … PHOTOS/MESSAGING
20. BE CAREFUL WHAT YOU ASK FOR
Ask for direct help:
• Response rate 28% higher!
• Average gift 19% higher!
• 52% MORE gross revenue
21. DON’T … REMEMBER THE BUTTON TEST…
• No statistically significant results
• Looking at what users did after clicking the different buttons allowed Beaconfire to make more suggestions to improve content and UI
22. AND A FEW MORE THOUGHTS
• Fight for an R&D BUDGET! If you don’t try, you will never succeed
• Measure test impact one year out: did the tested donors retain better? Worse?
• Cultivate your donors; if it helps, “test” sending cultivation pieces for a year…
• “Proof of concept is a beautiful thing”