Info.nl organized a knowledge session on Big Data on August 9. In this presentation, PersuasionAPI founder Maurits Kaptein discusses the challenges of Big Data.
Big Data session: Maurits Kaptein
1. Big Data @ PersuasionAPI
Maurits Kaptein
Co-founder / Chief Scientist Science Rockstars
www.persuasionapi.com
2. Big Data?
Big data has no precise definition.
“Datasets that are larger than ‘common’ machines can handle”
3. What I will and won’t talk about
Yes: The challenges associated with big data
Yes: How we solved them in PersuasionAPI (high level)
No: Algorithms
No: Infrastructure / Technical details
4. 3 Key Challenges
• Focus on meaningful data
  • So much data, but which is useful?
• Move from analytics to advice
  • No reports in hindsight, but direct responses
• Inability to run analysis on all of the data
  • Need for summaries / online learning
10. Should we use all the strategies we can think of?
At the same time?
For the same product?
11. Comparing many strategies with single strategies
[Figure: density plot; x-axis: click probability (0.000 to 0.010); y-axis: density]
12. Should we use all the strategies
we can think of?
No, we are better off selecting a specific one.
13. Should we use the same strategies
for everyone?
Strategies not equally effective for everyone?
Large differences based on personality traits
14. 2 Scenarios:
[Figure: two panels, each comparing the average effect of using a strategy with the effects for individuals, on a scale from - to +]
Beta Launch presentations Q2 2012
15. Should we use the same
strategies for everyone?
No, people differ in how they react to different strategies.
18. We choose not to produce reports after logging responses…
but rather to summarize all the data so that it is available for direct recommendations.
19. Persuasion Profile:
[Figure: estimated effect of the Normal page, A1 (Scarcity), A2 (Authority) and A3 (Consensus)]
• A persuasion profile is a collection of estimates of the effect of each persuasion principle for an individual user
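A persuasion profile can be pictured as a small per-user record. The sketch below is a minimal illustration, assuming that each strategy's effect is stored as a pair of success and attempt counts; the class name `PersuasionProfile`, the strategy labels and the count-based representation are hypothetical, not the actual PersuasionAPI format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Strategy labels follow the slide; "normal" is the unadjusted page.
STRATEGIES = ("normal", "scarcity", "authority", "consensus")


def _zero_counts() -> Dict[str, int]:
    return {s: 0 for s in STRATEGIES}


@dataclass
class PersuasionProfile:
    """Per-user estimates of the effect of each persuasion strategy.

    Assumption: each estimate is kept as (successes, attempts) counts,
    so the stored summary is tiny and the effect is a success rate.
    """
    user_id: str
    successes: Dict[str, int] = field(default_factory=_zero_counts)
    attempts: Dict[str, int] = field(default_factory=_zero_counts)

    def effect(self, strategy: str) -> Optional[float]:
        """Estimated success rate, or None if the strategy was never tried."""
        n = self.attempts[strategy]
        return self.successes[strategy] / n if n else None

    def estimates(self) -> Dict[str, Optional[float]]:
        return {s: self.effect(s) for s in STRATEGIES}


# Usage with made-up numbers: one success in four scarcity attempts.
profile = PersuasionProfile("user-42")
profile.attempts["scarcity"] = 4
profile.successes["scarcity"] = 1
print(profile.estimates())
# {'normal': None, 'scarcity': 0.25, 'authority': None, 'consensus': None}
```

Returning `None` for untried strategies keeps "unknown" separate from "known to fail"; a prior (as in the hierarchical models discussed later) is the usual way to fill those gaps.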
20. We log the success of each attempt
[Figure: estimated effect of the Normal page, A1 (Scarcity), A2 (Authority) and A3 (Consensus)]
• Using the dynamic image and the link, we can monitor the success of each page served to a user.
• We keep running updates of the average performance of your served page variations, and of their performance for each client.
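The logging described above only needs running counts, not the raw event stream. A minimal sketch, assuming an attempt is recorded when a variation is served (the dynamic image loads) and a success when the tracked link is clicked; `ResponseLog` and its method names are hypothetical.

```python
from collections import defaultdict


class ResponseLog:
    """Running success counts per page variation, overall and per client.

    Assumption: only counts are kept, never the individual events.
    """

    def __init__(self):
        # variation -> [successes, attempts]
        self.overall = defaultdict(lambda: [0, 0])
        # (client_id, variation) -> [successes, attempts]
        self.per_client = defaultdict(lambda: [0, 0])

    def log_served(self, client_id, variation):
        """Called when the dynamic image for a served variation loads."""
        self.overall[variation][1] += 1
        self.per_client[(client_id, variation)][1] += 1

    def log_success(self, client_id, variation):
        """Called when the tracked link is followed."""
        self.overall[variation][0] += 1
        self.per_client[(client_id, variation)][0] += 1

    @staticmethod
    def _rate(counts):
        successes, attempts = counts
        return successes / attempts if attempts else None

    def average_performance(self, variation):
        # .get() avoids creating empty entries for unseen keys.
        return self._rate(self.overall.get(variation, (0, 0)))

    def client_performance(self, client_id, variation):
        return self._rate(self.per_client.get((client_id, variation), (0, 0)))


log = ResponseLog()
log.log_served("c1", "scarcity")
log.log_success("c1", "scarcity")
log.log_served("c2", "scarcity")
print(log.average_performance("scarcity"))       # 0.5
print(log.client_performance("c1", "scarcity"))  # 1.0
```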
21. We improve the personal profile
[Figure: estimated effect of the Normal page, A1 (Scarcity), A2 (Authority) and A3 (Consensus)]
• Based on the response of each client, we update our advice for that user
• The new advice combines the response of that client with the responses of other clients
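One standard way to combine a client's own responses with those of other clients is Bayesian shrinkage. The slides do not give the model, so assume for illustration a Beta-Binomial model: the other clients' data define a population prior $\mathrm{Beta}(\alpha, \beta)$, and client $u$ has $s_u$ successes in $n_u > 0$ attempts with a given strategy. The updated estimate is then a weighted average of the client's own rate and the population rate:

$$
\hat\theta_u = \frac{\alpha + s_u}{\alpha + \beta + n_u}
= w_u \, \frac{s_u}{n_u} + (1 - w_u) \, \frac{\alpha}{\alpha + \beta},
\qquad w_u = \frac{n_u}{\alpha + \beta + n_u}.
$$

For example, with a hypothetical population prior $\alpha = 2$, $\beta = 18$ (average rate $0.1$) and a client with $s_u = 3$ successes in $n_u = 5$ attempts, $w_u = 5/25 = 0.2$ and $\hat\theta_u = 0.2 \cdot 0.6 + 0.8 \cdot 0.1 = 0.2$. The client's raw rate of 0.6 is pulled strongly toward the population average because there are few observations; as $n_u$ grows, $w_u \to 1$ and the client's own data dominate.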
22. User navigates, we improve
[Figure: three panels (first, second and third page served), each showing the estimated effect of Normal, A1, A2 and A3]
And so on, for each individual client...
Real-time analytics is most effective at predicting behavior
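The slides do not say how the next page variation is chosen from the profile. Thompson sampling is one common choice and is used here purely as an illustrative stand-in: for each page view, draw a plausible success rate per strategy from its Beta posterior, serve the best draw, then update that strategy with the observed response. The simulated user and all rates below are hypothetical.

```python
import random

STRATEGIES = ("normal", "scarcity", "authority", "consensus")


def choose_strategy(profile, rng):
    """Thompson sampling: draw a plausible success rate for each strategy from
    its Beta posterior and serve the strategy with the highest draw."""
    draws = {s: rng.betavariate(a, b) for s, (a, b) in profile.items()}
    return max(draws, key=draws.get)


def update(profile, strategy, success):
    """Conjugate update of the served strategy's Beta(a, b) posterior."""
    a, b = profile[strategy]
    profile[strategy] = (a + 1, b) if success else (a, b + 1)


def simulate(true_rates, n_pages, seed=0):
    """Serve n_pages to one simulated user, updating the profile after each page."""
    rng = random.Random(seed)
    profile = {s: (1, 1) for s in STRATEGIES}  # uniform Beta(1, 1) priors
    served = []
    for _ in range(n_pages):
        strategy = choose_strategy(profile, rng)
        success = rng.random() < true_rates[strategy]  # simulated response
        update(profile, strategy, success)
        served.append(strategy)
    return profile, served


# Hypothetical user for whom scarcity works best.
rates = {"normal": 0.05, "scarcity": 0.30, "authority": 0.08, "consensus": 0.10}
profile, served = simulate(rates, n_pages=2000)
print(served[-500:].count("scarcity"), "of the last 500 pages used scarcity")
```

Early pages spread over all strategies; as evidence accumulates, the profile concentrates on the strategy that works for this user, which matches the "user navigates, we improve" idea.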
24. Example of adjusted page
1. Log Client ID (e.g. via dynamic image, cookie, etc.)
2. Link(s) to log success of the Sales Strategy
3. Hooks to log non-responsiveness to a Sales Strategy
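The three page elements above can be sketched as follows. Everything here is hypothetical: `LOG_BASE` is a placeholder endpoint, and the non-response hook is one assumed design (an attempt not followed by a success before the client's next page view counts as a non-response), not the PersuasionAPI implementation.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical logging endpoint, for illustration only.
LOG_BASE = "https://log.example.com/event"


def log_url(event, client_id, strategy, **extra):
    """Build a logging URL; each request to it would record one event."""
    params = {"event": event, "client": client_id, "strategy": strategy, **extra}
    return LOG_BASE + "?" + urlencode(params)


def adjusted_page_snippet(client_id, strategy, message, target_url):
    """HTML fragment for one persuasive message on an adjusted page."""
    # 1. Dynamic image: loading it logs the client ID and the strategy served.
    pixel = log_url("served", client_id, strategy)
    # 2. Link: following it logs a success; the endpoint would then redirect.
    link = log_url("success", client_id, strategy, next=target_url)
    return (
        f'<img src="{escape(pixel)}" width="1" height="1" alt="">\n'
        f'<a href="{escape(link)}">{escape(message)}</a>'
    )


class NonResponseHook:
    """3. Server-side hook: an attempt with no success before the client's
    next page view is recorded as a non-response."""

    def __init__(self):
        self.open_attempts = {}  # client_id -> strategy awaiting a response
        self.non_responses = []  # (client_id, strategy) pairs

    def on_served(self, client_id, strategy):
        previous = self.open_attempts.get(client_id)
        if previous is not None:
            self.non_responses.append((client_id, previous))
        self.open_attempts[client_id] = strategy

    def on_success(self, client_id):
        self.open_attempts.pop(client_id, None)


print(adjusted_page_snippet("c1", "scarcity", "Only 2 left!",
                            "https://shop.example.com/p/1"))
```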
25. Challenge 2:
We provide “advice” stating which strategy to use for your current customer,
in between page views…
27. Problem 1: Impossible to fit all of the data in memory
Move fully to “online” learning:
Handle the data one data point at a time
Do not focus on $p(\theta \mid \text{data})$ but rather on $p(\theta \mid \text{prior(s)})$
• Summarize all meaningful info in the priors.
Find out which data you do and don’t need to make an impact on the bottom line.
• E.g. no demographic data
Use MapReduce (M/R) jobs for re-estimating
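Online learning with priors as summaries can be illustrated with a conjugate Beta-Bernoulli model, chosen here as an assumption since the slides do not name the model. Each data point updates the (alpha, beta) pair and can then be discarded. Because the summaries are additive counts, a batch re-estimate can be split into map and reduce steps; Python's built-in `map` and `functools.reduce` stand in for a real MapReduce job.

```python
from functools import reduce


def online_update(prior, clicked):
    """One conjugate update: (alpha, beta) summarizes all past data,
    so the data point itself need not be stored."""
    alpha, beta = prior
    return (alpha + 1, beta) if clicked else (alpha, beta + 1)


def map_record(clicked):
    """Map step: turn one logged response into partial counts."""
    return (1, 0) if clicked else (0, 1)


def reduce_counts(a, b):
    """Reduce step: partial counts from different chunks simply add up."""
    return (a[0] + b[0], a[1] + b[1])


def posterior_mean(params):
    alpha, beta = params
    return alpha / (alpha + beta)


stream = [True, False, False, True, False]  # hypothetical responses to one strategy
prior = (1, 1)                              # assumed uniform Beta(1, 1) prior

# Online: one data point at a time.
state = prior
for clicked in stream:
    state = online_update(state, clicked)

# Batch re-estimation in map/reduce style gives the same summary.
batch = reduce(reduce_counts, map(map_record, stream), (0, 0))
rebuilt = (prior[0] + batch[0], prior[1] + batch[1])

assert state == rebuilt == (3, 4)
print(posterior_mean(state))  # 3/7
```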
28. Problem 2: Individual-level estimates are needed fast
Use hierarchical models:
Aggregated level => Input for new users
User level => Start model for known users
Apply shrinkage
Link the two levels
Use user-level model in isolation if necessary
Analytical updates are therefore very fast.
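A two-level model of this kind can be sketched with Beta-Binomial shrinkage, used here as an assumed stand-in for the hierarchical models on the slide. The aggregated level pools all users into a prior (the input for new users); the user level starts from that prior and updates in closed form. The `strength` constant and all counts are hypothetical.

```python
def population_prior(user_counts, strength=20.0):
    """Aggregated level: pool every user's (successes, attempts) for one strategy
    into a Beta prior whose mean is the pooled rate.

    `strength` (pseudo-observations) is an assumed tuning constant; it could
    instead be estimated from the spread of user-level rates.
    """
    total_s = sum(s for s, _ in user_counts.values())
    total_n = sum(n for _, n in user_counts.values())
    if total_n == 0:
        # No aggregate data yet: uniform prior, i.e. the user level in isolation.
        return (1.0, 1.0)
    rate = total_s / total_n
    return (strength * rate, strength * (1.0 - rate))


def user_estimate(prior, successes, attempts):
    """User level: closed-form posterior mean, shrunk toward the prior mean.
    No iterative fitting is needed, so each update is cheap."""
    alpha, beta = prior
    return (alpha + successes) / (alpha + beta + attempts)


# Hypothetical counts for one strategy: user -> (successes, attempts).
counts = {"u1": (3, 10), "u2": (1, 20), "u3": (6, 20)}
prior = population_prior(counts)         # pooled rate 10/50 = 0.2
print(user_estimate(prior, 3, 10))       # u1: raw 0.30, shrunk to about 0.233
print(user_estimate(prior, 0, 0))        # new user: the population rate, 0.2
print(user_estimate((1.0, 1.0), 3, 10))  # user level in isolation: 4/12
```

The same two functions would be applied per strategy to fill every entry of a persuasion profile.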
29. Challenge 3:
How do we deal with all the data?
Use online learning and separate the different levels of the model
30. Results
[Slide with the towel example]
Increase in email click-through (at the 5th reminder): >100%
Increase in e-commerce revenue: >25%
31. My Big Data considerations:
Focus on meaningful data: persuasion at an individual level.
Move from analytics to real-time response: provide real-time advice.
Inability to analyze all of the data: use online learning and hierarchical models.
32. End.
Thanks!
Contact us at:
maurits@sciencerockstars.com
+31 621262211
www.sciencerockstars.com
Editor's Notes
What is big data? Hype, but we don’t really compy… However, some things are changing, because we have so much data…
Briefly go through each of the six:
Consensus (previous example)
Liking (similarity wallet example)
Expertise (Milgram example)
Commitment (sign in garden example)
Scarcity (abundantly available example)
Reciprocity (free books example)
They are already used online (Scarcity, Consensus, Scarcity)
Talk through the two scenarios.
So for each user it's an estimate of what works, which can then be used to select content.