Buzz Words Dunning Multi Modal Recommendations
2. ©MapR Technologies - Confidential
Multiple Kinds of Behavior
for Recommending
Multiple Kinds of Things
3.
What’s Up
What is this multi-modal stuff?
A simple recommendation architecture
Some scary math
Putting it into a deployable architecture
Final thoughts
4.
Contact:
– tdunning@maprtech.com
– @ted_dunning
– @apachemahout
– user-subscribe@mahout.apache.org
Slides and such (available late tonight):
– http://www.slideshare.net/tdunning
Hash tags: #bbuzz #mapr #recommendations
5.
Recommendations
Often known (inaccurately) as collaborative filtering
Actors interact with items
– observe successful interaction
We want to suggest additional successful interactions
Observations inherently very sparse
6.
Examples of Recommendations
Customers buying books (Linden et al.)
Web visitors rating music (Shardanand and Maes) or movies (Riedl et al.; Netflix)
Internet radio listeners not skipping songs (Musicmatch)
Internet video watchers watching >30 s (Veoh)
7.
What is this multi-modal stuff?
But people don’t just do one thing
One kind of behavior is useful for predicting other kinds
Having a complete picture is important for accuracy
What has the user said, viewed, clicked, closed, bought lately?
8.
A simple recommendation architecture
Look at the history of interactions
Find significant item cooccurrence in user histories
Use these cooccurring items as “indicators”
For all indicators in user history, add up scores
9.
Recommendation Basics
History:
User Thing
1 3
2 4
3 4
2 3
3 2
1 1
2 1
10.
Recommendation Basics
History as matrix:
      t1  t2  t3  t4
u1    1   0   1   0
u2    1   0   1   1
u3    0   1   0   1
(t1, t3) cooccur 2 times, (t1, t4) once, (t2, t4) once, (t3, t4) once
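The step from the history list to the matrix and the pair counts can be sketched in a few lines. This is only an illustration in plain Python (the variable names are made up for this sketch, and a real system would use sparse structures), using the seven (user, thing) pairs from the history slide:

```python
from collections import Counter
from itertools import combinations

# (user, thing) pairs copied from the History slide
history = [(1, 3), (2, 4), (3, 4), (2, 3), (3, 2), (1, 1), (2, 1)]

users = sorted({u for u, _ in history})
things = sorted({t for _, t in history})

# Binary user x thing matrix: one row per user, one column per thing
seen = set(history)
A = [[1 if (u, t) in seen else 0 for t in things] for u in users]

# For every pair of things, count the users who interacted with both
cooccur = Counter()
for row in A:
    present = [t for t, flag in zip(things, row) if flag]
    for pair in combinations(present, 2):
        cooccur[pair] += 1

for row in A:
    print(row)
print(dict(cooccur))
```

The printed counts match the slide: (t1, t3) twice, and (t1, t4), (t2, t4), (t3, t4) once each.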
11.
A Quick Simplification
Users who do h: $A h$
Also do r: $r = A^T (A h) = (A^T A) h$
User-centric recommendations: $A^T (A h)$
Item-centric recommendations: $(A^T A) h$
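The regrouping from user-centric to item-centric form is just associativity of matrix multiplication. A small check in plain Python, assuming the 3 x 4 user-by-thing matrix from the history slides and a hypothetical history vector h for a user who has touched only t1 (the helper names are illustrative):

```python
# User x thing matrix from the "History as matrix" slide
A = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(m, n):
    n_t = transpose(n)
    return [[sum(a * b for a, b in zip(row, col)) for col in n_t] for row in m]

h = [1, 0, 0, 0]  # hypothetical history: only t1

At = transpose(A)
user_centric = matvec(At, matvec(A, h))   # A^T (A h): find similar users first
item_centric = matvec(matmul(At, A), h)   # (A^T A) h: precompute item-item matrix
print(user_centric, item_centric)
```

Both orderings give the same vector. The item-centric form is attractive because $A^T A$ can be computed offline, leaving only a small product with h at request time.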
12.
Recommendation Basics
Cooccurrence
      t1  t2  t3  t4
t1    2   0   2   1
t2    0   1   0   1
t3    2   0   2   1
t4    1   1   1   2
13.
Problems with Raw Cooccurrence
Very popular items co-occur with everything
– Welcome document
– Elevator music
That isn’t interesting
– We want anomalous cooccurrence
14.
Recommendation Basics
Cooccurrence
      t1  t2  t3  t4
t1    2   0   2   1
t2    0   1   0   1
t3    2   0   2   1
t4    1   1   1   2
Contingency table for (t1, t3):
          t3   not t3
t1        2    1
not t1    1    1
15.
Spot the Anomaly
Root LLR is roughly like standard deviations
         A      not A
B        13     1000
not B    1000   100,000
Root LLR: 0.44
         A      not A
B        1      0
not B    0      2
Root LLR: 0.98
         A      not A
B        1      0
not B    0      10,000
Root LLR: 2.26
         A      not A
B        10     0
not B    0      100,000
Root LLR: 7.15
16.
Root LLR Details
In R
# k is a 2x2 matrix of counts; the (k==0) term keeps log() finite for empty cells
entropy = function(k) {
  -sum(k*log((k==0)+(k/sum(k))))
}
rootLLr = function(k) {
  sign = …
  sign * sqrt(
    (entropy(rowSums(k))+entropy(colSums(k))
     - entropy(k))/2)
}
Like sqrt(mutual information * N/2)
See http://bit.ly/16DvLVK
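For readers who do not use R, here is a rough Python rendering of the same computation. It follows the slide's formula, including the division by 2 inside the square root. The slide elides how `sign` is computed, so the rule used below (negative when A is rarer in row B than in row not-B) is an assumption of this sketch, not something taken from the slide.

```python
import math

def entropy(counts):
    """Unnormalized entropy -sum(k * log(k / N)); zero counts contribute nothing."""
    total = sum(counts)
    return -sum(k * math.log(k / total) for k in counts if k > 0)

def root_llr(k11, k12, k21, k22):
    """Signed root of the score on the slide for the table [[k11, k12], [k21, k22]]."""
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    cells = [k11, k12, k21, k22]
    # max() guards against a tiny negative value caused by rounding
    raw = max(0.0, (entropy(rows) + entropy(cols) - entropy(cells)) / 2)
    score = math.sqrt(raw)
    # ASSUMPTION: the sign rule is not given on the slide; this one marks
    # tables where the cooccurrence is lower than expected as negative.
    if rows[0] > 0 and rows[1] > 0 and k11 / rows[0] < k21 / rows[1]:
        score = -score
    return score

for table in [(13, 1000, 1000, 100000), (1, 0, 0, 2),
              (1, 0, 0, 10000), (10, 0, 0, 100000)]:
    print(table, round(root_llr(*table), 2))
```

Run on the four tables from the "Spot the Anomaly" slide, this gives values close to the scores listed there (about 0.45, 0.98, 2.26 and 7.15).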
17.
Threshold by Score
Cooccurrence
      t1  t2  t3  t4
t1    2   0   2   1
t2    0   1   0   1
t3    2   0   2   1
t4    1   1   1   2
18.
Threshold by Score
Significant cooccurrence => Indicators
      t1  t2  t3  t4
t1    1   0   0   1
t2    0   1   0   1
t3    0   0   1   1
t4    1   0   0   1
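Once indicators exist, scoring a user is a matter of adding up matches between the indicators and the user's history. The sketch below copies the indicator matrix from the slide. It assumes that each row lists the indicators for that row's item, that every match counts as 1, and that items already in the history are left out of the result. All three are choices made for this illustration.

```python
things = ["t1", "t2", "t3", "t4"]

# Indicator matrix copied from the slide.
# ASSUMPTION: row = item to recommend, 1 = that column's item is an indicator.
indicators = {
    "t1": [1, 0, 0, 1],
    "t2": [0, 1, 0, 1],
    "t3": [0, 0, 1, 1],
    "t4": [1, 0, 0, 1],
}

def recommend(history):
    """Rank unseen items by the number of their indicators found in the history."""
    h = [1 if t in history else 0 for t in things]
    scores = {
        item: sum(flag * seen for flag, seen in zip(row, h))
        for item, row in indicators.items()
        if item not in history
    }
    # Highest score first; ties broken by item name for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend({"t1"}))
```

For a user whose history contains only t1, the only item with a matching indicator is t4, so it ranks first.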
19.
So Far, So Good
Classic recommendation systems based on these approaches
– Musicmatch (ca 2000)
– Veoh Networks (ca 2005)
Currently available in Mahout
– See RowSimilarityJob
Very simple to deploy
– Compute indicators
– Store in search engine
– Works very well with enough data
21.
Virtues of Current State of the Art
Lots of well publicized history
– Musicmatch, Veoh, Netflix, Amazon, Overstock
Lots of support
– Mahout, commercial offerings like Myrrix
Lots of existing code
– Mahout, commercial codes
Proven track record
Well socialized solution
23.
Too Limited
People do more than one kind of thing
Different kinds of behaviors give different quality, quantity and
kind of information
We don’t have to do co-occurrence
We can do cross-occurrence
Result is cross-recommendation
25.
Symmetry Gives Cross Recommendations
Why just dyadic learning?
Why not triadic learning?
Why not cross learning?
Co-occurrence: $(A^T A) h$
Cross-occurrence: $(B^T A) h$
26.
For example
Users enter queries (A)
– (actor = user, item=query)
Users view videos (B)
– (actor = user, item=video)
A’A gives query recommendation
– “did you mean to ask for”
B’B gives video recommendation
– “you might like these videos”
27.
The punch-line
B’A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
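A toy version of B'A makes the idea concrete. The query terms, video ids and both matrices below are invented for this sketch, and it uses raw cross-occurrence counts rather than the LLR-thresholded indicators a real deployment would use:

```python
queries = ["flamenco", "guitar"]   # hypothetical query terms
videos = ["v1", "v2", "v3"]        # hypothetical video ids

# A: users x queries, B: users x videos; row i is the same user in both
A = [
    [1, 0],
    [1, 1],
    [0, 1],
]
B = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]

def cross_occurrence(B, A):
    """Return B^T A: entry [v][q] counts users who viewed video v and entered query q."""
    return [
        [sum(b_row[v] * a_row[q] for b_row, a_row in zip(B, A))
         for q in range(len(A[0]))]
        for v in range(len(B[0]))
    ]

BtA = cross_occurrence(B, A)

def videos_for(query_history):
    """Score videos as (B^T A) h, where h marks the queries the user entered."""
    h = [1 if q in query_history else 0 for q in queries]
    scores = [sum(c * x for c, x in zip(row, h)) for row in BtA]
    return sorted(zip(videos, scores), key=lambda vs: (-vs[1], vs[0]))

print(BtA)
print(videos_for({"flamenco"}))
```

Nothing here inspects video titles or descriptions. The ranking comes only from what users who typed the query went on to watch.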
28.
Real-life example
Query: “Paco de Lucia”
Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
30.
Hypothetical Example
Want a navigational ontology?
Just put labels on a web page with traffic
– This gives A = users x label clicks
Remember viewing history
– This gives B = users x items
Cross recommend
– B’A = label to item mapping
After several users click, results are whatever users think they
should be
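36.
\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]
\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]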
37.
Summary
Input: Multiple kinds of behavior on one set of things
Output: Recommendations for one kind of behavior with a
different set of things
Cross recommendation is a special case
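In matrix terms, recommendations for the first kind of behavior combine a co-occurrence term and a cross-occurrence term: $r_1 = A_1^T A_1 h_1 + A_1^T A_2 h_2$. A minimal sketch with two made-up behavior matrices over the same three users (all values and helper names are hypothetical):

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Hypothetical data: the same three users observed under two behaviors
A1 = [[1, 0, 1],
      [1, 1, 0],
      [0, 1, 1]]   # users x items for behavior 1
A2 = [[1, 0],
      [0, 1],
      [1, 1]]      # users x items for behavior 2

def recommend_r1(A1, A2, h1, h2):
    """r1 = A1^T A1 h1 + A1^T A2 h2, computed as A1^T (A1 h1 + A2 h2)."""
    per_user = [x + y for x, y in zip(matvec(A1, h1), matvec(A2, h2))]
    return matvec(transpose(A1), per_user)

h1 = [1, 0, 0]   # behavior-1 history of the user being served
h2 = [0, 1]      # behavior-2 history of the same user
print(recommend_r1(A1, A2, h1, h2))
```

Setting `h1` to all zeros leaves only the cross term, which is the plain cross-recommendation case.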
40.
Input Data
User transactions
– user id, merchant id
– SIC code, amount
– Descriptions, cuisine, …
Offer transactions
– user id, offer id
– vendor id, merchant ids
– offers, views, accepts
Derived user data
– merchant ids
– anomalous descriptor terms
– offer & vendor ids
Derived merchant data
– local top 40
– SIC code
– vendor code
– amount distribution
41.
Cross-recommendation
Per merchant indicators
– merchant ids
– chain ids
– SIC codes
– indicator terms from text
– offer vendor ids
Computed by finding anomalous (indicator => merchant) rates
44.
Search-based Recommendations
Sample document
– Merchant id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant ids
– Indicator industry (SIC) ids
– Indicator offers
– Indicator text
– Local top 40
Sample query
– Current location
– Recent merchant descriptions
– Recent merchant ids
– Recent SIC codes
– Recent accepted offers
– Local top 40
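The document and query shapes above can be mimicked with plain dictionaries to show how recent history turns into a search. This is a toy stand-in and does not use Solr's API: the field names, merchant ids, SIC codes and offer ids are made up, and the overlap count replaces the relevance weighting a real search engine would apply.

```python
# Hypothetical merchant documents carrying indicator fields
documents = [
    {"merchant_id": "m1",
     "indicator_merchants": {"m2", "m3"},
     "indicator_sic": {"5812"},
     "indicator_offers": {"o9"}},
    {"merchant_id": "m2",
     "indicator_merchants": {"m4"},
     "indicator_sic": {"5411"},
     "indicator_offers": set()},
]

# Query assembled from a user's recent history, one entry per indicator field
query = {
    "indicator_merchants": {"m3", "m4"},   # recent merchant ids
    "indicator_sic": {"5812"},             # recent SIC codes
    "indicator_offers": {"o9"},            # recently accepted offers
}

def score(doc, query):
    """Count query terms that match the document, summed over all fields."""
    return sum(len(doc[field] & terms) for field, terms in query.items())

def search(documents, query, limit=10):
    """Return (merchant_id, score) pairs, best match first."""
    ranked = sorted(documents, key=lambda d: (-score(d, query), d["merchant_id"]))
    return [(d["merchant_id"], score(d, query)) for d in ranked[:limit]]

print(search(documents, query))
```

The appeal of the design is that serving a recommendation becomes an ordinary search request, so the existing search tier handles ranking, filtering by location, and scaling.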
45.
[Diagram: offline indexing. Components: complete history, cooccurrence analysis (Mahout), item meta-data, Solr indexing (two Solr indexers), index shards]
46.
[Diagram: online serving. Components: user history, web tier, Solr search, item meta-data, index shards, two Solr indexers]
47.
We are hiring!
48.
Objective Results
At a very large credit card company
History is all transactions, all web interaction
Processing time cut from 20 hours per day to 3
Recommendation engine load time decreased from 8 hours to 3
minutes
Recommendation quality increased visibly