SlideShare uma empresa Scribd logo
1 de 48
1©MapR Technologies - Confidential
Polyvalent Recommendations
2©MapR Technologies - Confidential
Multiple Kinds of Behavior
for Recommending
Multiple Kinds of Things
3©MapR Technologies - Confidential
 Contact:
– tdunning@maprtech.com
– @ted_dunning
 Slides and such (available late tonight):
– http://www.slideshare.net/tdunning
 Hash tags: #mapr #recommendations
4©MapR Technologies - Confidential
A new approach to recommendation, polyvalent recommendation,
that is both simpler and much more powerful than traditional
approaches. The idea is that you can combine user, item and content
recommendations into a single query that you can implement using a
very simple architecture.
5©MapR Technologies - Confidential
Recommendations
 Often known (inaccurately) as collaborative filtering
 Actors interact with items
– observe successful interaction
 We want to suggest additional successful interactions
 Observations inherently very sparse
6©MapR Technologies - Confidential
Examples
 Customers buying books (Linden et al)
 Web visitors rating music (Shardanand and Maes) or movies (Riedl,
et al), (Netflix)
 Internet radio listeners not skipping songs (Musicmatch)
 Internet video watchers watching >30 s
7©MapR Technologies - Confidential
Dyadic Structure
 Functional
– Interaction: actor -> item*
 Relational
– Interaction ⊆ Actors x Items
 Matrix
– Rows indexed by actor, columns by item
– Value is count of interactions
 Predict missing observations
8©MapR Technologies - Confidential
Recommendation Basics
 History:
User Thing
1 3
2 4
3 4
2 3
3 2
1 1
2 1
9©MapR Technologies - Confidential
Recommendation Basics
 History as matrix:
 (t1, t2) cooccur 2 times, (t1, t4) once, (t2, t4) once
t1 t2 t3 t4
u1 1 0 1 0
u2 1 0 1 1
u3 0 1 0 1
10©MapR Technologies - Confidential
A Quick Simplification
 Users who do h
 Also do r
Ah
AT
Ah( )
AT
A( )h
User-centric recommendations
Item-centric recommendations
11©MapR Technologies - Confidential
Recommendation Basics
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
12©MapR Technologies - Confidential
Problems with Raw Cooccurrence
 Very popular items co-occur with everything
– Welcome document
– Elevator music
 That isn’t interesting
– We want anomalous cooccurrence
13©MapR Technologies - Confidential
Recommendation Basics
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
t3 not t3
t1 2 1
not t1 1 1
14©MapR Technologies - Confidential
Root LLR Details
 In R
entropy = function(k) {
-sum(k*log((k==0)+(k/sum(k))))
}
rootLLr = function(k) {
sqrt(
(entropy(rowSums(k))+entropy(colSums(k))
- entropy(k))/2)
}
 Like sqrt(mutual information * N/2)
15©MapR Technologies - Confidential
Spot the Anomaly
 Root LLR is roughly like standard deviations
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 2
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
0.44 0.98
2.26 7.15
16©MapR Technologies - Confidential
Threshold by Score
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
17©MapR Technologies - Confidential
Threshold by Score
 Significant cooccurrence => Indicators
t1 t2 t3 t4
t1 1 0 0 1
t2 0 1 0 1
t3 0 0 1 1
t4 1 0 0 1
18©MapR Technologies - Confidential
Decomposition for Cooccurrence
 Can use SVD for cooccurrence
 But first one or two singular vectors just encode popularity …
ignore those
 VT projects items into concept space, V projects back into item
space
 Thresholding reconstructed cooccurrence matrix is another way to
get indicators
AT
A = USVT
( )
T
USVT
( )= VS2
VT
19©MapR Technologies - Confidential
What’s right
about this?
20©MapR Technologies - Confidential
Virtues of Current State of the Art
 Lots of well publicized history
– Netflix, Amazon, Overstock
 Lots of support
– Mahout, commercial offerings like Myrrix
 Lots of existing code
– Mahout, commercial codes
 Proven track record
 Well socialized solution
21©MapR Technologies - Confidential
What’s wrong
about this?
22©MapR Technologies - Confidential
Cross Occurrence
 We don’t have to do co-occurrence
 We can do cross-occurrence
 Result is cross-recommendation
23©MapR Technologies - Confidential
Fundamental Algorithmics
 Cooccurrence
 A is users x items, K is items x items
 Product has general shape of matrix
 K tells us “users who interacted with x also interacted with y”
K = AT
A
24©MapR Technologies - Confidential
Fundamental Algorithmic Structure
 Cooccurrence
 Matrix approximation by factoring
 LLR
K = AT
A
A » USVT
K » VS2
VT
r = VS2
VT
h
r =sparsify(AT
A)h
25©MapR Technologies - Confidential
But Wait ...
Does it have to be that way?
26©MapR Technologies - Confidential
But why not ...
Why just dyadic learning?
Why not triadic learning?Why not cross learning?
AT
A( )hBT
A( )h
27©MapR Technologies - Confidential
For example
 Users enter queries (A)
– (actor = user, item=query)
 Users view videos (B)
– (actor = user, item=video)
 A’A gives query recommendation
– “did you mean to ask for”
 B’B gives video recommendation
– “you might like these videos”
28©MapR Technologies - Confidential
The punch-line
 B’A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
29©MapR Technologies - Confidential
Real-life example
 Query: “Paco de Lucia”
 Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
 Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
30©MapR Technologies - Confidential
Real-life example
31©MapR Technologies - Confidential
Hypothetical Example
 Want a navigational ontology?
 Just put labels on a web page with traffic
– This gives A = users x label clicks
 Remember viewing history
– This gives B = users x items
 Cross recommend
– B’A = label to item mapping
 After several users click, results are whatever users think they
should be
32©MapR Technologies - Confidential
But wait,
there’s more!
33©MapR Technologies - Confidential
Ausers
things
34©MapR Technologies - Confidential
A1 A2
é
ë
ù
û
users
thing
type 1
thing
type 2
35©MapR Technologies - Confidential
A1 A2
é
ë
ù
û
T
B1 B2
é
ë
ù
û=
A1
T
A2
T
é
ë
ê
ê
ù
û
ú
ú
B1 B2
é
ë
ù
û
=
A1
T
B1 A1
T
B2
A2B1 A2B2
é
ë
ê
ê
ù
û
ú
ú
r1
r2
é
ë
ê
ê
ù
û
ú
ú
=
A1
T
B1 A1
T
B2
A2B1 A2B2
é
ë
ê
ê
ù
û
ú
ú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
r1 = A1
T
B1 A1
T
B2
é
ëê
ù
ûú
h1
h2
é
ë
ê
ê
ù
û
ú
ú
36©MapR Technologies - Confidential
Summary
 Input: Multiple kinds of behavior on one set of things
 Output: Recommendations for one kind of behavior with a
different set of things
 Cross recommendation is a special case
37©MapR Technologies - Confidential
Now again, without
the scary math
38©MapR Technologies - Confidential
Input Data
 User transactions
– user id, merchant id
– SIC code, amount
– Descriptions, cuisine, …
 Offer transactions
– user id, offer id
– vendor id, merchant id’s,
– offers, views, accepts
39©MapR Technologies - Confidential
Input Data
 User transactions
– user id, merchant id
– SIC code, amount
– Descriptions, cuisine, …
 Offer transactions
– user id, offer id
– vendor id, merchant id’s,
– offers, views, accepts
 Derived user data
– merchant id’s
– anomalous descriptor terms
– offer & vendor id’s
 Derived merchant data
– local top40
– SIC code
– vendor code
– amount distribution
40©MapR Technologies - Confidential
Cross-recommendation
 Per merchant indicators
– merchant id’s
– chain id’s
– SIC codes
– indicator terms from text
– offer vendor id’s
 Computed by finding anomalous (indicator => merchant) rates
41©MapR Technologies - Confidential
Search-based Recommendations
 Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
42©MapR Technologies - Confidential
Search-based Recommendations
 Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
43©MapR Technologies - Confidential
Search-based Recommendations
 Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
 Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
44©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
indexing
Cooccurrence
(Mahout)
Item meta-
data
Index
shards
Complete
history
45©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
search
Web tier
Item meta-
data
Index
shards
User
history
46©MapR Technologies - Confidential
Objective Results
 At a very large credit card company
 History is all transactions, all web interaction
 Processing time cut from 20 hours per day to 3
 Recommendation engine load time decreased from 8 hours to 3
minutes
 Recommendation quality increased visibly
47©MapR Technologies - Confidential
 Contact:
– tdunning@maprtech.com
– @ted_dunning
 Slides and such (available late tonight):
– http://www.slideshare.net/tdunning
 Hash tags: #mapr #recommendations
 We are hiring!
48©MapR Technologies - Confidential
Thank You

Mais conteúdo relacionado

Mais procurados

Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
 
London data science
London data scienceLondon data science
London data scienceTed Dunning
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data SecurelyTed Dunning
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and RecommendationsTed Dunning
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationTed Dunning
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveTed Dunning
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedTed Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really MatterTed Dunning
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache MahoutTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionTed Dunning
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
Strata new-york-2012
Strata new-york-2012Strata new-york-2012
Strata new-york-2012Ted Dunning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 

Mais procurados (20)

Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
London data science
London data scienceLondon data science
London data science
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and Recommendations
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for Recommendation
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the Hive
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
London hug
London hugLondon hug
London hug
 
Strata new-york-2012
Strata new-york-2012Strata new-york-2012
Strata new-york-2012
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 

Semelhante a Polyvalent recommendations

Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsMapR Technologies
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationTed Dunning
 
Recommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on SymmetryRecommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on SymmetryMapR Technologies
 
Crowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoopCrowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadooplucenerevolution
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in businessMapR Technologies
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San DiegoMapR Technologies
 
DFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersDFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersTed Dunning
 
CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceMapR Technologies
 
Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataInside Analysis
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceInstitute of Contemporary Sciences
 
The Case for Graphs in Supply Chains
The Case for Graphs in Supply ChainsThe Case for Graphs in Supply Chains
The Case for Graphs in Supply ChainsNeo4j
 
Government GraphSummit: Optimizing the Supply Chain
Government GraphSummit: Optimizing the Supply ChainGovernment GraphSummit: Optimizing the Supply Chain
Government GraphSummit: Optimizing the Supply ChainNeo4j
 
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...Neo4j
 
WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelFrank Pfleger
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012MapR Technologies
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 

Semelhante a Polyvalent recommendations (20)

Polyvalent Recommendations
Polyvalent RecommendationsPolyvalent Recommendations
Polyvalent Recommendations
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal Recommendations
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendation
 
Recommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on SymmetryRecommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on Symmetry
 
GoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 SkinnedGoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 Skinned
 
Big Data Paris
Big Data ParisBig Data Paris
Big Data Paris
 
Big Data Paris
Big Data ParisBig Data Paris
Big Data Paris
 
Crowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoopCrowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoop
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in business
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San Diego
 
DFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersDFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout Recommenders
 
CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop Performance
 
Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate Data
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
 
The Case for Graphs in Supply Chains
The Case for Graphs in Supply ChainsThe Case for Graphs in Supply Chains
The Case for Graphs in Supply Chains
 
Government GraphSummit: Optimizing the Supply Chain
Government GraphSummit: Optimizing the Supply ChainGovernment GraphSummit: Optimizing the Supply Chain
Government GraphSummit: Optimizing the Supply Chain
 
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
Optimizing the Supply Chain with Knowledge Graphs, IoT and Digital Twins_Moor...
 
WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 

Mais de Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 

Mais de Ted Dunning (12)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
T digest-update
T digest-updateT digest-update
T digest-update
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 

Último

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 

Último (20)

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

Polyvalent recommendations

  • 1. 1©MapR Technologies - Confidential Polyvalent Recommendations
  • 2. 2©MapR Technologies - Confidential Multiple Kinds of Behavior for Recommending Multiple Kinds of Things
  • 3. 3©MapR Technologies - Confidential  Contact: – tdunning@maprtech.com – @ted_dunning  Slides and such (available late tonight): – http://www.slideshare.net/tdunning  Hash tags: #mapr #recommendations
  • 4. 4©MapR Technologies - Confidential A new approach to recommendation, polyvalent recommendation, that is both simpler and much more powerful than traditional approaches. The idea is that you can combine user, item and content recommendations into a single query that you can implement using a very simple architecture.
  • 5. 5©MapR Technologies - Confidential Recommendations  Often known (inaccurately) as collaborative filtering  Actors interact with items – observe successful interaction  We want to suggest additional successful interactions  Observations inherently very sparse
  • 6. 6©MapR Technologies - Confidential Examples  Customers buying books (Linden et al)  Web visitors rating music (Shardanand and Maes) or movies (Riedl, et al), (Netflix)  Internet radio listeners not skipping songs (Musicmatch)  Internet video watchers watching >30 s
  • 7. 7©MapR Technologies - Confidential Dyadic Structure  Functional – Interaction: actor -> item*  Relational – Interaction ⊆ Actors x Items  Matrix – Rows indexed by actor, columns by item – Value is count of interactions  Predict missing observations
  • 8. 8©MapR Technologies - Confidential Recommendation Basics  History: User Thing 1 3 2 4 3 4 2 3 3 2 1 1 2 1
  • 9. 9©MapR Technologies - Confidential Recommendation Basics  History as matrix:  (t1, t2) cooccur 2 times, (t1, t4) once, (t2, t4) once t1 t2 t3 t4 u1 1 0 1 0 u2 1 0 1 1 u3 0 1 0 1
  • 10. 10©MapR Technologies - Confidential A Quick Simplification  Users who do h  Also do r Ah AT Ah( ) AT A( )h User-centric recommendations Item-centric recommendations
  • 11. 11©MapR Technologies - Confidential Recommendation Basics  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2
  • 12. 12©MapR Technologies - Confidential Problems with Raw Cooccurrence  Very popular items co-occur with everything – Welcome document – Elevator music  That isn’t interesting – We want anomalous cooccurrence
  • 13. 13©MapR Technologies - Confidential Recommendation Basics  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2 t3 not t3 t1 2 1 not t1 1 1
  • 14. 14©MapR Technologies - Confidential Root LLR Details  In R entropy = function(k) { -sum(k*log((k==0)+(k/sum(k)))) } rootLLr = function(k) { sqrt( (entropy(rowSums(k))+entropy(colSums(k)) - entropy(k))/2) }  Like sqrt(mutual information * N/2)
  • 15. 15©MapR Technologies - Confidential Spot the Anomaly  Root LLR is roughly like standard deviations A not A B 13 1000 not B 1000 100,000 A not A B 1 0 not B 0 2 A not A B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000 0.44 0.98 2.26 7.15
  • 16. 16©MapR Technologies - Confidential Threshold by Score  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2
  • 17. 17©MapR Technologies - Confidential Threshold by Score  Significant cooccurrence => Indicators t1 t2 t3 t4 t1 1 0 0 1 t2 0 1 0 1 t3 0 0 1 1 t4 1 0 0 1
  • 18. 18©MapR Technologies - Confidential Decomposition for Cooccurrence  Can use SVD for cooccurrence  But first one or two singular vectors just encode popularity … ignore those  VT projects items into concept space, V projects back into item space  Thresholding reconstructed cooccurrence matrix is another way to get indicators AT A = USVT ( ) T USVT ( )= VS2 VT
  • 19. 19©MapR Technologies - Confidential What’s right about this?
  • 20. 20©MapR Technologies - Confidential Virtues of Current State of the Art  Lots of well publicized history – Netflix, Amazon, Overstock  Lots of support – Mahout, commercial offerings like Myrrix  Lots of existing code – Mahout, commercial codes  Proven track record  Well socialized solution
  • 21. 21©MapR Technologies - Confidential What’s wrong about this?
  • 22. 22©MapR Technologies - Confidential Cross Occurrence  We don’t have to do co-occurrence  We can do cross-occurrence  Result is cross-recommendation
  • 23. 23©MapR Technologies - Confidential Fundamental Algorithmics  Cooccurrence  A is users x items, K is items x items  Product has general shape of matrix  K tells us “users who interacted with x also interacted with y” K = AT A
  • 24. 24©MapR Technologies - Confidential Fundamental Algorithmic Structure  Cooccurrence  Matrix approximation by factoring  LLR K = AT A A » USVT K » VS2 VT r = VS2 VT h r =sparsify(AT A)h
  • 25. 25©MapR Technologies - Confidential But Wait ... Does it have to be that way?
  • 26. 26©MapR Technologies - Confidential But why not ... Why just dyadic learning? Why not triadic learning?Why not cross learning? AT A( )hBT A( )h
  • 27. 27©MapR Technologies - Confidential For example  Users enter queries (A) – (actor = user, item=query)  Users view videos (B) – (actor = user, item=video)  A’A gives query recommendation – “did you mean to ask for”  B’B gives video recommendation – “you might like these videos”
  • 28. 28©MapR Technologies - Confidential The punch-line  B’A recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
  • 29. 29©MapR Technologies - Confidential Real-life example  Query: “Paco de Lucia”  Conventional meta-data search results: – “hombres del paco” times 400 – not much else  Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 30. 30©MapR Technologies - Confidential Real-life example
  • 31. 31©MapR Technologies - Confidential Hypothetical Example  Want a navigational ontology?  Just put labels on a web page with traffic – This gives A = users x label clicks  Remember viewing history – This gives B = users x items  Cross recommend – B’A = label to item mapping  After several users click, results are whatever users think they should be
  • 32. 32©MapR Technologies - Confidential But wait, there’s more!
  • 33. 33©MapR Technologies - Confidential Ausers things
  • 34. 34©MapR Technologies - Confidential A1 A2 é ë ù û users thing type 1 thing type 2
  • 35. 35©MapR Technologies - Confidential A1 A2 é ë ù û T B1 B2 é ë ù û= A1 T A2 T é ë ê ê ù û ú ú B1 B2 é ë ù û = A1 T B1 A1 T B2 A2B1 A2B2 é ë ê ê ù û ú ú r1 r2 é ë ê ê ù û ú ú = A1 T B1 A1 T B2 A2B1 A2B2 é ë ê ê ù û ú ú h1 h2 é ë ê ê ù û ú ú r1 = A1 T B1 A1 T B2 é ëê ù ûú h1 h2 é ë ê ê ù û ú ú
  • 36. 36©MapR Technologies - Confidential Summary  Input: Multiple kinds of behavior on one set of things  Output: Recommendations for one kind of behavior with a different set of things  Cross recommendation is a special case
  • 37. 37©MapR Technologies - Confidential Now again, without the scary math
  • 38. 38©MapR Technologies - Confidential Input Data  User transactions – user id, merchant id – SIC code, amount – Descriptions, cuisine, …  Offer transactions – user id, offer id – vendor id, merchant id’s, – offers, views, accepts
  • 39. 39©MapR Technologies - Confidential Input Data  User transactions – user id, merchant id – SIC code, amount – Descriptions, cuisine, …  Offer transactions – user id, offer id – vendor id, merchant id’s, – offers, views, accepts  Derived user data – merchant id’s – anomalous descriptor terms – offer & vendor id’s  Derived merchant data – local top40 – SIC code – vendor code – amount distribution
  • 40. 40©MapR Technologies - Confidential Cross-recommendation  Per merchant indicators – merchant id’s – chain id’s – SIC codes – indicator terms from text – offer vendor id’s  Computed by finding anomalous (indicator => merchant) rates
  • 41. 41©MapR Technologies - Confidential Search-based Recommendations  Sample document – Merchant Id – Field for text description – Phone – Address – Location
  • 42. 42©MapR Technologies - Confidential Search-based Recommendations  Sample document – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40
  • 43. 43©MapR Technologies - Confidential Search-based Recommendations  Sample document – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40  Sample query – Current location – Recent merchant descriptions – Recent merchant id’s – Recent SIC codes – Recent accepted offers – Local top40
  • 44. 44©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr indexing Cooccurrence (Mahout) Item meta- data Index shards Complete history
  • 45. 45©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr search Web tier Item meta- data Index shards User history
  • 46. 46©MapR Technologies - Confidential Objective Results  At a very large credit card company  History is all transactions, all web interaction  Processing time cut from 20 hours per day to 3  Recommendation engine load time decreased from 8 hours to 3 minutes  Recommendation quality increased visibly
  • 47. 47©MapR Technologies - Confidential  Contact: – tdunning@maprtech.com – @ted_dunning  Slides and such (available late tonight): – http://www.slideshare.net/tdunning  Hash tags: #mapr #recommendations  We are hiring!
  • 48. 48©MapR Technologies - Confidential Thank You