Ted Dunning, committer on Apache Mahout, Drill, and ZooKeeper, presents on:
1. How to build a production quality recommendation engine using Mahout and Solr or Elasticsearch
2. How to build a multi-modal recommendation from multiple behavioral inputs
3. How search engines can be used for more than just text
This talk will present a detailed tear-down and walk-through of a working, soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross-recommendation. The system uses Mahout for off-line analysis and can use Solr or Elasticsearch to provide real-time recommendations. The talk will also include enough theory to provide useful working intuitions for those who want to adapt this design.
The entire system, including a data generator, off-line analysis scripts, Solr and Elasticsearch configurations, and sample web pages, will be made available on GitHub for attendees to modify as they like.
Building recommendation engines by abusing a search engine has been well known for some time to a small sub-culture within the recommendation community, but techniques for building multi-modal recommendation engines are not at all well known.
Mention that the Pony book said “RowSimilarityJob”…
Talk track:
Apache Mahout is an open-source project with international contributors and a vibrant community of users and developers. A new version – 0.8 – was recently released.
Mahout is a library of scalable algorithms used for clustering, classification and recommendation. Mahout also includes a math library that is low level, flexible, scalable and makes certain functions very easy to carry out.
Talk track: First let’s make a quick comparison of the three main areas of Mahout machine learning…
Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about co-occurrence: for any given item, every other item either occurs with it or it doesn't.
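The co-occurrence idea behind the joke can be sketched in a few lines: walk over each user's history and count, for every pair of items, how many histories contain both. The item names and histories below are made up purely for illustration; the real system does this at scale with Mahout.

```python
# Toy co-occurrence counting over user histories.
# Each history is the set of items one user has interacted with.
from collections import Counter
from itertools import combinations

histories = [
    {"tape", "scissors", "paper"},
    {"tape", "paper"},
    {"scissors", "glue"},
]

cooc = Counter()
for items in histories:
    # Count each unordered pair of items that appear together
    # in the same user's history.
    for a, b in combinations(sorted(items), 2):
        cooc[(a, b)] += 1
```

In a production system these raw counts would then be filtered for anomalously frequent pairs (for example with a log-likelihood ratio test) to pick out true indicators rather than mere popularity.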
Talk track: Solr is a small data tool that has flourished in a big data world
*For the hands-on lab in this course, we will use a free part of LucidWorks that comes as part of the MapR distribution
Needed???
Optional: don’t spend time on this
Note to speaker: fast on this slide as overview and reference only; additional slides and labs will explain this
TED: Do you need to talk about term locations? I left it off. SEE NEXT SLIDE FOR MORE ON FACETING
FOR HOW TO SET THEM UP, is it done from dashboard (see next slide) or as a command?
Nice to show but don’t have to … they can just find it in the lab. I would show this slide very briefly and move on
Skip this?
Note to speaker: point out that using Solr is a state-of-the-art approach that simplifies deploying a recommender
Talk track: We built a real music recommender on MapR and deployed it to a website for a mock company, Music Machine. Everything worked except you didn’t really hear music play…
Talk track: Here are documents for two different artists with indicator IDs that are part of the recommendation model.
When recommendations are needed, the website uses recent visitor behavior to query against the indicators in these documents.
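That query-time step can be sketched as building a disjunctive search over an indicator field from the visitor's recent items. The field name `indicator_artists` and the item IDs below are illustrative assumptions, not the actual schema from the demo.

```python
# Hypothetical sketch: turn a visitor's recent behavior into
# Solr-style query parameters against an indicator field.
from urllib.parse import urlencode

def indicator_query(recent_item_ids, field="indicator_artists", rows=10):
    """Build query parameters matching documents whose indicator
    field contains any of the visitor's recent items."""
    terms = " ".join(str(i) for i in recent_item_ids)
    return urlencode({
        "q": f"{field}:({terms})",
        "rows": rows,
        "wt": "json",
    })

params = indicator_query([303, 1237])
# The resulting string would be appended to a Solr select URL such as
# http://<host>:8983/solr/<collection>/select?...
```

The search engine's ranking then does the heavy lifting: documents sharing the most (and rarest) indicators with the visitor's recent behavior score highest.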
Notes to trainer: A lot of work to do a grid. Represent by math
A is the history matrix (rows are users, columns are items)
h is the vector of items for one new (current) user
Ah finds users who have done the same things as in h
A transpose times (Ah) gives the items those users have done; that computes what these users do
The two forms have the same shape of matrix multiplication and many of the same properties (sometimes with weights, etc.). Had they been exactly the same, we could just move the parentheses.
Our recommender does the item-centric version
General relationships in the data don't change fast (what is related to what; nothing happens overnight to change Mozart being related to Haydn).
What does change fast is what the user did in the last five minutes.
In the first case, we have to compute Ah first. The input to that computation (h) is only available now, in real time, so nothing can be computed ahead of time.
The second case (A transpose A) only involves things that change slowly, so it can be pre-computed. That makes it possible to do this work offline. This is significant because we move a lot of computation for all users into an overnight process; each real-time recommendation then involves only a small part, just one big matrix multiply in real time. Result: you get a fast response for the recommendations.
The second form runs on one machine for one user (the real-time part)
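The reassociation argument above can be checked numerically: A-transpose-times-(Ah) and (A-transpose-A)-times-h give the same scores, but only the second lets the big item-item product be built offline. The tiny history matrix below is a made-up example, with pure-Python helpers so the sketch is self-contained.

```python
# Toy check that A^T (A h) == (A^T A) h, where A is a hypothetical
# user x item history matrix and h is the current user's item vector.

def matmul(X, Y):
    """Plain-Python matrix multiply for small lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1, 0, 1],   # user 0 interacted with items 0 and 2
     [0, 1, 1],   # user 1 with items 1 and 2
     [1, 1, 0]]   # user 2 with items 0 and 1

h = [[1], [0], [0]]  # current user just touched item 0

# Online-only form: A^T (A h) needs the full history matrix at query time.
online = matmul(transpose(A), matmul(A, h))

# Precomputable form: A^T A is the item-item co-occurrence matrix,
# which changes slowly and can be built overnight; only (A^T A) h
# remains for the real-time request.
item_item = matmul(transpose(A), A)
offline = matmul(item_item, h)

assert online == offline
```

In the real system the offline product is computed by Mahout at scale, and the remaining per-user multiply is effectively what the search-engine query performs.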
Problem starts here…
Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It’s a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine.
In other words, do the indicator artists, represented by their indicator IDs, make reasonable recommendations?
Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?