Content Recommendations with Redis

Torben Brodt
plista GmbH

28 February 2013

Recommender Systems Stammtisch
http://recommenders.de
Introduction
● plista GmbH
  ○ recommendations & advertising
  ○ founded in 2008, Berlin [DE]
  ○ ~3k recommendations/second

● never batch = never Hadoop
● stream computing with an in-memory database

● we love Redis
How to build recommendations?
welt.de/football/berlin_wins.html


● We only have the URL?
● To show recommendations, we are integrated on the website,
  so "at least" we can count the hits.
Most popular
welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de" 1 berlin_wins
● ZREVRANGEBYSCORE

  p:welt.de
  berlin_wins        689 +1
  summer_is_coming   420
  plista_company     135

Live Read + Live Write = Real Time Recommendations
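The counting idea on this slide can be tried without a Redis server. The following is a minimal Python sketch that models one sorted set as a dict. The helper names `zincrby` and `zrevrange_by_score` are hypothetical and only mimic the semantics of the Redis commands ZINCRBY and ZREVRANGEBYSCORE; the counts are the ones shown on the slide.

```python
from collections import defaultdict

# One Redis sorted set is modelled as {member: score}; the store maps key -> sorted set.
store = defaultdict(dict)


def zincrby(key, increment, member):
    """Mimic ZINCRBY: add `increment` to the member's score (missing members start at 0)."""
    zset = store[key]
    zset[member] = zset.get(member, 0) + increment
    return zset[member]


def zrevrange_by_score(key, limit=None):
    """Mimic ZREVRANGEBYSCORE key +inf -inf: members ordered from highest to lowest score."""
    ranked = sorted(store[key].items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:limit]


# Counts as shown on the slide, before the current page view.
store["p:welt.de"].update(
    {"berlin_wins": 689, "summer_is_coming": 420, "plista_company": 135}
)

# Live write: a reader opens welt.de/football/berlin_wins.html
zincrby("p:welt.de", 1, "berlin_wins")

# Live read: the most popular articles for this publisher.
top = zrevrange_by_score("p:welt.de", limit=3)
print(top)
```

Against a real server the two helpers would presumably be replaced by the corresponding client calls; the dict version only illustrates the write-then-read pattern.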
Most popular with timeseries
welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNION
  ○ "p:welt.de:1360007000"
  ○ "p:welt.de:1360006000"
  ○ "p:welt.de:1360005000"
● ZREVRANGEBYSCORE

  p:welt.de:1360005000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

  p:welt.de:1360006000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

  p:welt.de:1360007000
  berlin_wins           689
  summer_is_coming      420
  plista_best_company   135
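A small sketch of the time-bucket idea, in plain Python rather than against Redis. It assumes buckets of 1000 seconds, which is what the three key suffixes suggest but the slides do not state. `bucket_key` and `zunion` are hypothetical helpers; `zunion` mimics a union with the default SUM aggregation and weight 1. The bucket contents are taken from the slide's figure as best as they can be read.

```python
BUCKET_SECONDS = 1000  # assumption: inferred from the key suffixes on the slide


def bucket_key(publisher, timestamp):
    """Build a key such as 'p:welt.de:1360007000' for the bucket containing `timestamp`."""
    bucket_start = timestamp - timestamp % BUCKET_SECONDS
    return f"p:{publisher}:{bucket_start}"


def zunion(zsets):
    """Mimic a sorted-set union with SUM aggregation and weight 1 for every set."""
    totals = {}
    for zset in zsets:
        for member, score in zset.items():
            totals[member] = totals.get(member, 0) + score
    return totals


# Bucket contents as read from the slide's figure.
buckets = {
    "p:welt.de:1360005000": {
        "berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689,
    },
    "p:welt.de:1360006000": {
        "berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689,
    },
    "p:welt.de:1360007000": {
        "berlin_wins": 689, "summer_is_coming": 420, "plista_best_company": 135,
    },
}

# Every hit inside the same 1000-second window is written to the same key.
key = bucket_key("welt.de", 1360007421)

# Reading merges the most recent buckets and ranks by the summed score.
merged = zunion(buckets.values())
ranking = sorted(merged, key=merged.get, reverse=True)
print(key, ranking)
```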
Most popular with timeseries
welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de:1360007000" 1 berlin_wins
● ZUNION ... WEIGHTS
  ○ "p:welt.de:1360007000" ..        4
  ○ "p:welt.de:1360006000" ..        2
  ○ "p:welt.de:1360005000" ..        1
● ZREVRANGEBYSCORE

  p:welt.de:1360005000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

  p:welt.de:1360006000
  berlin_wins           420
  summer_is_coming      135
  plista_best_company   689

  p:welt.de:1360007000
  berlin_wins           689
  summer_is_coming      420
  plista_best_company   135
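To make the effect of the weights concrete, here is a minimal Python sketch of a weighted union. `zunion_weighted` is a hypothetical helper that mimics a union with WEIGHTS and SUM aggregation; the weights 4, 2, 1 and the bucket contents come from the slide, read from the figure as best as possible.

```python
def zunion_weighted(zsets_with_weights):
    """Mimic a union with WEIGHTS: each score is multiplied by its set's weight, then summed."""
    totals = {}
    for zset, weight in zsets_with_weights:
        for member, score in zset.items():
            totals[member] = totals.get(member, 0) + weight * score
    return totals


# Newest bucket first, with the weights from the slide.
weighted_buckets = [
    ({"berlin_wins": 689, "summer_is_coming": 420, "plista_best_company": 135}, 4),  # :1360007000
    ({"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689}, 2),  # :1360006000
    ({"berlin_wins": 420, "summer_is_coming": 135, "plista_best_company": 689}, 1),  # :1360005000
]

scores = zunion_weighted(weighted_buckets)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
```

With these weights a hit in the newest bucket counts four times as much as a hit in the oldest one, so articles that are popular right now move up relative to a plain sum.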
Most popular with timeseries
[Figure: timeline from -1h to -8h, with time buckets labelled :1360007000]
Most popular to any context
● it's not only the publisher, we use ~50 context attributes

context attributes:
● publisher
● weekday
● geolocation
● demographics
● ...

  publisher = welt.de
  berlin_wins        689 +1
  summer_is_coming   420
  plista_company     135

  weekday = sunday
  berlin_wins        400 +1
  dortmund_wins      200
  ...                100

  geolocation = dortmund
  dortmund_wins      200
  berlin_wins        10 +1
  ...                5
Most popular to any context
● how it looks in Redis

ZUNION ... WEIGHTS
p:welt.de:1360007  4
p:welt.de:1360006  2
p:welt.de:1360005  1
w:sunday:1360007   4
w:sunday:1360006   2
w:sunday:1360005   1
g:dortmund:1360007 4
g:dortmund:1360006 2
g:dortmund:1360005 1

  publisher = welt.de
  berlin_wins        689 +1
  summer_is_coming   420
  plista_company     135

  weekday = sunday
  berlin_wins        400
  dortmund_wins      200
  ...                100

  geolocation = dortmund
  dortmund_wins      200
  berlin_wins        10
  ...                5
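The key list on this slide follows a regular pattern: one key per context attribute and time bucket, with the weight depending only on the bucket. A minimal Python sketch of how such a list could be generated is shown below. The prefixes p, w and g and the weights are from the slide; `context_keys` and `zunion_command` are hypothetical helpers, and the argument order assumes the `ZUNION numkeys key [key ...] WEIGHTS weight [weight ...]` syntax of Redis 6.2 and later.

```python
PREFIXES = {"publisher": "p", "weekday": "w", "geolocation": "g"}  # prefixes as on the slide


def context_keys(context, buckets, weights):
    """Return (key, weight) pairs for every context attribute and time bucket."""
    pairs = []
    for attribute, value in context.items():
        for bucket, weight in zip(buckets, weights):
            pairs.append((f"{PREFIXES[attribute]}:{value}:{bucket}", weight))
    return pairs


def zunion_command(pairs):
    """Assemble the argument list of a ZUNION ... WEIGHTS call from (key, weight) pairs."""
    keys = [key for key, _ in pairs]
    weights = [str(weight) for _, weight in pairs]
    return ["ZUNION", str(len(keys)), *keys, "WEIGHTS", *weights]


# The context of the current request, using the values from the slide.
context = {"publisher": "welt.de", "weekday": "sunday", "geolocation": "dortmund"}

pairs = context_keys(context, buckets=["1360007", "1360006", "1360005"], weights=[4, 2, 1])
command = zunion_command(pairs)
print(" ".join(command))
```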
Most popular with Effect size
● which context has an influence?

ZUNION ... WEIGHTS (time weight * effect size)
p:welt.de:1360007  4 * 70%
p:welt.de:1360006  2 * 70%
p:welt.de:1360005  1 * 70%
w:sunday:1360007   4 * 10%
w:sunday:1360006   2 * 10%
w:sunday:1360005   1 * 10%
g:dortmund:1360007 4 * 30%
g:dortmund:1360006 2 * 30%
g:dortmund:1360005 1 * 30%

Examples:
● small effect: weather
● big effect: publisher

Data with a small effect should not be taken into account, otherwise we get average results.
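The final weight of each key is the product of its time weight and the effect size of its context. A minimal Python sketch, using the 70%, 10% and 30% from the slide, might look like this. The cut-off `MIN_EFFECT` is a hypothetical value, added only to illustrate leaving out contexts with a small effect; the slides do not give a threshold.

```python
TIME_WEIGHTS = {"1360007": 4, "1360006": 2, "1360005": 1}                  # from the slide
EFFECT_SIZE = {"p:welt.de": 0.70, "w:sunday": 0.10, "g:dortmund": 0.30}    # from the slide
MIN_EFFECT = 0.20  # hypothetical cut-off, not taken from the slides


def combined_weights(effect_size, time_weights, min_effect=0.0):
    """Map each key to time weight * effect size, skipping contexts below `min_effect`."""
    weights = {}
    for context, effect in effect_size.items():
        if effect < min_effect:
            continue  # data with a small effect is left out of the union
        for bucket, time_weight in time_weights.items():
            weights[f"{context}:{bucket}"] = time_weight * effect
    return weights


all_weights = combined_weights(EFFECT_SIZE, TIME_WEIGHTS)
filtered = combined_weights(EFFECT_SIZE, TIME_WEIGHTS, MIN_EFFECT)
print(filtered)
```

The resulting numbers would be passed as the WEIGHTS of the union in place of the plain 4, 2, 1.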
Most popular with Significance
● some data has more significance/trust
● so we add a significance matrix

  publisher = welt.de          X   sig:publisher = welt.de
  berlin_wins        689           berlin_wins        1
  summer_is_coming   420           summer_is_coming   1
  plista_company     135           plista_company     0.5

● Significance might depend on a common limit,
  like 200 (in the example)
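The "X" on this slide is an element-wise product of the hit counts and the significance matrix. A minimal Python sketch with the values from the slide follows. The rule that produces the significance values is not spelled out in the slides, so they are taken as given here, and `apply_significance` is a hypothetical helper name.

```python
# Values for publisher = welt.de, as shown on the slide.
hits = {"berlin_wins": 689, "summer_is_coming": 420, "plista_company": 135}
significance = {"berlin_wins": 1.0, "summer_is_coming": 1.0, "plista_company": 0.5}


def apply_significance(hits, significance):
    """Multiply every hit count by the significance of the same item."""
    return {item: count * significance[item] for item, count in hits.items()}


trusted = apply_significance(hits, significance)
print(trusted)
```

Items with few observations, such as plista_company here, contribute less to the combined score than their raw count suggests.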
Most popular with Significance
● some data has more significance/trust
● so we add a significance matrix
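SUM over all contexts c, for each item i:

\[
\mathrm{score}(i) = \frac{\sum_{c} \mathrm{hits}_{c}(i) \cdot \mathrm{sig}_{c}(i)}{\sum_{c} \mathrm{sig}_{c}(i)}
\]

Numerator: Σ over all contexts of (hits X significance), e.g.

  publisher = welt.de          X   sig:publisher = welt.de
  berlin_wins        689           berlin_wins        1
  summer_is_coming   420           summer_is_coming   1
  plista_company     135           plista_company     0.5

Denominator: Σ over all contexts of the significance, e.g.

  sig:publisher = welt.de
  berlin_wins        1
  summer_is_coming   1
  plista_company     0.5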
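Numerator and denominator together form a significance-weighted average over all contexts. A minimal Python sketch under stated assumptions: the publisher values are from the slide, the weekday hit counts are from the earlier context slide, and the weekday significance values are hypothetical because the slides do not show them. `significance_weighted_scores` is a hypothetical helper name.

```python
def significance_weighted_scores(contexts):
    """Per item: sum(hits * sig) over all contexts divided by sum(sig) over all contexts."""
    numerator, denominator = {}, {}
    for hits, sig in contexts:
        for item, count in hits.items():
            numerator[item] = numerator.get(item, 0.0) + count * sig[item]
            denominator[item] = denominator.get(item, 0.0) + sig[item]
    # Items whose total significance is zero carry no trusted data and are skipped.
    return {
        item: numerator[item] / denominator[item]
        for item in numerator
        if denominator[item] > 0
    }


contexts = [
    # publisher = welt.de: hits and significance as on the slide
    (
        {"berlin_wins": 689, "summer_is_coming": 420, "plista_company": 135},
        {"berlin_wins": 1.0, "summer_is_coming": 1.0, "plista_company": 0.5},
    ),
    # weekday = sunday: hits from the context slide, significance values are hypothetical
    (
        {"berlin_wins": 400, "dortmund_wins": 200},
        {"berlin_wins": 1.0, "dortmund_wins": 1.0},
    ),
]

scores = significance_weighted_scores(contexts)
print(scores)
```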
SUM over..
●   timeseries
●   different context
●   previous hits of the user
●   similar publisher knowledge

ZUNION ... WEIGHTS
p:welt.de:1360007  4
p:welt.de:1360006  2
p:welt.de:1360005  1
w:sunday:1360007   4
w:sunday:1360006   2
w:sunday:1360005   1
g:dortmund:1360007 4
g:dortmund:1360006 2
g:dortmund:1360005 1

Σ
  publisher = welt.de
  berlin_wins        689
  summer_is_coming   420
  plista_company     135

... redis can do it ;)
Even more Matrix Operations ;)
● Similarity Matrix
● Human Control Matrix
● Meta-learning Matrix
   ○ might be covered in next talk
   ○ cooperation with
   ○ aided from

Σ  ∏
Conclusions
● Redis is a perfect fit for simple operations
   ○ SUM + AGGREGATE + MIN + MAX
● In-Memory operations are pretty fast
● Real-time features feel better in a real-time
  database (e.g. time series)
● We don't need batch
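As an illustration of the simple operations named above: Redis sorted-set unions accept an AGGREGATE option with the modes SUM, MIN and MAX. The sketch below mimics the three modes in plain Python. `zunion` is a hypothetical helper, and the two buckets reuse counts from the timeseries slides.

```python
AGGREGATES = {"SUM": lambda a, b: a + b, "MIN": min, "MAX": max}


def zunion(zsets, aggregate="SUM"):
    """Mimic a sorted-set union: scores of members present in several sets are combined."""
    combine = AGGREGATES[aggregate]
    result = {}
    for zset in zsets:
        for member, score in zset.items():
            # A member seen for the first time keeps its score unchanged.
            result[member] = combine(result[member], score) if member in result else score
    return result


older = {"berlin_wins": 420, "summer_is_coming": 135}
newer = {"berlin_wins": 689, "summer_is_coming": 420, "plista_best_company": 135}

total = zunion([older, newer], "SUM")
lowest = zunion([older, newer], "MIN")
highest = zunion([older, newer], "MAX")
print(total, lowest, highest)
```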
What else?
In Redis
● Incremental Collaborative Filtering
● More Recommender
● Live Statistics
At plista
● Semantics with Lucene
● Cloud Technologies
  ○ Scalability
  ○ Enterprise Service Bus
● Contest for Recommenders
Questions?



www.plista.com
torben.brodt@plista.com
@torbenbrodt
xing.com/profile/Torben_Brodt
http://goo.gl/pvXm5
http://lnkd.in/MUXXuv
