Recommender systems add value to vast content resources by matching users with items of interest. In recent years, immense progress has been made in recommendation techniques; their evaluation, however, has not kept pace and threatens to impede the further development of recommender systems. In this paper we address this impasse by formulating a novel evaluation concept that adopts aspects from both recommender systems research and industry. Our model expresses the quality of a recommender algorithm from three perspectives: the end consumer (user), the service provider, and the vendor, covering both business and technical aspects. We review current benchmarking activities, point out their shortcomings, and show how our model addresses them. We also explain how our 3D benchmarking framework would apply to a specific use case.
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 workshop at ACM Recsys 2012
1. Recommender systems
evaluation: a 3D benchmark
Alan Said1, Domonkos Tikk2, Yue
Shi3, Martha Larson3, Klára
Stumpf2, Paolo Cremonesi4
1: TU Berlin
2: Gravity R&D
3: TU Delft
4: Politecnico di Milano/Moviri
2. Motivation
• Current recsys evaluation benchmarks are
insufficient
– mostly focused on IR measures (RMSE,
MAP@X, precision/recall)
– do not consider the needs of all stakeholders
(users, content providers, recsys vendors)
– technological and business requirements are
mostly overlooked
• 3D Recommender System Benchmarking
Model
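As a rough illustration (not part of the original slides), the IR-style accuracy measures mentioned above could be computed as follows; the function names and toy data are my own:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over paired rating predictions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_recall(recommended, relevant):
    """Set-based precision and recall for a top-N recommendation list."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)

print(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0]))  # ≈ 0.6455
print(precision_recall(["a", "b", "c"], ["a", "d"]))
```

These metrics capture only prediction or retrieval accuracy; the point of the 3D model is that they say nothing about business or technical quality.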
5. Recent benchmarks (1)
• pros:
– Large scale
– very well organized
• cons:
– assessment of recommendation quality
reduced to RMSE
– rating prediction (not ranking)
– no focus on direct business and technical
parameters (scalability, robustness, reactivity)
6. Recent benchmarks (2)
• pros:
– constraints on training and response time
– real traffic (only planned)
– major driver: revenue increase
• cons:
– only business goals, but otherwise unclear
optimization criteria
– user needs are neglected
– organizational shortcomings
7. Recent Benchmarks (3)
• pros:
– availability of additional metadata (compared to
KDD Cup 2011)
– not rating based (implicit feedback)
– ranking based evaluation metric (MAP@500)
• cons:
– offline evaluation
– smaller scale (lower interest)
– no business requirements or technical constraints
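The ranking-based metric named above (MAP@500) can be sketched as follows. This is one common formulation, with AP normalized by min(|relevant|, k); it is an illustrative sketch, not the exact challenge implementation:

```python
def average_precision_at_k(ranked, relevant, k=500):
    """AP@k: mean of precision values at each rank where a relevant item appears."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(ranked_lists, relevant_sets, k=500):
    """MAP@k: AP@k averaged over all users."""
    return sum(average_precision_at_k(r, s, k)
               for r, s in zip(ranked_lists, relevant_sets)) / len(ranked_lists)
```

Unlike RMSE, this rewards placing relevant items near the top of the list, which matches implicit-feedback settings better than rating prediction.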
10. Business requirements
• Business model
– for-profit: revenue stream
– non-profit: award driven (reputation,
community building)
• KPI depends on the application area
– Revenue increase
– CTR
– Raising awareness of content or services
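For concreteness (my own minimal sketch, not from the slides), the CTR KPI mentioned above is simply the fraction of recommendation impressions that received a click:

```python
def ctr(clicks, impressions):
    """Click-through rate: clicked impressions / total impressions."""
    return clicks / impressions if impressions else 0.0

print(ctr(30, 1000))  # 0.03
```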
11. Technical constraints
• data driven
– availability of user feedback (e.g. satellite TV)
• system driven
– hardware/software limitations (device-
dependent)
• scalability
– typical response time
• robustness
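A benchmark that takes the response-time constraint above seriously has to measure it. A minimal sketch (the `recommend` callable is a hypothetical stand-in for any recommender):

```python
import time

def measure_response_time(recommend, user_id, runs=100):
    """Average wall-clock time per recommendation call, in seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        recommend(user_id)
    return (time.perf_counter() - start) / runs
```

In a real benchmark this would run against production-like hardware and traffic, since the constraint is device- and deployment-dependent.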
12. Example
• VoD recommendation scenario (TV)
– user: easy content exploration, context-
awareness (time, viewer identification)
– business: increase VoD sales & awareness
(user base)
– technical: middleware, HW/SW of the
provider, response time
13. Conclusions
• Recommendation tasks have many aspects
that are typically overlooked
• Tasks define the important user, business,
and technical quality measures
– all must be fulfilled to a certain level
– trade-off is usually required
• Proposal: our 3D evaluation concept
enables a more comprehensive evaluation