Recommender systems add value to vast content resources by matching users with items of interest. In recent years, immense progress has been made in recommendation techniques; their evaluation, however, has not kept pace and threatens to impede the further development of recommender systems. In this paper we address this impasse by formulating a novel evaluation concept that adopts aspects from both recommender systems research and industry. Our model expresses the quality of a recommender algorithm from three perspectives: the end consumer (user), the service provider, and the vendor, covering both business and technical aspects. We review current benchmarking activities, point out their shortcomings, and show how our model addresses them. We also explain how our 3D benchmarking framework would apply to a specific use case.
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 workshop at ACM Recsys 2012
1. Recommender systems
evaluation: a 3D benchmark
Alan Said1, Domonkos Tikk2, Yue
Shi3, Martha Larson3, Klára
Stumpf2, Paolo Cremonesi4
1: TU Berlin
2: Gravity R&D
3: TU Delft
4: Politecnico di Milano/Moviri
2. Motivation
• Current recsys evaluation benchmarks are
insufficient
– mostly focused on IR measures (RMSE,
MAP@X, precision/recall)
– do not consider the needs of all stakeholders
(users, content providers, recsys vendors)
– technological and business requirements are
mostly overlooked
• 3D Recommender System Benchmarking
Model
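As a rough illustration (not part of the original slides), the IR-style accuracy measures mentioned above could be computed as follows; the function names and toy data are my own:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over paired rating predictions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_recall(recommended, relevant):
    """Set-based precision and recall for a top-N recommendation list."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)

print(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0]))  # ≈ 0.6455
print(precision_recall(["a", "b", "c"], ["a", "d"]))
```

These metrics capture only prediction or retrieval accuracy; the point of the 3D model is that they say nothing about business or technical quality.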
5. Recent benchmarks (1)
• pros:
– Large scale
– very well organized
• cons:
– assessment of recommendation quality
reduced to RMSE
– rating prediction (not ranking)
– no focus on direct business and technical
parameters (scalability, robustness, reactivity)
6. Recent benchmarks (2)
• pros:
– constraints on training and response time
– real traffic (only planned)
– major driver: revenue increase
• cons:
– only business goals, but otherwise unclear
optimization criteria
– user needs are neglected
– organizational shortcomings
7. Recent Benchmarks (3)
• pros:
– availability of additional metadata (compared to
KDD Cup 2011)
– not rating based (implicit feedback)
– ranking based evaluation metric (MAP@500)
• cons:
– offline evaluation
– smaller scale (lower interest)
– no business requirements or technical constraints
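The ranking-based metric named above (MAP@500) can be sketched as follows. This is one common formulation, with AP normalized by min(|relevant|, k); it is an illustrative sketch, not the exact challenge implementation:

```python
def average_precision_at_k(ranked, relevant, k=500):
    """AP@k: mean of precision values at each rank where a relevant item appears."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(ranked_lists, relevant_sets, k=500):
    """MAP@k: AP@k averaged over all users."""
    return sum(average_precision_at_k(r, s, k)
               for r, s in zip(ranked_lists, relevant_sets)) / len(ranked_lists)
```

Unlike RMSE, this rewards placing relevant items near the top of the list, which matches implicit-feedback settings better than rating prediction.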
10. Business requirements
• Business model
– for-profit: revenue stream
– non-profit: award driven (reputation,
community building)
• KPI depends on the application area
– Revenue increase
– CTR
– Raising awareness of content or services
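For concreteness (my own minimal sketch, not from the slides), the CTR KPI mentioned above is simply the fraction of recommendation impressions that received a click:

```python
def ctr(clicks, impressions):
    """Click-through rate: clicked impressions / total impressions."""
    return clicks / impressions if impressions else 0.0

print(ctr(30, 1000))  # 0.03
```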
11. Technical constraints
• data driven
– availability of user feedback (e.g. satellite TV)
• system driven
– hardware/software limitations (device-
dependent)
• scalability
– typical response time
• robustness
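A benchmark that takes the response-time constraint above seriously has to measure it. A minimal sketch (the `recommend` callable is a hypothetical stand-in for any recommender):

```python
import time

def measure_response_time(recommend, user_id, runs=100):
    """Average wall-clock time per recommendation call, in seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        recommend(user_id)
    return (time.perf_counter() - start) / runs
```

In a real benchmark this would run against production-like hardware and traffic, since the constraint is device- and deployment-dependent.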
12. Example
• VoD recommendation scenario (TV)
– user: easy content exploration, context-
awareness (time, viewer identification)
– business: increase VoD sales & awareness
(user base)
– technical: middleware, HW/SW of the
provider, response time
13. Conclusions
• Recommendation tasks have many aspects
that are typically overlooked
• Tasks define the important user, business,
and technical quality measures
– all must be fulfilled to a certain level
– trade-off is usually required
• Proposal: our 3D evaluation concept
enables a more comprehensive evaluation