Recommender systems
evaluation: a 3D benchmark
  Alan Said¹, Domonkos Tikk², Yue Shi³, Martha Larson³, Klára Stumpf², Paolo Cremonesi⁴

1: TU Berlin
2: Gravity R&D
3: TU Delft
4: Politecnico di Milano/Moviri
Motivation
• Current recsys evaluation benchmarks are
  insufficient
  – mostly focused on IR measures (RMSE,
    MAP@X, precision/recall)
  – do not consider the needs of all stakeholders
    (users, content provider, recsys vendor)
  – technological and business requirements are
    mostly overlooked
• 3D Recommender System Benchmarking
  Model
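
The measures named in the Motivation slide (RMSE, MAP@X, precision/recall) can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation code of any benchmark discussed here. The function names and the toy data are assumptions, and the AP@k normalisation by min(number of relevant items, k) is one common convention among several.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k items of one ranked list."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_at_k(ranked, relevant, k):
    """AP@k for one user; MAP@k is the mean of this value over all users."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this cut-off
    denom = min(len(relevant), k)  # assumed normalisation convention
    return score / denom if denom else 0.0


# Toy data (hypothetical): one user's ranked list and relevant items.
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"a", "c"}
print(rmse([3.5, 4.0], [4.0, 3.0]))
print(precision_recall_at_k(ranked_items, relevant_items, 3))
print(average_precision_at_k(ranked_items, relevant_items, 3))
```

All three measures look only at the accuracy of predictions or rankings, which is why they cover at most the user-quality side of an evaluation.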
Stakeholders
• users
• content or service provider
• recommender
The Proposed 3D model
Recent benchmarks (1)

• pros:
  – Large scale
  – very well organized
• cons:
  – qualitative assessment of recommendation:
    simplified to RMSE
  – rating prediction (not ranking)
  – no focus on direct business and technical
    parameters (scalability, robustness, reactivity)
Recent benchmarks (2)


• pros:
  – constraints on training and response time
  – real traffic (only planned)
  – major driver: revenue increase
• cons:
  – only business goals, but otherwise unclear
    optimization criteria
  – user needs are neglected
  – organization
Recent benchmarks (3)


• pros:
  – availability of additional metadata (compared to
    KDD Cup 2011)
  – not rating based (implicit feedback)
  – ranking based evaluation metric (MAP@500)
• cons:
  – offline evaluation
  – size does not matter anymore (lower interest)
  – no business requirements or technical constraints
3D MODEL
User requirements
• functional (quality-related)
  – relevant, interesting, novel, diverse,
    serendipitous, context-aware, ethical, etc.
• non-functional (technology-related)
  – real-time
  – usability-related
Business requirements
• Business model
  – for-profit: revenue stream
  – NP-style: award driven (reputation,
    community building)
• KPI depends on the application area
  – Revenue increase
  – CTR
  – Raise awareness of content or service
Technical constraints
• data driven
  – availability of user feedback (e.g. satellite TV)
• system driven
  – hardware/software limitations (device-dependent)
• scalability
  – typical response time
• robustness
Example
• VoD recommendation scenario (TV)
  – user: easy content exploration, context-awareness
    (time, viewer identification)
  – business: increase VoD sales & awareness
    (user base)
  – technical: middleware, HW/SW of the
    provider, response time
Conclusion
• Recommendation tasks have many aspects
  typically overlooked
• Tasks define the important user, business,
  and technical quality measures
  – all of them must be fulfilled at least to a certain level
  – a trade-off is usually required
• Proposal: with our 3D evaluation concept, a
  more comprehensive evaluation can be
  achieved
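
One way to read the conclusion above is that each of the three dimensions (user, business, technical) gets a task-specific score, a minimum level that must be reached, and a weight for the trade-off. The Python sketch below illustrates that reading only. The slides do not define a scoring formula, so the class, the field names, the weighted mean, and the example numbers are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class AxisResult:
    name: str       # "user", "business" or "technical"
    score: float    # assumed to be normalised to [0, 1], higher is better
    minimum: float  # task-specific level that must be reached
    weight: float   # task-specific importance in the trade-off


def evaluate_3d(axes):
    """Return (acceptable, overall) for a list of AxisResult objects.

    acceptable: True only if every axis reaches its minimum level.
    overall:    weighted mean of the axis scores (assumed trade-off rule).
    """
    acceptable = all(axis.score >= axis.minimum for axis in axes)
    total_weight = sum(axis.weight for axis in axes)
    overall = sum(axis.score * axis.weight for axis in axes) / total_weight
    return acceptable, overall


# Hypothetical numbers for a VoD-like task where response time matters most.
results = [
    AxisResult("user", score=0.8, minimum=0.5, weight=1.0),
    AxisResult("business", score=0.6, minimum=0.5, weight=1.0),
    AxisResult("technical", score=0.4, minimum=0.5, weight=2.0),
]
acceptable, overall = evaluate_3d(results)
print(acceptable, round(overall, 2))
```

In this toy case the system is rejected even though its user and business scores are good, because the technical axis falls below its required level; a single accuracy measure would not show that.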

Recommender Systems Evaluation: A 3D Benchmark - presented at the RUE 2012 workshop at ACM RecSys 2012
