Setting Goals and Choosing Metrics for Recommender System Evaluations

Gunnar Schröder, Maik Thiele, Wolfgang Lehner

Gunnar Schröder
T-Systems Multimedia Solutions
Dresden University of Technology

UCERSTI 2 Workshop
at the 5th ACM Conference on Recommender Systems
Chicago, October 23rd, 2011
How Do You Evaluate Recommender Systems?

RMSE · MAE · Precision · Recall · F1-Measure · ROC Curves · Mean Average Precision · Area under the Curve
Qualitative Techniques · Quantitative Techniques · User-Centric Evaluation
Accuracy Metrics · Non-Accuracy Metrics

But why do you do it exactly this way?

Setting Goals and Choosing Metrics for Recommender System Evaluation - Gunnar Schröder
Some of the Issues This Paper Addresses

- A large variety of metrics has been published
- Some metrics are highly correlated [Herlocker 2004]
- There is little guidance for evaluating recommenders and choosing metrics

Open questions:
- Which aspects of the usage scenario and the data influence the choice?
- Which metrics are applicable?
- What do these metrics express?
- What are the differences among them?
- Which metric represents our use case best?
- How much do the metrics suffer from biases?
Factors That Influence the Choice of Evaluation Metrics

Objectives for recommender usage:
- Business goals
- User interests

Recommender task and interaction:
- Prediction, Classification, Ranking, Similarity, Presentation

Preference data:
- Explicit vs. implicit; unary, binary, or numerical

=> Choice of metrics
Major Classes of Evaluation Metrics

- Prediction Accuracy Metrics
- Ranking Accuracy Metrics
- Classification Accuracy Metrics
- Non-Accuracy Metrics

(Figure: ten items ordered by predicted rating)
5.0  4.8  4.7  4.3  3.8  3.2  2.4  2.1  1.6  1.2
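As an illustration of the prediction-accuracy class (not part of the original slides), MAE and RMSE compare predicted ratings against observed ones; the sample ratings below are invented:

```python
import math

def mae(predicted, actual):
    """Mean absolute error: average absolute deviation of predictions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

predicted = [5.0, 4.8, 4.7, 4.3, 3.8]
actual    = [4.0, 5.0, 4.5, 3.0, 4.0]
print(mae(predicted, actual))   # ≈ 0.58
print(rmse(predicted, actual))  # ≈ 0.75, larger because of the two big misses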
Why Precision, Recall and F1-Measure May Fool You

- Ideal recommender (examples a–f) vs. worst-case recommender (examples g–l)
- Four recommendations (R1–R4), i.e. Precision@4
- Ten items with a varying ratio of relevant items (1–9 relevant items)

- Precision, recall and F1-measure are very sensitive to the ratio of relevant items (Figure 3)
- They fail to distinguish between an ideal recommender and a worst-case recommender when the ratio of relevant items is varied
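The effect above can be reproduced in a few lines (a sketch with invented item sets, not the paper's code): with 8 of 10 items relevant, even a worst-case ranking earns a respectable Precision@4.

```python
def precision_at_k(recommended, relevant, k=4):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

items = list(range(10))      # ten items, 0..9
relevant = set(range(8))     # 8 of 10 items are relevant

ideal = sorted(items, key=lambda i: i not in relevant)  # relevant items first
worst = sorted(items, key=lambda i: i in relevant)      # irrelevant items first

print(precision_at_k(ideal, relevant))  # 1.0
print(precision_at_k(worst, relevant))  # 0.5 — still looks decent despite worst-case ranking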
What is the Ideal Length for a Top-k Recommendation List?

- A typical ranking produced by a recommender on a set of ten items, four of which are relevant
- The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)

(Figure 1)
What is the Ideal Length for a Top-k Recommendation List? (continued)

- A typical ranking produced by a recommender on a set of ten items, four of which are relevant
- The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)

(part of Figure 1)

- Markedness = Precision + InvPrecision − 1
- Informedness = Recall + InvRecall − 1
- Matthews correlation = ±√(Markedness × Informedness)   [Powers 2007]
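These three measures follow directly from the confusion matrix of a top-k list treated as a binary classifier. A minimal sketch (function names and the example item sets are ours, not the paper's):

```python
import math

def confusion_counts(recommended, relevant, all_items):
    """Treat a top-k recommendation list as a binary classifier over all items."""
    rec = set(recommended)
    tp = len(rec & relevant)            # recommended and relevant
    fp = len(rec - relevant)            # recommended but irrelevant
    fn = len(relevant - rec)            # relevant but not recommended
    tn = len(all_items) - tp - fp - fn  # correctly left out
    return tp, fp, fn, tn

def markedness(tp, fp, fn, tn):
    # Precision + InvPrecision - 1, where InvPrecision = tn / (tn + fn)
    return tp / (tp + fp) + tn / (tn + fn) - 1

def informedness(tp, fp, fn, tn):
    # Recall + InvRecall - 1, where InvRecall = tn / (tn + fp)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def matthews_correlation(tp, fp, fn, tn):
    return (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Top-4 list hitting 2 of the 4 relevant items among 10
counts = confusion_counts([0, 1, 4, 5], {0, 1, 2, 3}, set(range(10)))
print(markedness(*counts), informedness(*counts), matthews_correlation(*counts))
```

In this example all three evaluate to 1/6, which also illustrates the geometric-mean relation between them.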
From Simple Classification Measures to Partial Ranking Measures

- Moving a single relevant item through the recommender's ranking (examples a–j) (Figure 2)

- Idea: consider both classification and ranking for the top-k recommendations

- Area under the Curve => Limited Area under the Curve
- Boolean Kendall's Tau => Limited Boolean Kendall's Tau
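The paper's limited variants restrict these measures to the top-k list; as a baseline, plain AUC over a binary-relevance ranking can be computed pairwise (a sketch under our own naming, not the paper's implementation):

```python
def auc_from_ranking(ranking, relevant):
    """ROC AUC of a binary-relevance ranking: the fraction of
    (relevant, irrelevant) pairs that the ranking orders correctly."""
    pairs = correct = 0
    for i, a in enumerate(ranking):
        for b in ranking[i + 1:]:
            if (a in relevant) != (b in relevant):
                pairs += 1
                if a in relevant:  # relevant item ranked above irrelevant one
                    correct += 1
    return correct / pairs

ranking = [0, 4, 1, 5, 6, 2, 7, 8, 3, 9]   # items 0-3 relevant, scattered
print(auc_from_ranking(ranking, {0, 1, 2, 3}))  # 0.625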
A Further, More Complex Example to Study at Home

(Figure 4)

Conclusions:
- For classification, use markedness, informedness and Matthews correlation instead of precision, recall and F1-measure
- Limited area under the curve and limited boolean Kendall's tau are useful metrics for top-k recommender evaluations
Conclusion and Contributions

- Important aspects that influence the metric choice:
  - Objectives for recommender usage
  - Recommender task and interaction
  - Aspects of preference data

- Some problems of precision, recall and F1-measure
- The advantages of markedness, informedness and Matthews correlation

- Two new metrics that measure the ranking of a limited top-k list:
  - Limited area under the curve, limited boolean Kendall's tau

- Guidelines for choosing a metric (see paper)
Thank You Very Much!

Do not hesitate to contact me if you have any questions, comments or answers!

Slides are available via e-mail or SlideShare.
