The MovieLens Datasets:
History and Context
Max Harper (presenter)
Joe Konstan
http://tiis.acm.org/iui16/
MovieLens: 5 star movie ratings
userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,32,3.5,1112484819
1,47,3.5,1112484727
1,50,3.5,1112484580
1,112,3.5,1094785740
1,151,4.0,1094785734
1,223,4.0,1112485573
1,253,4.0,1112484940
...
138493,69644,3.0,1260209457
138493,70286,5.0,1258126944
138493,71619,2.5,1255811136
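A minimal sketch of reading rows in this layout, using only the Python standard library. The sample text is a shortened copy of the excerpt above; the helper names `load_ratings` and `mean_rating_by_user` are hypothetical and not part of any MovieLens tooling.

```python
import csv
import io
from collections import defaultdict

# Shortened copy of the excerpt on the slide; the real file is far larger.
SAMPLE = """userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,151,4.0,1094785734
138493,69644,3.0,1260209457
138493,70286,5.0,1258126944
138493,71619,2.5,1255811136
"""


def load_ratings(text):
    """Parse CSV text into (user_id, movie_id, rating, timestamp) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (int(r["userId"]), int(r["movieId"]), float(r["rating"]), int(r["timestamp"]))
        for r in reader
    ]


def mean_rating_by_user(ratings):
    """Average the rating column separately for each user id."""
    totals = defaultdict(lambda: [0.0, 0])  # user_id -> [sum, count]
    for user_id, _movie_id, rating, _timestamp in ratings:
        totals[user_id][0] += rating
        totals[user_id][1] += 1
    return {user: total / count for user, (total, count) in totals.items()}


ratings = load_ratings(SAMPLE)
means = mean_rating_by_user(ratings)
for user, mean in sorted(means.items()):
    print(user, round(mean, 2))
```

For a full dataset file, the same function can be fed the text read from disk; a dataframe library would be the more usual choice at that size.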
ratings data is interesting, intuitive, and pervasive
dataset impact
» 140,000 downloads in 2014
» a search for “movielens” yields
• 6,020 results in Google Books
• 8,920 results in Google Scholar
dataset uses
» research
» technical: programming books + blogs
» educational (including a MOOC)
» industrial R&D, demos
overview
» MovieLens datasets overview
» dataset stability, system change
<user, movie, rating, timestamp>
<Max, Toy Story, 4.0, 2010-12-01 12:00:00>
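The tuple can be sketched as a small typed record. This assumes the released files store numeric ids rather than names (as in the CSV excerpt earlier in the deck) and that the timestamp column holds seconds since the Unix epoch in UTC; the class name `Rating` and the method `rated_at` are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    """One <user, movie, rating, timestamp> record."""

    user_id: int
    movie_id: int
    rating: float
    timestamp: int  # assumed: seconds since the Unix epoch, UTC

    def rated_at(self) -> datetime:
        """Return the timestamp as a timezone-aware datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


# First data row of the CSV excerpt shown earlier in the deck.
example = Rating(user_id=1, movie_id=2, rating=3.5, timestamp=1112486027)
print(example.rated_at().isoformat())
```

Keeping the raw integer and converting on demand avoids committing to a time zone when the data is stored.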
MovieLens benchmark datasets
Name      Dates       Users     Movies   Ratings      Density
ML 100K   ’97 – ’98   943       1,682    100,000      6.30%
ML 1M     ’00 – ’03   6,040     3,706    1,000,209    4.47%
ML 10M    ’95 – ’09   69,878    10,681   10,000,054   1.34%
ML 20M    ’95 – ’15   138,493   27,278   20,000,263   0.54%
designed for replicability
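The Density column appears to be the share of user-movie pairs that carry a rating, that is, ratings / (users × movies). A short sketch under that assumption, using the counts from the table; the function name `density` is hypothetical.

```python
def density(users: int, movies: int, ratings: int) -> float:
    """Percentage of the user-by-movie matrix that holds a rating."""
    return 100.0 * ratings / (users * movies)


# (users, movies, ratings) as listed in the benchmark table.
BENCHMARKS = {
    "ML 100K": (943, 1_682, 100_000),
    "ML 1M": (6_040, 3_706, 1_000_209),
    "ML 10M": (69_878, 10_681, 10_000_054),
    "ML 20M": (138_493, 27_278, 20_000_263),
}

for name, (users, movies, ratings) in BENCHMARKS.items():
    print(f"{name}: {density(users, movies, ratings):.2f}%")
```

Computed this way, the first three rows reproduce the listed values. ML 20M comes out at about 0.53% rather than 0.54%; one possible explanation, stated here only as an assumption, is that the listed figure counts only movies with at least one rating.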
MovieLens latest datasets
Name              Dates       Users     Movies   Ratings      Density
ML Latest         ’95 – ’16   247,753   34,208   22,884,377   0.27%
ML Latest Small   ’96 – ’16   668       10,329   105,339      1.53%
designed for recency
overview
» MovieLens datasets overview
» dataset stability, system change
tension: datasets vs. system
» ideal (pure) vs. actual (it’s complex)
» systems want to change
• stay current, constant improvements
• A/B tests, beta testing, and other experiments
» context changes
• devices, competing sites, changing user base
some key changes
» core flow of browse/search
» rating widget
» recommender
» new user experience
» …
history of experiments
» both online field experiments and online lab experiments
» created temporary and permanent changes, changed pattern of use
in the paper
» the story of MovieLens (1997 origins)
• lessons learned from running a “real” system in a research lab
• lots of fun descriptive stats/charts
» best practices for dataset researchers
• limitations
• alternatives
people who made this possible
» John Riedl
» Istvan Albert, Al Borchers, Dan Cosley, Brent J. Dahlen, Rich Davies, Michael Ekstrand, Dan Frankowski, Nathaniel Good, Jon Herlocker, Daniel Kluver, Shyong (Tony) Lam, Michael Ludwig, Sean McNee, Chad Salvatore, Shilad Sen, and Loren Terveen
» MovieLens users
The MovieLens Datasets: History and Context
in ACM Transactions on Interactive Intelligent Systems, Dec. 2015
» feedback? contact us: grouplens-info@cs.umn.edu
presented by Max Harper, Research Scientist, University of Minnesota, harper@cs.umn.edu
written with Joe Konstan, Distinguished McKnight University Professor, University of Minnesota, konstan@cs.umn.edu
This material is based on work supported by the National Science Foundation under grants DGE-9554517, IIS-9613960, IIS-9734442, IIS-9978717, EIA-9986042, IIS-0102229, IIS-0324851, IIS-0534420, IIS-0808692, IIS-0964695, IIS-0968483, IIS-1017697, IIS-1210863. This project was also supported by the University of Minnesota’s Undergraduate Research Opportunities Program and by grants and/or gifts from Net Perceptions, Inc., CFK Productions, and Google.
version 0 (1997) version 4 (2014)
one solution
» document change, include with datasets
key dataset limitations (1/2)
» system UI and recommender changes
» bias towards “successful” users
» possible bias towards users with tolerance for “research quality” design
» timestamps do not reflect time of consumption
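The timestamp limitation can be illustrated with the excerpt shown earlier in the deck. The sketch below groups one user's rating timestamps into bursts separated by long gaps; the one-hour threshold and the function name `split_into_sessions` are assumptions chosen for illustration, not something the paper prescribes.

```python
def split_into_sessions(timestamps, max_gap_seconds=3600):
    """Group epoch-second timestamps into bursts separated by long gaps."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap_seconds:
            sessions[-1].append(ts)  # close enough to the previous rating
        else:
            sessions.append([ts])  # gap too long: start a new burst
    return sessions


# Timestamps of user 1's nine ratings, copied from the CSV excerpt.
USER_1 = [
    1112486027, 1112484676, 1112484819, 1112484727, 1112484580,
    1094785740, 1094785734, 1112485573, 1112484940,
]

sessions = split_into_sessions(USER_1)
for session in sessions:
    print(len(session), "ratings within", session[-1] - session[0], "seconds")
```

Nine ratings fall into two bursts, one of them seven ratings long and spanning under half an hour. Several films cannot have been watched in that window, so the timestamp records when the rating was entered, not when the movie was seen.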
key dataset limitations (2/2)
» recommender systems research community attitudes
• implicit behaviors > ratings?
• dataset-only research increasingly discouraged
MovieLens system evolution
key changes and experiments
alternative datasets
Name                Domain            Rating Scale   Ratings   Density
Book-Crossing       books             0 to 10        1.1m      0.003%
EachMovie           movies            0 to 14        2.7m      2.872%
Jester (dataset1)   jokes             -10 to 10      4.1m      57.463%
Amazon              many              1 to 5         82.8m     < 0.001%
Netflix Prize       movies            1 to 5         100.5m    1.178%
Yahoo Music (C15)   music (various)   0 to 100       262.8m    0.042%
lessons from running MovieLens
» lessons from startups apply (it’s hard, fail fast)
» continual work, not one-time effort
» encourage code quality through good social coding conventions
» invest in tools that allow users to help
dataset uses
» recommender systems research
» recommender systems MOOC
• http://coursera.org/learn/recommender-systems
» code examples (popular press, blogs)
» higher education
» commercial – internal testing
35
36

Mais conteúdo relacionado

Mais procurados

Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural networkKIRAN R
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMark Chang
 
Multiple object detection
Multiple object detectionMultiple object detection
Multiple object detectionSAURABH KUMAR
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving IYu Huang
 
Multi Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsMulti Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsVaibhav Singh
 
"Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin..."Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin...Edge AI and Vision Alliance
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition Intel Nervana
 
Mtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATIONMtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATIONNEERAJ BAGHEL
 
Restaurant recommender system
Restaurant recommender systemRestaurant recommender system
Restaurant recommender systemArif Huda
 
Final year project presentation
Final year project presentationFinal year project presentation
Final year project presentationSulemanAliMalik
 
Snapchat Bug Reporting
Snapchat Bug ReportingSnapchat Bug Reporting
Snapchat Bug Reportingtahreemsaleem
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsJustin Basilico
 
Game Analytics & Machine Learning
Game Analytics & Machine LearningGame Analytics & Machine Learning
Game Analytics & Machine LearningBen Weber
 
Using xAPI in Unity Games.pptx
Using xAPI in Unity Games.pptxUsing xAPI in Unity Games.pptx
Using xAPI in Unity Games.pptxArt Werkenthin
 

Mais procurados (20)

Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
 
Project presentation
Project presentationProject presentation
Project presentation
 
Image stitching
Image stitchingImage stitching
Image stitching
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
Developing Movie Recommendation System
Developing Movie Recommendation SystemDeveloping Movie Recommendation System
Developing Movie Recommendation System
 
Multiple object detection
Multiple object detectionMultiple object detection
Multiple object detection
 
Game balancing
Game balancingGame balancing
Game balancing
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving I
 
Multi Task Learning for Recommendation Systems
Multi Task Learning for Recommendation SystemsMulti Task Learning for Recommendation Systems
Multi Task Learning for Recommendation Systems
 
"Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin..."Image and Video Summarization," a Presentation from the University of Washin...
"Image and Video Summarization," a Presentation from the University of Washin...
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
Mtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATIONMtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATION
 
Restaurant recommender system
Restaurant recommender systemRestaurant recommender system
Restaurant recommender system
 
Final year project presentation
Final year project presentationFinal year project presentation
Final year project presentation
 
Snapchat Bug Reporting
Snapchat Bug ReportingSnapchat Bug Reporting
Snapchat Bug Reporting
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Deep learning-practical
Deep learning-practicalDeep learning-practical
Deep learning-practical
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Game Analytics & Machine Learning
Game Analytics & Machine LearningGame Analytics & Machine Learning
Game Analytics & Machine Learning
 
Using xAPI in Unity Games.pptx
Using xAPI in Unity Games.pptxUsing xAPI in Unity Games.pptx
Using xAPI in Unity Games.pptx
 

Destaque

MovieTweetings: a movie rating dataset collected from twitter
MovieTweetings: a movie rating dataset collected from twitterMovieTweetings: a movie rating dataset collected from twitter
MovieTweetings: a movie rating dataset collected from twitterSimon Dooms
 
RecSys Challenge 2014 Workshop Introduction
RecSys Challenge 2014 Workshop IntroductionRecSys Challenge 2014 Workshop Introduction
RecSys Challenge 2014 Workshop IntroductionSimon Dooms
 
Turrin rec syschallenge_presentation_@recsys2014
Turrin rec syschallenge_presentation_@recsys2014Turrin rec syschallenge_presentation_@recsys2014
Turrin rec syschallenge_presentation_@recsys2014Roberto Turrin
 
Trust and Recommender Systems
Trust and  Recommender SystemsTrust and  Recommender Systems
Trust and Recommender Systemszhayefei
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 

Destaque (7)

MovieTweetings: a movie rating dataset collected from twitter
MovieTweetings: a movie rating dataset collected from twitterMovieTweetings: a movie rating dataset collected from twitter
MovieTweetings: a movie rating dataset collected from twitter
 
RecSys Challenge 2014 Workshop Introduction
RecSys Challenge 2014 Workshop IntroductionRecSys Challenge 2014 Workshop Introduction
RecSys Challenge 2014 Workshop Introduction
 
Turrin rec syschallenge_presentation_@recsys2014
Turrin rec syschallenge_presentation_@recsys2014Turrin rec syschallenge_presentation_@recsys2014
Turrin rec syschallenge_presentation_@recsys2014
 
B7 ppt
B7 pptB7 ppt
B7 ppt
 
Trust and Recommender Systems
Trust and  Recommender SystemsTrust and  Recommender Systems
Trust and Recommender Systems
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 

Semelhante a The MovieLens Datasets: History and Context

Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceUniversity of Washington
 
Data visualisations: drawing actionable insights from science and technology ...
Data visualisations: drawing actionable insights from science and technology ...Data visualisations: drawing actionable insights from science and technology ...
Data visualisations: drawing actionable insights from science and technology ...EFSA EU
 
Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...James Gleeson
 
Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsAmit Sheth
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Symeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Eleftherios Spyromitros-Xioufis
 
Enhancing Soft Power: using cyberspace to enhance Soft Power
Enhancing Soft Power: using cyberspace to enhance Soft PowerEnhancing Soft Power: using cyberspace to enhance Soft Power
Enhancing Soft Power: using cyberspace to enhance Soft PowerAmit Sheth
 
CHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || ZooniverseCHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || ZooniverseRamine Tinati
 
What's up at Kno.e.sis?
What's up at Kno.e.sis? What's up at Kno.e.sis?
What's up at Kno.e.sis? Amit Sheth
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...Paolo Missier
 
New and Emerging Forms of Data
New and Emerging Forms of DataNew and Emerging Forms of Data
New and Emerging Forms of DataDavid De Roure
 
6. Work6 Social Distancing.pptx
6. Work6 Social Distancing.pptx6. Work6 Social Distancing.pptx
6. Work6 Social Distancing.pptxVanditha11
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)Han Woo PARK
 
supporting communities in an increasingly decentralized biomedical research e...
supporting communities in an increasingly decentralized biomedical research e...supporting communities in an increasingly decentralized biomedical research e...
supporting communities in an increasingly decentralized biomedical research e...Brian Bot
 
My Dissertation Defense
My Dissertation Defense My Dissertation Defense
My Dissertation Defense Laura Pasquini
 
Foresight Analytics
Foresight AnalyticsForesight Analytics
Foresight Analyticssuresh sood
 
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...Jeffrey Nichols
 
Learning to Classify Users in Online Interaction Networks
Learning to Classify Users in Online Interaction NetworksLearning to Classify Users in Online Interaction Networks
Learning to Classify Users in Online Interaction NetworksSymeon Papadopoulos
 

Semelhante a The MovieLens Datasets: History and Context (20)

Web and Complex Systems Lab @ Kno.e.sis
Web and Complex Systems Lab @ Kno.e.sisWeb and Complex Systems Lab @ Kno.e.sis
Web and Complex Systems Lab @ Kno.e.sis
 
Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data science
 
Data visualisations: drawing actionable insights from science and technology ...
Data visualisations: drawing actionable insights from science and technology ...Data visualisations: drawing actionable insights from science and technology ...
Data visualisations: drawing actionable insights from science and technology ...
 
Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...
 
Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and Applications
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Enhancing Soft Power: using cyberspace to enhance Soft Power
Enhancing Soft Power: using cyberspace to enhance Soft PowerEnhancing Soft Power: using cyberspace to enhance Soft Power
Enhancing Soft Power: using cyberspace to enhance Soft Power
 
CHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || ZooniverseCHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || Zooniverse
 
What's up at Kno.e.sis?
What's up at Kno.e.sis? What's up at Kno.e.sis?
What's up at Kno.e.sis?
 
Social Network Analysis Applications and Approach
Social Network Analysis Applications and ApproachSocial Network Analysis Applications and Approach
Social Network Analysis Applications and Approach
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 
New and Emerging Forms of Data
New and Emerging Forms of DataNew and Emerging Forms of Data
New and Emerging Forms of Data
 
6. Work6 Social Distancing.pptx
6. Work6 Social Distancing.pptx6. Work6 Social Distancing.pptx
6. Work6 Social Distancing.pptx
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)
 
supporting communities in an increasingly decentralized biomedical research e...
supporting communities in an increasingly decentralized biomedical research e...supporting communities in an increasingly decentralized biomedical research e...
supporting communities in an increasingly decentralized biomedical research e...
 
My Dissertation Defense
My Dissertation Defense My Dissertation Defense
My Dissertation Defense
 
Foresight Analytics
Foresight AnalyticsForesight Analytics
Foresight Analytics
 
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
 
Learning to Classify Users in Online Interaction Networks
Learning to Classify Users in Online Interaction NetworksLearning to Classify Users in Online Interaction Networks
Learning to Classify Users in Online Interaction Networks
 

Último

Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingadibshanto115
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 

Último (20)

Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mapping
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 

The MovieLens Datasets: History and Context

  • 1. The MovieLens Datasets: History and Context Max Harper (presenter) Joe Konstan
  • 3. MovieLens: 5 star movie ratings userId,movieId,rating,timestamp 1,2,3.5,1112486027 1,29,3.5,1112484676 1,32,3.5,1112484819 1,47,3.5,1112484727 1,50,3.5,1112484580 1,112,3.5,1094785740 1,151,4.0,1094785734 1,223,4.0,1112485573 1,253,4.0,1112484940 ... 138493,69644,3.0,1260209457 138493,70286,5.0,1258126944 138493,71619,2.5,1255811136 3 web site: dataset:
  • 4. ratings data is interesting, intuitive, and pervasive 4
  • 5. dataset impact » 140,000 downloads in 2014 » a search for “movielens” yields • 6,020 results in Google Books • 8,920 results in Google Scholar 5
  • 6. dataset uses » research » technical: programming books + blogs » educational (including a MOOC) » industrial R&D, demos 6
  • 7. overview » MovieLens datasets overview » dataset stability, system change 7
  • 9. 9 <user, movie, rating, timestamp> <Max, Toy Story, 4.0, 2010-12-01 12:00:00>
  • 10. MovieLens benchmark datasets 10 Name Dates Users Movies Ratings Density ML 100K ‘97 – ‘98 943 1,682 100,000 6.30% ML 1M ‘00 – ‘03 6,040 3,706 1,000,209 4.47% ML 10M ‘95 – ‘09 69,878 10,681 10,000,054 1.34% ML 20M ‘95 – ‘15 138,493 27,278 20,000,263 0.54% designed for replicability
  • 11. MovieLens latest datasets 11 Name Dates Users Movies Ratings Density ML Latest ‘95 – ‘16 247,753 34,208 22,884,377 0.003% ML Latest Small ‘96 – ‘16 668 10,329 105,339 0.015% designed for recency
  • 12. overview » MovieLens datasets overview » dataset stability, system change
  • 13. tension: datasets vs. system » ideal (pure) vs. actual (it’s complex) » systems want to change • stay current, constant improvements • A/B tests, beta testing, and other experiments » context changes • devices, competing sites, changing user base
  • 19. some key changes » core flow of browse/search » rating widget » recommender » new user experience » …
  • 20. history of experiments » both online field experiments and online lab experiments » created temporary and permanent changes, changed pattern of use
  • 22. in the paper » the story of MovieLens (1997 origins) • lessons learned from running a “real” system in a research lab • lots of fun descriptive stats/charts » best practices for dataset researchers • limitations • alternatives
  • 23. people who made this possible » John Riedl » Istvan Albert, Al Borchers, Dan Cosley, Brent J. Dahlen, Rich Davies, Michael Ekstrand, Dan Frankowski, Nathaniel Good, Jon Herlocker, Daniel Kluver, Shyong (Tony) Lam, Michael Ludwig, Sean McNee, Chad Salvatore, Shilad Sen, and Loren Terveen » MovieLens users
  • 24. The MovieLens Datasets: History and Context, in ACM Transactions on Interactive Intelligent Systems, Dec. 2015 » feedback? contact us: grouplens-info@cs.umn.edu. Presented by Max Harper, Research Scientist, University of Minnesota, harper@cs.umn.edu. Written with Joe Konstan, Distinguished McKnight University Professor, University of Minnesota, konstan@cs.umn.edu. This material is based on work supported by the National Science Foundation under grants DGE-9554517, IIS-9613960, IIS-9734442, IIS-9978717, EIA-9986042, IIS-0102229, IIS-0324851, IIS-0534420, IIS-0808692, IIS-0964695, IIS-0968483, IIS-1017697, IIS-1210863. This project was also supported by the University of Minnesota’s Undergraduate Research Opportunities Program and by grants and/or gifts from Net Perceptions, Inc., CFK Productions, and Google.
  • 26. version 0 (1997), version 4 (2014)
  • 27. one solution » document change, include with datasets
  • 28. key dataset limitations (1/2) » system UI and recommender changes » bias towards “successful” users » possible bias towards users with tolerance for “research quality” design » timestamps do not reflect time of consumption
  • 29. key dataset limitations (2/2) » recommender systems research community attitudes • implicit behaviors > ratings? • dataset-only research increasingly discouraged
  • 31. MovieLens system evolution: key changes and experiments
  • 32. alternative datasets
       Name               Domain           Rating Scale  Ratings  Density
       Book-Crossing      books            0 - 10        1.1m     0.003%
       EachMovie          movies           0 - 14        2.7m     2.872%
       Jester (dataset1)  jokes            -10 - 10      4.1m     57.463%
       Amazon             many             1 - 5         82.8m    < 0.001%
       Netflix Prize      movies           1 - 5         100.5m   1.178%
       Yahoo Music (C15)  music (various)  0 - 100       262.8m   0.042%
  • 34. lessons from running MovieLens » lessons from startups apply (it’s hard, fail fast) » continual work, not one-time effort » encourage code quality through good social coding conventions » invest in tools that allow users to help
  • 35. dataset uses » recommender systems research » recommender systems MOOC • http://coursera.org/learn/recommender-systems » code examples (popular press, blogs) » higher education » commercial – internal testing
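
The rating tuples on slides 3 and 9 and the density column on slides 10 and 11 can be reproduced with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the function names `parse_ratings` and `density` are hypothetical, the sample rows are copied from slide 3, and the timestamps are assumed to be seconds since the Unix epoch in UTC.

```python
import csv
import io
from datetime import datetime, timezone

# A few rows copied from the sample shown on slide 3.
SAMPLE = """userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,151,4.0,1094785734
138493,69644,3.0,1260209457
"""


def parse_ratings(text):
    """Yield (user_id, movie_id, rating, rated_at) tuples from CSV text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (
            int(row["userId"]),
            int(row["movieId"]),
            float(row["rating"]),
            # Assumption: the timestamp column holds Unix epoch seconds (UTC).
            datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc),
        )


def density(users, movies, ratings):
    """Fraction of the users x movies matrix that actually holds a rating."""
    return ratings / (users * movies)


ratings = list(parse_ratings(SAMPLE))
for user_id, movie_id, rating, rated_at in ratings:
    print(user_id, movie_id, rating, rated_at.isoformat())

# ML 100K row from slide 10: 943 users, 1,682 movies, 100,000 ratings.
print(f"ML 100K density: {density(943, 1682, 100_000):.2%}")
```

Density here is simply ratings divided by users times movies. A published figure can differ slightly from this calculation if it was computed from a different movie count than the one in the table.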

Editor's Notes

  1. I am the current caretaker of a system called MovieLens, and of the datasets that are derived from that system. I'm here to present a paper that we published in Transactions on Interactive Intelligent Systems about MovieLens and the MovieLens datasets. Notes: What is the point? Why should I listen to this talk? Why are you telling us this? Theme: the tension between building/maintaining a real system and producing a “pure” dataset. A solution (impossible to implement retroactively) is to document extensively (e.g., add a version number to each rating). There are many other things that changed beyond the ones listed in the current talk; mention them briefly? Add a road map at the beginning of the talk, maybe “things to know if you use the MovieLens datasets”. Include most cited papers (+1). Don’t say specifics about recommenders; just say how the high-level effect might have influenced ratings. Why are you telling us MovieLens history? We’re sharing these lessons because we think they’re useful for users and for people who want to generate their own datasets. Say as a theme: the system changes, and that has an impact on the dataset? Mention genome and other GroupLens datasets?
  2. MovieLens is a web site that collects 5-star ratings on movies. We have collected the result of many users providing many of these movie ratings in the MovieLens datasets, a publicly available resource for folks to explore rating data. Notes: possibly convert to a data table.
  3. Fundamentally, MovieLens is relevant because ratings-based systems have become so prevalent across a variety of systems. (Maybe cut this slide?)
  4. Most of these books and papers refer to the datasets, rather than the system. Notes: just say “mooc”.
  5. Most of these books and papers refer to the datasets, rather than the system. Notes: just say “mooc”.
  6. Two goals in this talk: (1) introduce the MovieLens datasets, to make sure everyone knows what I’m talking about and to catch some of you up on new releases; (2) discuss the tension between system-building and dataset purity, which I hope will be useful both to inform us about some potential limitations inherent in dataset-based research and to inform researchers engaged in releasing datasets of their own. Relevance to IUI folks who: conduct dataset research; peer review dataset research; build systems; release datasets.
  7. Fundamentally, the MovieLens datasets describe users’ movie rating behavior. The core of the dataset contains tuples of the form shown here.
  8. For example: user Max rated the movie Toy Story 4 stars at a particular time. Rating values represent “half-star” ratings, from 0.5 stars to 5 stars. Timestamps represent the most recent time when the rating was provided. In our latest dataset, there are about 20 million records like these.
  9. Here are the four dataset versions we’ve released, about one every five years. They vary quite a bit in their characteristics. The older datasets are most useful for comparing new work to existing published studies. We recommend that new work that is not comparative use the 20M dataset.
  10. For development or educational work, we have released a set of non-stable “latest” datasets that are kept up to date (generated in 2016 to include new movies). Latest is unabridged, containing all users, including those with just 1 rating. Latest-small is kept to 100k ratings for speed of development and testing; it is designed for educational purposes, demos, and other needs that don’t require big data. Latest-small is also redistributable for non-commercial purposes.
  11. Ideal: “pure” datasets. Actual: user-generated datasets come from user interaction with a system, and systems change. These changes work against the concept of generating pure data. MovieLens is a good case study, since it has been around for so long.
  12. Here it is! This is MovieLens, circa August 1997, around the time of its launch, as rendered by Netscape Navigator 4. MovieLens has operated continuously since that time. Let’s look through some screenshots showing its evolution.
  13. Version 1, released September 1999.
  14. Version 2, released February 2000.
  15. Version 3, released February 2003.
  16. And most recently, version 4, released November 2014. This is basically what it will look like if you visit today.
  17. Core flow of browse/search. Rating widget: half stars, number of clicks. Recommender: prediction, ordering. New user experience: “entry barrier”, initial personalization. There’s more: tagging, movie management, social features, … Recommender (1997 user-user via GroupLens; 1999 user-user Net Perceptions; 2003 item-item MultiLens; 2012 item-item LensKit; 2014 popularity blending item-item or SVD). New user (1997 rate 5 from 10 at a time (9 random, 1 easy); 2002 rate 15 selected for popularity; 2014 pick groups recommender). Ratings widget (1997 5 stars dropdown; 2003 half stars pulldown; 2014 clickable stars). Notes: more visuals; too much here.
  18. …not unique to MovieLens: the practice of A/B testing affects most datasets (e.g., Netflix, Amazon).
  19. And yet we find remarkable stability in general use of the ratings widget in aggregate. The chart shows average and median ratings across time, aggregated by month. Given the extent of the changes we’ve just discussed, it is somewhat remarkable to observe so little monthly variation. Notes: get rid of median line?
  20. A brief acknowledgement of the people who made this retrospective look possible.
  21. The core idea or premise hasn’t really changed since its initial release! MovieLens is a system that helps people find movies to watch. It works by asking users to rate movies to express their preferences in 1-5 stars. It uses those ratings to predict subsequent ratings. It can prioritize the display of highly-predicted ratings to personalize the experience.
  22. Notes: polish presentation of timestamps + influence.
  23. Usage: MovieLens has been used by lots of people, all around the world. We’ve registered about 280,000 people since launching in 1997, and the system has welcomed several thousand monthly active users since 2001. Notes: maybe combine with other chart?
  24. To understand the datasets, it is critical to understand the underlying system. Like all systems, MovieLens has changed. Like many systems, MovieLens has experimented with features.
  25. There are a variety of other datasets that provide different characteristics. This table shows some of the most prominent ones. The two biggest alternatives in the movies space, EachMovie and Netflix, have each been redacted and are no longer available, legally speaking. However, there are a number of great alternatives for ratings data across other domains. Notes: maybe cut this slide; explain the cross-outs.
  26. Let’s go back to the mid-90’s. Digital Equipment Corporation (DEC) was running an experimental system called EachMovie. EachMovie was built to explore the still young idea of personalized recommendations with collaborative filtering. But in 1997, DEC decided to shut down EachMovie. The DEC researchers reached out to the recommender systems community, looking for an organization to develop a replacement site, to serve the same users. Joe Konstan and John Riedl (pictured here) responded, and had their graduate students build a “copy” of EachMovie, backed by the GroupLens recommender engine.
  27. Our paper has links to all of those, if you’re interested!
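
Note 1 suggests adding a version number to each rating, and note 19 describes a chart of average and median ratings aggregated by month. Both ideas can be approximated after the fact from the rating timestamps. The sketch below is illustrative only and is not how the paper's chart was produced: the names `RELEASES`, `interface_version`, and `monthly_stats` are hypothetical, the release months come from notes 12 to 16, and the day of each release is not given in the notes, so the 1st of the month is an assumption.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, median

UTC = timezone.utc

# Release months from the speaker notes. The exact day is NOT given there;
# using the 1st of the month is an assumption for illustration only.
RELEASES = [
    (datetime(1997, 8, 1, tzinfo=UTC), "v0"),
    (datetime(1999, 9, 1, tzinfo=UTC), "v1"),
    (datetime(2000, 2, 1, tzinfo=UTC), "v2"),
    (datetime(2003, 2, 1, tzinfo=UTC), "v3"),
    (datetime(2014, 11, 1, tzinfo=UTC), "v4"),
]
_RELEASE_DATES = [released for released, _ in RELEASES]


def interface_version(rated_at):
    """Return the interface version live at rated_at, or None if earlier than v0."""
    index = bisect.bisect_right(_RELEASE_DATES, rated_at) - 1
    return RELEASES[index][1] if index >= 0 else None


def monthly_stats(ratings):
    """Map (year, month) to (mean, median) over (rating, rated_at) pairs."""
    by_month = defaultdict(list)
    for rating, rated_at in ratings:
        by_month[(rated_at.year, rated_at.month)].append(rating)
    return {month: (mean(vals), median(vals)) for month, vals in sorted(by_month.items())}


rows = [
    (3.5, datetime(2005, 4, 2, tzinfo=UTC)),
    (4.0, datetime(2005, 4, 20, tzinfo=UTC)),
    (5.0, datetime(2014, 12, 1, tzinfo=UTC)),
]
for rating, rated_at in rows:
    print(rating, rated_at.date(), interface_version(rated_at))
print(monthly_stats(rows))
```

This kind of tagging is only approximate. The notes point out that timestamps record the most recent time a rating was provided, and that A/B tests and beta releases meant users did not all see the same interface on a given date.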