Mining Cross-Domain Rating Datasets
from Structured Data on Twitter
@sidooms
Simon Dooms
Rating Datasets
• What are ratings? Explicit user preference information
• Why ratings? Recommender systems
Apr. 08, 2014 Simon Dooms - Ghent University - MSM 2014 2
Ratings Scarcity in Research
• Ratings = private data
• Public datasets to the rescue?
– MovieLens 100K (1998)
– MovieLens 1M (2000)
– MovieLens 10M (2008)
– More on recsyswiki.com
Old, Synthetic Datasets
Social Sharing = Ratings Goldmine
• Previous research: MovieTweetings
– Movie rating dataset mined from IMDb ratings shared on Twitter
– https://github.com/sidooms/MovieTweetings
• What about other domains and websites?
Well, let’s try it out!
Target Websites - Goodreads
Extracted fields: Twitter user, rating, book title, book author, Goodreads URL, time
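A minimal sketch of how a structured tweet could be parsed into the fields listed above. The tweet template ("4 of 5 stars to <title> by <author> <URL>"), the regular expression, and the `parse_goodreads_tweet` helper are assumptions made for illustration; they are not taken from the code used in the experiment.

```python
import re
from typing import NamedTuple, Optional


class BookRating(NamedTuple):
    user: str
    rating: int
    title: str
    author: str
    url: str
    time: str


# Assumed tweet template: "<n> of 5 stars to <title> by <author> <url>"
GOODREADS_PATTERN = re.compile(
    r"^(?P<rating>[1-5]) of 5 stars to (?P<title>.+) by (?P<author>.+?) "
    r"(?P<url>https?://\S+)$"
)


def parse_goodreads_tweet(user: str, text: str, time: str) -> Optional[BookRating]:
    """Return a BookRating if the tweet matches the assumed template, else None."""
    match = GOODREADS_PATTERN.match(text.strip())
    if match is None:
        return None
    return BookRating(
        user=user,
        rating=int(match.group("rating")),
        title=match.group("title"),
        author=match.group("author"),
        url=match.group("url"),
        time=time,
    )
```

Tweets that do not follow the template return `None`, so free-form posts are skipped rather than guessed at. The title group is greedy, so the last " by " in the text is treated as the title/author separator.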
Target Websites - Pandora
Extracted fields: Twitter user, song, Pandora URL, time
Target Websites - YouTube
Extracted fields: Twitter user, (video uploader), YouTube URL, time
Mining Experiment
• But words are wind…
– 2-week experiment
– 4 online platforms
Python code + Task Scheduler = Dataset files
https://github.com/sidooms/Twitter-ratings
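As a hedged illustration of the "scheduled script appends to dataset files" idea, the sketch below writes newly mined ratings to a flat file and skips lines that are already present, so repeated scheduled runs do not create duplicates. The `::` separator, the `(user, item, rating, timestamp)` record layout, and the `append_ratings` helper are assumptions for this example, not the layout of the published files.

```python
from pathlib import Path
from typing import Iterable, Tuple

Record = Tuple[str, str, int, int]  # (user, item, rating, unix timestamp)
SEPARATOR = "::"  # assumed MovieLens-style field separator


def append_ratings(path: Path, records: Iterable[Record]) -> int:
    """Append records not yet in the file; return how many were written."""
    existing = set()
    if path.exists():
        existing = set(path.read_text(encoding="utf-8").splitlines())
    written = 0
    with path.open("a", encoding="utf-8") as handle:
        for record in records:
            line = SEPARATOR.join(str(field) for field in record)
            if line in existing:
                continue  # already stored by an earlier run
            handle.write(line + "\n")
            existing.add(line)
            written += 1
    return written
```

A scheduler (Task Scheduler, cron, or similar) would call a script wrapping this function at a fixed interval.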
The Numbers
One more thing …
Cross-Domain Rating Dataset
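One way to read "cross-domain" here is that the same Twitter user appears in the data mined from more than one website. Under that assumption, the sketch below finds such users from `(user, domain)` pairs. The function name, the input layout, and the domain labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def users_by_domain_count(
    ratings: Iterable[Tuple[str, str]], min_domains: int = 2
) -> Dict[str, Set[str]]:
    """Map each user seen in at least `min_domains` domains to those domains."""
    domains_per_user: Dict[str, Set[str]] = defaultdict(set)
    for user, domain in ratings:
        domains_per_user[user].add(domain)
    return {
        user: domains
        for user, domains in domains_per_user.items()
        if len(domains) >= min_domains
    }
```

For example, a user with records from both `"goodreads"` and `"youtube"` is returned, while a user seen only on `"pandora"` is not.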
Applications
• Collect ratings for recsys research / input
• Cross-domain recsys research
• Trend detection, analytics, ...
• Applicable to all social sharing websites
Conclusions
• Ratings scarcity in research
• Public datasets are old and synthetic
• Social sharing = ratings goldmine
• 2-week experiment, 4 major websites
• Python code & datasets on GitHub
• True cross-domain ratings dataset
