A Flexible Recommendation
System for Cable TV
Diogo Gonçalves, Miguel Costa, Francisco Couto
LaSIGE @ Faculty of Sciences, University of Lisbon
RecSysTV 2016, Boston, USA
September 15, 2016
• Information overflow
  • too many video contents to choose from
  • too much time spent exploring video contents
  (thousands of programs broadcast on hundreds of TV channels, plus thousands of movies & series on VOD)
• Impact
  • dissatisfaction
  • switching to other systems with recommendations (e.g. Netflix, YouTube)
  • less viewing time
  • less revenue
  • churn
Source: The Netflix Recommender System: Algorithms, Business Value, and Innovation, 2016
• Characterization of user behaviour in cable TV (previous study).
  • showed that Live TV and Catch-up TV have more users and views
• Improvement of state-of-the-art algorithms using the learning-to-rank (L2R) framework with contextual information and implicit feedback.
1. Extraction of implicit feedback for each pair <user, program>
2. Feature engineering on the contextual information
3. Creation of a large-scale dataset for learning & evaluation
4. Creation of an L2R model (casting recommendation as a supervised learning problem)
5. Evaluation against state-of-the-art algorithms
goal
• 12 weeks, between October and December 2015
• 10K randomly chosen clients who watched at least 10 programs per week
• 21K programs available to the users during that period
• 4.5M user views in which the user watched more than 50% of the program
  • we assumed users liked these programs
• 78.5M user views below 50% (most are 0%)
• On average, each user watched 454 programs and each program was watched by 216 users
implicit feedback (ground truth)
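The labeling rule above (a view counts as positive when the user watched more than 50% of the program) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `label_views`, the tuple layout, and keeping the largest watched fraction when a user has several views of the same program are all assumptions.

```python
def label_views(views, threshold=0.5):
    """Turn raw views into binary implicit feedback per (user, program) pair.

    views: iterable of (user, program, fraction_watched), fraction in [0, 1].
    Assumption: repeated views of the same pair keep the largest fraction.
    """
    best = {}
    for user, program, fraction in views:
        key = (user, program)
        best[key] = max(best.get(key, 0.0), fraction)
    # Strictly more than the threshold counts as "liked".
    return {key: int(fraction > threshold) for key, fraction in best.items()}


if __name__ == "__main__":
    sample = [("u1", "news", 0.8), ("u1", "film", 0.0), ("u2", "news", 0.5)]
    print(label_views(sample))
```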
• Users tend to see the same programs & channels
  • Rankings of programs and channels, and the relative share of views of each (per user and for all users)
• Users tend to see the same categories & subcategories of contents
  • Rankings of categories and subcategories, and the relative share of views of each (per user and for all users)
• Time influences user preferences
  • Weekend information, broadcast period, time passed since broadcast, last days information
• Users tend to see programs with similar textual descriptions of their content
  • Textual similarity functions (e.g. Jaccard, TF-IDF) between metadata (e.g. title, description)
• Users tend to see programs with similar characteristics
  • Number of episodes, content age & duration, category & subcategory
• Different types of contents require different strategies
  • Information about who saw the content and whether it is new for the user or for all users
• Similar users watch the same programs
  • Collaborative-filtering algorithms, such as WRMF (Collaborative Filtering for Implicit Feedback Datasets, ICDM 2008)
(60 features created based on the insights of the previous characterization)
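As one concrete instance of the textual similarity features listed above, the Jaccard similarity between two metadata strings can be sketched like this. The whitespace tokenization, the lowercasing, and the function name `jaccard` are assumptions made for illustration; the slides do not say how the text was tokenized.

```python
def jaccard(text_a, text_b):
    """Jaccard similarity between the word sets of two strings.

    Assumption: words are lowercased and split on whitespace.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # convention chosen here for two empty strings
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    print(jaccard("Evening News", "Morning News"))
```

A feature of this kind would be computed between a candidate program and the programs in the user's history, for example on titles or descriptions.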
[Figure: learning-to-rank setup. Each user $q^{(i)}$, $i = 1, \dots, m$, is paired with programs $d_1^{(i)}, \dots, d_{n_i}^{(i)}$. Each program has a preference score $y_j^{(i)}$, a feature vector $x_j^{(i)}$, and a model-generated score $z_j^{(i)}$.]
Listwise loss function: $\sum_{i=1}^{m} L\left(y^{(i)}, z^{(i)}\right)$
LambdaRank [Burges et al. NIPS’06]
LambdaMART [Wu et al. MSR-TR-2008-109]
Algorithms work differently for different types of programs. For instance, new programs tend to
work better with content-based approaches, while the programs never seen by a user tend to
work well with collaborative-filtering approaches. How to complement them?
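To make the listwise setup concrete, the sketch below sums a per-user loss over all users, with preference scores y and model-generated scores z grouped per user. The per-list loss used here is a ListNet-style softmax cross-entropy chosen only as a simple stand-in; it is not LambdaRank or LambdaMART, and all function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(y, z):
    """Stand-in per-list loss: cross-entropy between softmax(y) and softmax(z)."""
    target, predicted = softmax(y), softmax(z)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


def total_listwise_loss(groups, loss=listnet_loss):
    """Sum the per-user loss over all users.

    groups: iterable of (y, z) pairs, one pair of equal-length lists per user.
    """
    return sum(loss(y, z) for y, z in groups)


if __name__ == "__main__":
    labels = [1, 0, 0]
    print(total_listwise_loss([(labels, [2.0, 0.0, 0.0])]))
    print(total_listwise_loss([(labels, [0.0, 0.0, 2.0])]))
```

Scores that rank the watched program first give a lower loss than scores that rank it last, which is the property a listwise learner exploits.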
• Random: returns random recommendations.
• Popular: recommends the most popular programs watched by everyone.
• UserPopular: recommends the programs most watched by the user.
• WRMF: a matrix factorization technique for collaborative filtering where
user and item latent vectors are inferred from implicit feedback.
• Content-based: recommends programs based on similar content of
programs watched by the user (e.g. category, actors, directors).
• Learning to Rank: LambdaMART is a type of Gradient Boosted Regression
Trees algorithm that was used to combine recommendation features and
algorithm outputs.
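The two popularity baselines above are simple enough to sketch directly. This version counts views (not distinct viewers) and takes the watch log as (user, program) pairs; both choices, and the function names, are assumptions for illustration.

```python
from collections import Counter


def popular(watch_log, k):
    """Top-k programs by total number of views across all users."""
    counts = Counter(program for _, program in watch_log)
    return [program for program, _ in counts.most_common(k)]


def user_popular(watch_log, user, k):
    """Top-k programs by number of views of one given user."""
    counts = Counter(program for viewer, program in watch_log if viewer == user)
    return [program for program, _ in counts.most_common(k)]


if __name__ == "__main__":
    log = [
        ("u1", "news"), ("u1", "news"), ("u1", "film"),
        ("u2", "news"), ("u2", "quiz"), ("u2", "quiz"), ("u2", "quiz"),
        ("u3", "news"),
    ]
    print(popular(log, 2))
    print(user_popular(log, "u2", 1))
```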
[Figure: time-series 5-fold cross-validation. On a timeline (t0, t1), client views in the earlier window are the training data for the learning system, which produces a model (h1, h2, ...). The predicting system applies each model to the client views that follow (test data), and its predictions are evaluated.]
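A time-ordered split of this kind can be sketched as below, where every fold trains only on periods that precede its test period. The slides do not give the exact windowing, so the expanding training window, the use of weeks as periods, and the name `time_series_folds` are assumptions.

```python
def time_series_folds(n_periods, n_folds):
    """Yield (train_periods, test_period) pairs in chronological order.

    Assumption: the last n_folds periods are each used once as a test period,
    and each fold trains on all periods strictly before its test period.
    """
    if not 0 < n_folds < n_periods:
        raise ValueError("n_folds must be between 1 and n_periods - 1")
    for test_period in range(n_periods - n_folds, n_periods):
        yield list(range(test_period)), test_period


if __name__ == "__main__":
    # 12 periods, e.g. the 12 weeks of the dataset, and 5 folds.
    for train, test in time_series_folds(12, 5):
        print("train:", train, "test:", test)
```

Keeping the test period strictly after the training periods avoids evaluating a model on views that happened before the data it learned from.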
• Accuracy: how well do the items meet the users' information need?
NDCG@k: match between the recommendations and what the client watched afterwards
• Diversity: how different are the items from each other?
Intra-list diversity@k (ILD@k)
• Novelty: how novel is the item for all users?
Mean self-information (MSI@k)
• Serendipity: how different is the item from the user's history?
Unexpectedness@k
Only the top-k items in the recommendation lists are considered.
We used k = 5 and k = 10, because these are the typical sizes of recommendation lists shown in cable TV.
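For the accuracy metric, a minimal NDCG@k with binary relevance (1 if the user later watched the recommended program, 0 otherwise) could look like the sketch below. The log2(rank + 1) discount is the common textbook form and is assumed here, since the slides do not spell out the formula; the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance values."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(recommended, watched, k):
    """NDCG@k with binary relevance.

    recommended: ranked list of program ids.
    watched: set of program ids the user watched in the test period.
    """
    relevances = [1 if program in watched else 0 for program in recommended]
    # The perfect ranking places every watched program first.
    ideal = [1] * min(k, len(watched))
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    print(ndcg_at_k(["x", "a", "b"], {"a", "b"}, 3))
```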
1. Compute a recommendation list optimized for accuracy
2. Rerank the recommendations, usually trading off accuracy for other metrics (e.g. diversity and novelty)
GreedyRecommendations(programs, objective_function) {
  ranking = Ø
  repeat
    programs_i = the programs_k that maximizes objective_function(ranking ∪ programs_k), for all k ∈ [1..#programs]
    ranking = ranking ∪ programs_i
    programs = programs \ programs_i
  until programs = Ø
  return ranking
}
objective_function() calculates a score for a ranking list
e.g. 50% NDCG@5 + 25% ILD@5 + 25% MSI@5
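The greedy reranking pseudocode translates almost line for line into Python. The sketch below keeps the ranking as an ordered list; the toy objective in the usage example (rank-discounted relevance plus a bonus per distinct category) and its data are invented purely to show the trade-off and are not the objective used in the slides.

```python
def greedy_recommendations(programs, objective_function):
    """Greedily build a ranking by appending the program that maximizes the objective.

    objective_function takes a ranked list and returns a score.
    """
    remaining = list(programs)
    ranking = []
    while remaining:
        best = max(remaining, key=lambda p: objective_function(ranking + [p]))
        ranking.append(best)
        remaining.remove(best)
    return ranking


if __name__ == "__main__":
    # Hypothetical relevance scores and categories for three programs.
    relevance = {"a": 0.9, "b": 0.8, "c": 0.1}
    category = {"a": "news", "b": "news", "c": "film"}

    def toy_objective(ranking):
        accuracy = sum(relevance[p] / (i + 1) for i, p in enumerate(ranking))
        diversity = 0.5 * len({category[p] for p in ranking})
        return accuracy + diversity

    print(greedy_recommendations(["a", "b", "c"], toy_objective))
```

Each step evaluates the objective once per remaining program, so a full ranking of n programs costs on the order of n² objective evaluations.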
              accuracy          accuracy for new  diversity         novelty           serendipity       global
Algorithm     nDCG@5   nDCG@10  nDCG@5   nDCG@10  ILD@5    ILD@10   MSI@5    MSI@10   Seren@5  Seren@10 objective@5  objective@10
Random        0.002    0.002    0.001    0.002    0.919    0.920    0.677    0.679    0.935    0.935    0.400        0.401
Popular       0.293    0.247    0.021    0.018    0.600    0.541    0.058    0.080    0.888    0.887    0.311        0.279
UserPopular   0.694    0.600    0.000    0.000    0.708    0.727    0.200    0.213    0.872    0.872    0.574        0.535
WRMF          0.306    0.277    0.034    0.030    0.523    0.588    0.153    0.162    0.856    0.858    0.322        0.326
Content-based 0.345    0.316    0.059    0.052    0.503    0.526    0.346    0.354    0.837    0.831    0.385        0.378
L2R           0.726    0.630    0.110    0.093    0.683    0.692    0.199    0.214    0.870    0.868    0.583        0.541
Greedy + L2R  0.556    0.440    0.027    0.010    0.753    0.784    0.606    0.654    0.850    0.857    0.618        0.580
Greedy trades off accuracy for diversity and novelty.
Random provides the best results in diversity, novelty and serendipity.
L2R provides the best accuracy.
Used as example: objective = 50% nDCG + 25% ILD + 25% MSI
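The example objective can be checked against the reported figures with a few lines of arithmetic. The sketch below recomputes objective@5 from the nDCG@5, ILD@5 and MSI@5 values of three rows of the table above; the function name and the dictionary layout are illustrative only.

```python
def objective(ndcg, ild, msi, weights=(0.5, 0.25, 0.25)):
    """Weighted sum: 50% nDCG + 25% ILD + 25% MSI by default."""
    w_accuracy, w_diversity, w_novelty = weights
    return w_accuracy * ndcg + w_diversity * ild + w_novelty * msi


# (nDCG@5, ILD@5, MSI@5, reported objective@5), copied from the table.
rows_at_5 = {
    "Random": (0.002, 0.919, 0.677, 0.400),
    "L2R": (0.726, 0.683, 0.199, 0.583),
    "Greedy + L2R": (0.556, 0.753, 0.606, 0.618),
}

if __name__ == "__main__":
    for name, (ndcg, ild, msi, reported) in rows_at_5.items():
        print(name, round(objective(ndcg, ild, msi), 4), "reported:", reported)
```

The recomputed values agree with the reported ones to within rounding of the three-decimal inputs, and they show why Greedy + L2R ranks first on the global objective despite its lower nDCG.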
              accuracy          accuracy for new  diversity         novelty           serendipity       global
Algorithm     nDCG@5   nDCG@10  nDCG@5   nDCG@10  ILD@5    ILD@10   MSI@5    MSI@10   Seren@5  Seren@10 objective@5  objective@10
Random        0.002    0.002    0.003    0.003    0.919    0.920    0.677    0.679    0.935    0.935    0.400        0.401
Popular       0.295    0.252    0.007    0.006    0.600    0.550    0.058    0.081    0.888    0.887    0.312        0.283
UserPopular   0.699    0.603    0.000    0.000    0.708    0.727    0.186    0.199    0.872    0.872    0.573        0.533
WRMF          0.318    0.289    0.020    0.017    0.534    0.598    0.129    0.138    0.862    0.864    0.325        0.329
Content-based 0.435    0.396    0.047    0.038    0.525    0.550    0.238    0.242    0.840    0.836    0.408        0.396
L2R           0.726    0.631    0.115    0.081    0.683    0.692    0.193    0.206    0.870    0.869    0.582        0.540
Greedy + L2R  0.524    0.434    0.030    0.011    0.735    0.793    0.701    0.678    0.843    0.843    0.621        0.585
• Typical approaches used in VOD, such as collaborative filtering and content-based filtering, are not effective for Live TV and Catch-up TV.
• Our learning-to-rank approach with contextual information and implicit feedback is superior in accuracy, including accuracy for programs never seen before, while maintaining high scores of diversity and serendipity.
  • still, recommending programs never seen before requires further investigation.
• A solution to optimize recommendations for multiple metrics was proposed, which makes it easy to adapt recommendations to user preferences.
  • still, it is necessary to understand what clients value (e.g. accuracy vs. diversity).
Thank you.
migcosta@gmail.com
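Accuracy
IDCG is the value for the perfect ranking
$rel_i$ is 1 if the user watched the program at rank $i$, or 0 otherwise

Diversity
$d(i,j) = \tfrac{1}{3}\,\mathrm{category}(i=j) + \tfrac{1}{3}\,\mathrm{subcategory}(i=j) + \tfrac{1}{3}\,\mathrm{channel}(i=j)$
$ILD(u) = \frac{1}{|R_u|\,(|R_u| - 1)} \sum_{i \in R_u} \sum_{j \in R_u} d(i,j)$
$ILD = \frac{1}{|U|} \sum_{u \in U} ILD(u)$

Serendipity
$Unexp(u) = \frac{1}{|R_u|\,|W_u|} \sum_{i \in R_u} \sum_{j \in W_u} d(i,j)$
$Unexp = \frac{1}{|U|} \sum_{u \in U} Unexp(u)$

Novelty
$MSI(u) = \frac{1}{|R_u|} \sum_{i \in R_u} \frac{|\{v \in U : i \in W_v\}|}{|U|}$
$MSI = \frac{1}{|U|} \sum_{u \in U} MSI(u)$

$R_u$ = list of recommended items for user $u$
$W_u$ = watching history for user $u$
$U$ = list of users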
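The per-user diversity and unexpectedness formulas can be sketched in a few lines. One loud assumption: the slide's notation for d(i, j) is ambiguous about whether each term fires when the attribute matches or differs, so this sketch treats d as a distance and adds one third for each of category, subcategory and channel that differs. The dictionary layout of a program and the function names are also hypothetical.

```python
ATTRIBUTES = ("category", "subcategory", "channel")


def distance(a, b):
    """Assumed d(i, j): one third per attribute on which the two programs differ."""
    return sum(a[key] != b[key] for key in ATTRIBUTES) / 3


def ild(recommended):
    """Intra-list diversity for one user: mean distance over ordered pairs."""
    n = len(recommended)
    if n < 2:
        return 0.0
    total = sum(distance(a, b) for a in recommended for b in recommended)
    return total / (n * (n - 1))


def unexpectedness(recommended, history):
    """Mean distance between recommended programs and the user's history."""
    if not recommended or not history:
        return 0.0
    total = sum(distance(a, b) for a in recommended for b in history)
    return total / (len(recommended) * len(history))


if __name__ == "__main__":
    p1 = {"category": "news", "subcategory": "daily", "channel": "ch1"}
    p2 = {"category": "news", "subcategory": "daily", "channel": "ch2"}
    p3 = {"category": "film", "subcategory": "drama", "channel": "ch3"}
    print(ild([p1, p2, p3]), unexpectedness([p3], [p1, p2]))
```

The system-level ILD and Unexp values are then the averages of these per-user values over all users in U.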