SlideShare uma empresa Scribd logo
1 de 29
Baixar para ler offline
SVD Applied to
Collaborative Filtering
      ~ URUG 7-12-07 ~
Recommendation System
Recommendation System
Answers the question:
What do I want next?!?
Recommendation System
Answers the question:
What do I want next?!?

 Very consumer driven.

 Must provide good results or a user may not
 trust the system in the future.
Collaborative Filtering
Base user recommendations off of:

  User’s past history.

  History of like-minded users.

View data as product X user matrix.

Find a “neighborhood” of similar users
for that user.

Return the top-N recommendations.
Early Approaches

Goldberg, et. al. (1992), Using
collaborative filtering to weave an
information tapestry
Konstan, J., el. at (1997), Applying
Collaborative Filtering to Usenet news.

Use Pearson Correlation or cosine similarity
as a measure of similarity to form
neighborhoods.
Early CF Challenges
Early CF Challenges
Sparsity - No correlation between
users can be found. Reduced coverage
occurs.
Early CF Challenges
Sparsity - No correlation between
users can be found. Reduced coverage
occurs.

Scalability - Nearest neighbor
algorithms computation time grows with
the number of products and users.
Early CF Challenges
Sparsity - No correlation between
users can be found. Reduced coverage
occurs.

Scalability - Nearest neighbor
algorithms computation time grows with
the number of products and users.

Synonymy
Dimensionality Reduction
Dimensionality Reduction
 Latent Semantic Indexing (LSI)
Dimensionality Reduction
 Latent Semantic Indexing (LSI)

   Algorithm from IR community (late
   80s-early 90s.)
Dimensionality Reduction
 Latent Semantic Indexing (LSI)

   Algorithm from IR community (late
   80s-early 90s.)

   Addresses the problems of synonymy,
   polysemy, sparsity, and scalability for
   large datasets.
Dimensionality Reduction
 Latent Semantic Indexing (LSI)

   Algorithm from IR community (late
   80s-early 90s.)

   Addresses the problems of synonymy,
   polysemy, sparsity, and scalability for
   large datasets.

   Reduces dimensionality of a dataset
   and captures the latent relationships.
Dimensionality Reduction
 Latent Semantic Indexing (LSI)

   Algorithm from IR community (late
   80s-early 90s.)

   Addresses the problems of synonymy,
   polysemy, sparsity, and scalability for
   large datasets.

   Reduces dimensionality of a dataset
   and captures the latent relationships.

 Easily maps to CF!
Dimensionality Reduction
 Latent Semantic Indexing (LSI)

   Algorithm from IR community (late
   80s-early 90s.)

   Addresses the problems of synonymy,
   polysemy, sparsity, and scalability for
   large datasets.

   Reduces dimensionality of a dataset
   and captures the latent relationships.

 Easily maps to CF!
Framing LSI for CF
Products X Users matrix instead of Terms X
Documents.

        Netflix Dataset
480,189 users, 17,770 movies, only ~100 milion ratings.

17,770 X 480,189 matrix that is 99% sparse!

  About 8.5 billion potential ratings.
SVD- The math behind LSI
   Singular Value Decomposition

      For any M x N matrix A of rank r, it can
      decomposed as:

                                         T
      A = UΣV
 U is a M x M orthogonal matrix.
 V is a N X N orthogonal matrix.
 Σ is a M x N diagonal matrix whose first r diagonal
 entries are the nonzero singular values of A.
σ1 ≥ σ2 ... ≥ σr > σr+1 = ... = σn = 0
Related to eigenvalue
  decomposition (PCA)
U is the orthornormal eigenspace of
AA^T. Spans the “column space”, known
as left singular vectors.
V is the orthornormal eigenspace of
A^TA. Spans “row space”. Right vectors.
Singular values are the square roots of
the eigenvalues.
Reducing Dimensionality


                                  T
                      Ak = Uk ΣkVk

 A_k is the closest approximation to A.

 A_k minimizes the Frobenius norm over all
 rank-k matrices: ||A − Ak ||F
Making Recommendations
 Cosine Similarity- common way to find neighborhood.
                   i· j
 cos(i, j) =
             ||i||2 ∗ || j||2
Somehow base recommendations off of that
neighborhood and its users.

Can also make predictions of products with a simple
dot product if the singular values are combined with
the singular vectors.
                        1/2      1/2 T
     CPprod = Cavg +Uk Sk (c) · Sk Vk (p)
Challenges with SVD
Scalability - Once again, compute
time grows with the number of users
and products. O(m^3)
  Offline stage.
  Online stage.
Even doing the SVD computation offline
is not possible for large datasets.
Other methods are needed.
Incremental SVD
          T
 uk = u       Vk Σk
                  −1
Incremental SVD Results
GHA for SVD
  Gorrell (2006),GHA for Incremental SVD in
  NLP

      Based off of Sanger’s (1989) GHA for eigen
      decomposition.
  a
∆ci      b
      = ci · b(x −    ∑           a a
                            (a · c j )c j )
                      j<i
  b
∆ci      a
      = ci · a(b −   ∑           b b
                           (b · c j )c j )
                     j<i
GHA extended by Funk

 void train(int user, int movie, real rating)
 {
 
real err = lrate * (rating - predictRating(movie, user));

 
userValue[user] += err * movieValue[movie];
 
movieValue[movie] += err * userValue[user];
 }
Netflix Results
Best RMSEs

  0.9283

  0.9212

Blended to get 0.9189, 3.42% better than
Netflix.
Summary
SVD provides an elegant and automatic
recommendation system that has the
potential to scale.

There are many different algorithms to
calculate or at least approximate SVD which
can be used in offline stages for websites
that need to have CF.

Every dataset is different and requires
experimentation with to get the best results.

Mais conteúdo relacionado

Mais procurados

Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
Lei Guo
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
Liang Xiang
 
Latent factor models for Collaborative Filtering
Latent factor models for Collaborative FilteringLatent factor models for Collaborative Filtering
Latent factor models for Collaborative Filtering
sscdotopen
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
Liang Xiang
 

Mais procurados (20)

Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 
Session-Based Recommender Systems
Session-Based Recommender SystemsSession-Based Recommender Systems
Session-Based Recommender Systems
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems Basics
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systems
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Latent factor models for Collaborative Filtering
Latent factor models for Collaborative FilteringLatent factor models for Collaborative Filtering
Latent factor models for Collaborative Filtering
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick View
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Transformer xl
Transformer xlTransformer xl
Transformer xl
 
Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)Deep learning based recommender systems (lab seminar paper review)
Deep learning based recommender systems (lab seminar paper review)
 

Semelhante a SVD and the Netflix Dataset

NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured prediction
zukun
 
Download
DownloadDownload
Download
butest
 
Download
DownloadDownload
Download
butest
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
Data Con LA
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
acijjournal
 

Semelhante a SVD and the Netflix Dataset (20)

Group Project
Group ProjectGroup Project
Group Project
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured prediction
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
 
Recommendation system using collaborative deep learning
Recommendation system using collaborative deep learningRecommendation system using collaborative deep learning
Recommendation system using collaborative deep learning
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Large Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the TrenchesLarge Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the Trenches
 
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
 
Download
DownloadDownload
Download
 
Download
DownloadDownload
Download
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Gene's law
Gene's lawGene's law
Gene's law
 
Evaluation of conditional images synthesis: generating a photorealistic image...
Evaluation of conditional images synthesis: generating a photorealistic image...Evaluation of conditional images synthesis: generating a photorealistic image...
Evaluation of conditional images synthesis: generating a photorealistic image...
 
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHTPerformance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
 
HalifaxNGGs
HalifaxNGGsHalifaxNGGs
HalifaxNGGs
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
 

Mais de Ben Mabey

Cucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already SpeakCucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already Speak
Ben Mabey
 

Mais de Ben Mabey (8)

PCA for the uninitiated
PCA for the uninitiatedPCA for the uninitiated
PCA for the uninitiated
 
Clojure, Plain and Simple
Clojure, Plain and SimpleClojure, Plain and Simple
Clojure, Plain and Simple
 
Github flow
Github flowGithub flow
Github flow
 
Cucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already SpeakCucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already Speak
 
Writing Software not Code with Cucumber
Writing Software not Code with CucumberWriting Software not Code with Cucumber
Writing Software not Code with Cucumber
 
Outside-In Development With Cucumber
Outside-In Development With CucumberOutside-In Development With Cucumber
Outside-In Development With Cucumber
 
Disconnecting the Database with ActiveRecord
Disconnecting the Database with ActiveRecordDisconnecting the Database with ActiveRecord
Disconnecting the Database with ActiveRecord
 
The WHY behind TDD/BDD and the HOW with RSpec
The WHY behind TDD/BDD and the HOW with RSpecThe WHY behind TDD/BDD and the HOW with RSpec
The WHY behind TDD/BDD and the HOW with RSpec
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

SVD and the Netflix Dataset

  • 1. SVD Applied to Collaborative Filtering ~ URUG 7-12-07 ~
  • 3. Recommendation System Answers the question: What do I want next?!?
  • 4. Recommendation System Answers the question: What do I want next?!? Very consumer driven. Must provide good results or a user may not trust the system in the future.
  • 5. Collaborative Filtering Base user recommendations off of: User’s past history. History of like-minded users. View data as product X user matrix. Find a “neighborhood” of similar users for that user. Return the top-N recommendations.
  • 6. Early Approaches Goldberg, et. al. (1992), Using collaborative filtering to weave an information tapestry Konstan, J., el. at (1997), Applying Collaborative Filtering to Usenet news. Use Pearson Correlation or cosine similarity as a measure of similarity to form neighborhoods.
  • 8. Early CF Challenges Sparsity - No correlation between users can be found. Reduced coverage occurs.
  • 9. Early CF Challenges Sparsity - No correlation between users can be found. Reduced coverage occurs. Scalability - Nearest neighbor algorithms computation time grows with the number of products and users.
  • 10. Early CF Challenges Sparsity - No correlation between users can be found. Reduced coverage occurs. Scalability - Nearest neighbor algorithms computation time grows with the number of products and users. Synonymy
  • 12. Dimensionality Reduction Latent Semantic Indexing (LSI)
  • 13. Dimensionality Reduction Latent Semantic Indexing (LSI) Algorithm from IR community (late 80s-early 90s.)
  • 14. Dimensionality Reduction Latent Semantic Indexing (LSI) Algorithm from IR community (late 80s-early 90s.) Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets.
  • 15. Dimensionality Reduction Latent Semantic Indexing (LSI) Algorithm from IR community (late 80s-early 90s.) Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets. Reduces dimensionality of a dataset and captures the latent relationships.
  • 16. Dimensionality Reduction Latent Semantic Indexing (LSI) Algorithm from IR community (late 80s-early 90s.) Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets. Reduces dimensionality of a dataset and captures the latent relationships. Easily maps to CF!
  • 17. Dimensionality Reduction Latent Semantic Indexing (LSI) Algorithm from IR community (late 80s-early 90s.) Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets. Reduces dimensionality of a dataset and captures the latent relationships. Easily maps to CF!
  • 18. Framing LSI for CF Products X Users matrix instead of Terms X Documents. Netflix Dataset 480,189 users, 17,770 movies, only ~100 milion ratings. 17,770 X 480,189 matrix that is 99% sparse! About 8.5 billion potential ratings.
  • 19. SVD- The math behind LSI Singular Value Decomposition For any M x N matrix A of rank r, it can decomposed as: T A = UΣV U is a M x M orthogonal matrix. V is a N X N orthogonal matrix. Σ is a M x N diagonal matrix whose first r diagonal entries are the nonzero singular values of A. σ1 ≥ σ2 ... ≥ σr > σr+1 = ... = σn = 0
  • 20. Related to eigenvalue decomposition (PCA) U is the orthornormal eigenspace of AA^T. Spans the “column space”, known as left singular vectors. V is the orthornormal eigenspace of A^TA. Spans “row space”. Right vectors. Singular values are the square roots of the eigenvalues.
  • 21. Reducing Dimensionality T Ak = Uk ΣkVk A_k is the closest approximation to A. A_k minimizes the Frobenius norm over all rank-k matrices: ||A − Ak ||F
  • 22. Making Recommendations Cosine Similarity- common way to find neighborhood. i· j cos(i, j) = ||i||2 ∗ || j||2 Somehow base recommendations off of that neighborhood and its users. Can also make predictions of products with a simple dot product if the singular values are combined with the singular vectors. 1/2 1/2 T CPprod = Cavg +Uk Sk (c) · Sk Vk (p)
  • 23. Challenges with SVD Scalability - Once again, compute time grows with the number of users and products. O(m^3) Offline stage. Online stage. Even doing the SVD computation offline is not possible for large datasets. Other methods are needed.
  • 24. Incremental SVD T uk = u Vk Σk −1
  • 26. GHA for SVD Gorrell (2006),GHA for Incremental SVD in NLP Based off of Sanger’s (1989) GHA for eigen decomposition. a ∆ci b = ci · b(x − ∑ a a (a · c j )c j ) j<i b ∆ci a = ci · a(b − ∑ b b (b · c j )c j ) j<i
  • 27. GHA extended by Funk void train(int user, int movie, real rating) { real err = lrate * (rating - predictRating(movie, user)); userValue[user] += err * movieValue[movie]; movieValue[movie] += err * userValue[user]; }
  • 28. Netflix Results Best RMSEs 0.9283 0.9212 Blended to get 0.9189, 3.42% better than Netflix.
  • 29. Summary SVD provides an elegant and automatic recommendation system that has the potential to scale. There are many different algorithms to calculate or at least approximate SVD which can be used in offline stages for websites that need to have CF. Every dataset is different and requires experimentation with to get the best results.