A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv) @PAPIs Connect — São Paulo 2017

News recommendations are particularly challenging given the high volume of new content produced every day and how quickly its value to users decays. This demands models and infrastructure that can handle those nuances and serve a newly trained model about 100 times per day. This presentation gives a detailed overview of how the R&D team of Hearst's TV division is combining Google BigQuery, a Kubernetes cluster, and TensorFlow to build a hybrid recommendation system that brings together model-based matrix factorization, content recency, and content semantics through NLP.

Published in: Technology

  1. A TensorFlow Recommending System for News. Fabricio Vargas Matos
  2. Manhattan, NY. TV Stations: Local and National News
  3. Article's page: recommendations for the continuous scroll section (recommended articles)
  4. Agenda: 1. Recency and cold-start problem; 2. Data acquisition; 3. Matrix factorization; 4. TensorFlow implementation; 5. Hybrid model: NLP and feature engineering; 6. Hybrid model: hybrid matrix factorization; 7. Conclusions
  5. Cold-start problem. Grid: existing items / new items x existing users / new users
  6. Cold-start solution. Grid: existing items / new items x existing users / new users. Not personalized! Curated by editors + highly viewed
  7. Cold-start solution. Grid: existing items / new items x existing users / new users. Not personalized! Curated by editors + highly viewed. Hybrid Matrix Factorization
  8. Data Acquisition. Page views with user's time on page (Google Analytics, Google BigQuery); CMS content corpus: title, body, timestamp, metadata (sections, tags, etc.); contents; TFRecord/CSV files
  9. "Users x Items" sparsity by dataset: MovieLens (movies) 98.61%; Netflix (movies) 98.82%; TV Stations (news) 99.94%; Yahoo! KDD (music) 99.96%
  10. Matrix Factorization
  11. Latent Factors Model: R (users x items) ≈ U (users x latent factors) x V (latent factors x items), with a user bias and an item bias; R[i,j] ≈ U[i] x V[j]
  12. TF code: factorization op (…)
  13. TF code: train op
  14. Initial Results • Training time ≈ 15 min (Kubernetes cluster) • TimeOnPage prediction error (RMSE) ≈ 125 sec • Qualitative recommendation tests with chosen 'personas' revealed poor personalization
  15. Hybrid Matrix Factorization Model
  16. Natural Language Processing: CMS articles; concatenate content data (title, body, sections, tags, …); remove stop words, symbols and HTML tags; train word2vec neural network; combine all word vectors of each article into one (doc2vec); doc2vec contents
  17. Contents Data Visualization
  18. Plot labels: Entertainment, National News, Health, Sports, Local News
  19. Feature Engineering: NLP (doc2vec); items clustering (k-means); embed items: similarity to each cluster centroid; embed users: viewed contents combined; CMS articles; k-dimension items/users embeddings; Google Cloud Storage
  20. Items, parallel coordinates: 40 features/clusters
  21. Feature #1: Similarity to cluster #1
  22. Feature #39
  23. Who are they? Magenta contents (health) with high values for feature #1 (economy)?
  24. Content/User Embeddings + Matrix Factorization
  25. Matrix Factorization: R (users x items) ≈ U (users x latent factors) x V (latent factors x items), with a user bias and an item bias; R[i,j] ≈ U[i] x V[j]
  26. Hybrid Matrix Factorization • R ≈ U* x V* where: • U* = U (Users x KClusters) x A (KClusters x Latent_factors) • V* = B (Latent_factors x KClusters) x V (KClusters x Items). Note: only A and B are variables to be trained. U and V are constants.
  27. TF code: factorization (now)
  28. Results • Training time ≈ 20 min (Kubernetes cluster) • TimeOnPage prediction error (RMSE) ≈ 100 sec (20% better) • Qualitative recommendation tests with chosen 'personas' revealed very good personalization • R&D Project - Not yet publicly available
  29. Let's talk online fabriciovargasmatos@ Fabricio Vargas Matos
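
The feature engineering step on slide 19 (embed items by their similarity to each cluster centroid, embed users by combining the contents they viewed) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the talk: the slides do not say which similarity measure or which way of combining viewed items was used, so cosine similarity and a simple average are assumed here. The k-means step is skipped by treating the centroids as given, and all names and toy values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def embed_item(doc_vector, centroids):
    """One feature per cluster: similarity of the article vector to that centroid."""
    return [cosine_similarity(doc_vector, c) for c in centroids]

def embed_user(viewed_item_embeddings):
    """Combine the embeddings of viewed items by averaging (an assumed choice)."""
    n = len(viewed_item_embeddings)
    return [sum(column) / n for column in zip(*viewed_item_embeddings)]

# Hypothetical toy data: 2 cluster centroids in a 2-dimensional doc2vec space.
centroids = [[1.0, 0.0], [0.0, 1.0]]
item_a = embed_item([2.0, 0.0], centroids)   # article aligned with cluster 0
item_b = embed_item([1.0, 1.0], centroids)   # article halfway between clusters
user = embed_user([item_a, item_b])          # user who viewed both articles
print(item_a, item_b, user)
```

With 40 clusters, as on slide 20, each item and each user would be described by a 40-dimensional vector of this kind.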

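The hybrid factorization on slide 26 (R ≈ U* x V*, with U* = U x A and V* = B x V, where only A and B are trained) can be illustrated with a small gradient-descent sketch. It is written in plain Python rather than TensorFlow so that it stays self-contained, and it is not the code from the talk: the bias terms, optimizer, learning rate, and the toy embeddings and observations below are all assumptions made for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def train_hybrid_mf(observations, U, V, n_factors=2, lr=0.05, epochs=2000, seed=0):
    """Fit A and B so that R[i][j] ~ (U x A)[i] . (B x V)[:, j] on observed cells.

    observations: list of (user, item, value) triples, i.e. the sparse matrix R.
    U: users x k_clusters user embeddings, kept constant.
    V: k_clusters x items item embeddings, kept constant.
    Returns (A, B, rmse_history); biases are omitted in this sketch.
    """
    rng = random.Random(seed)
    k = len(V)                             # number of clusters
    Vt = [list(col) for col in zip(*V)]    # items x k_clusters
    A = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(k)]
    B = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_factors)]
    n = len(observations)
    history = []
    for _ in range(epochs):
        U_star = matmul(U, A)                                  # users x factors
        V_star_t = matmul(Vt, [list(c) for c in zip(*B)])      # items x factors
        grad_A = [[0.0] * n_factors for _ in range(k)]
        grad_B = [[0.0] * k for _ in range(n_factors)]
        squared_error = 0.0
        for i, j, r in observations:
            err = sum(u * v for u, v in zip(U_star[i], V_star_t[j])) - r
            squared_error += err * err
            for c in range(k):
                for f in range(n_factors):
                    grad_A[c][f] += err * U[i][c] * V_star_t[j][f]
                    grad_B[f][c] += err * U_star[i][f] * Vt[j][c]
        # Full-batch gradient step on the mean squared error; U and V never change.
        for c in range(k):
            for f in range(n_factors):
                A[c][f] -= lr * grad_A[c][f] / n
                B[f][c] -= lr * grad_B[f][c] / n
        history.append((squared_error / n) ** 0.5)
    return A, B, history

# Hypothetical toy data: 3 users, 3 items, 2 clusters, normalized time on page.
U = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[0.9, 0.1, 0.5], [0.1, 0.8, 0.5]]
observations = [(0, 0, 1.0), (0, 1, 0.1), (1, 0, 0.1), (1, 1, 1.0), (2, 2, 0.6)]
A, B, history = train_hybrid_mf(observations, U, V)
print("RMSE first epoch:", history[0], "last epoch:", history[-1])
```

Because U and V come from content features rather than from past interactions, a new article or user that has an embedding can be scored without retraining a per-item or per-user latent vector, which is what makes this formulation relevant to the cold-start cases on slide 7.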