Building a Recommender System on AWS - AWS Summit Sydney 2018

Building a Recommender System on AWS

Delivering personalised, relevant content can help to grow your audience, improve customer experience and drive higher sales. In this session we will explore different techniques for implementing Recommender Systems, from simple metrics-based suggestions to more complex solutions using AI and Machine Learning. We will dive into some public data sets and demonstrate how you can build a Recommender System using AWS services.

Alastair Cousins, Solutions Architect, Amazon Web Services

  1. Building A Recommender System On AWS. Alastair Cousins, Senior Solutions Architect, Amazon Web Services. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  5. Let’s Build
  7. Amazon SageMaker: 1. Notebook Instances, 2. Algorithms, 3. ML Training Service, 4. ML Hosting Service
  8. Problem Framing → Data Exploration & Preparation → Training → Optimisation → Deployment
  9. Content-Based Filtering: generate recommendations based on known user preferences
      UserId  Rock  Jazz  HipHop  Classical
      7653    5     2     3       1
      • Easy to understand and implement
      • Applicable to domains where user preferences can be captured in full
      • Cannot predict new user preferences
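The content-based approach on slide 9 can be sketched as a simple ranking by stated genre preference. This is a minimal illustration only: the catalogue, its item titles and the `rank_items` helper are hypothetical and do not come from the talk.

```python
# Known genre preferences for user 7653, copied from the slide.
preferences = {"Rock": 5, "Jazz": 2, "HipHop": 3, "Classical": 1}

# Hypothetical catalogue (item title -> genre), invented for this example.
catalogue = {
    "Album A": "Rock",
    "Album B": "Jazz",
    "Album C": "Classical",
    "Album D": "HipHop",
}

def rank_items(preferences, catalogue):
    """Order catalogue items by the user's stated preference for their genre.

    Genres the user never rated score 0, which mirrors the limitation on the
    slide: the approach cannot predict preferences that were never captured.
    """
    return sorted(
        catalogue,
        key=lambda item: (-preferences.get(catalogue[item], 0), item),
    )

ranked = rank_items(preferences, catalogue)
```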
  10. Collaborative Filtering
      • Based on item-to-item relationships
      • Derived from explicit and implicit features
      • Recommendations are based on other users' experience
  11. What Should Our Solution Look Like?
      UserId  Liked MovieIds
      123     12, 19, 87, 171
      456     15, 19, 87, 231
      Movies 19 and 87 are liked by multiple users.
  12. What Should Our Solution Look Like? Same table as slide 11, annotated "Likely Recommendation".
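The overlap idea on slides 11 and 12 can be sketched in a few lines of Python. This is an illustration of the intuition, not the method used later in the talk (which moves on to Factorisation Machines). The `recommend` helper and its scoring rule are assumptions made for this example.

```python
from collections import Counter

# Liked movies per user, copied from the slide's example table.
likes = {
    123: {12, 19, 87, 171},
    456: {15, 19, 87, 231},
}

def recommend(user_id, likes):
    """Rank movies the user has not liked, using users with shared likes.

    Hypothetical scoring rule: every other user contributes their unseen
    movies, weighted by how many liked movies they share with user_id.
    """
    seen = likes[user_id]
    scores = Counter()
    for other_id, other_likes in likes.items():
        if other_id == user_id:
            continue
        overlap = len(seen & other_likes)
        if overlap == 0:
            continue  # no shared likes, so no evidence to use
        for movie in other_likes - seen:
            scores[movie] += overlap
    # Highest score first; ties broken by movie id for a stable order.
    return sorted(scores, key=lambda m: (-scores[m], m))

recs_for_123 = recommend(123, likes)
recs_for_456 = recommend(456, likes)
```

With the slide's data, each user is offered the movies that only the other user has liked, because the two share movies 19 and 87.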
  13. Problem Framing → Data Exploration & Preparation → Training → Optimisation → Deployment
  14. Our Data Set: MovieLens
      • Public data set produced by GroupLens Research
      • https://grouplens.org/datasets/movielens/
  15. Item Information
  16. User Information
  17. Visualising The Data: Total Feature Count = Users + Movies = 2,625 Features
  18. Visualising The Data: Minimum rating count per user: 20
  19. Data Preparation: Binary Classification (Not Liked / Liked)
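Turning star ratings into a binary label could look like the sketch below. The cut-off is an assumption: the slides do not state it, but a threshold of 4 agrees with the three examples on the one-hot encoding slides (5 maps to 1, 2 maps to 0, 4 maps to 1).

```python
LIKE_THRESHOLD = 4  # assumption: ratings of 4 or 5 count as "liked"

def to_label(rating, threshold=LIKE_THRESHOLD):
    """Map a 1-5 star rating to a binary label (1 = liked, 0 = not liked)."""
    return 1 if rating >= threshold else 0

# The three example ratings that appear on the one-hot encoding slides.
ratings = [5, 2, 4]
labels = [to_label(r) for r in ratings]
```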
  20. Matrix Factorisation With Factorisation Machines: Rating Matrix ≈ User Matrix × Item Matrix, where the two factor matrices share an inner dimension k
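In symbols, the factorisation on slide 20 and its link to Factorisation Machines can be written as follows. The symbol names (R, U, V, m, n, d, w, v) are chosen here for illustration and do not appear on the slide. The second equation is the standard second-order Factorisation Machine model. The third shows what it reduces to for a one-hot input in which only the columns for user u and movie i are set to 1.

```latex
% Matrix factorisation: R is m x n (users x items), U is m x k, V is n x k.
R \approx U V^{\top}

% Second-order Factorisation Machine over a feature vector x with d features,
% where each feature j has a weight w_j and a latent vector v_j of length k:
\hat{y}(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j
  + \sum_{j=1}^{d} \sum_{j'=j+1}^{d} \langle \mathbf{v}_j, \mathbf{v}_{j'} \rangle \, x_j x_{j'}

% One-hot input with only the user column u and the movie column i active:
\hat{y}(\mathbf{x}) = w_0 + w_u + w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle
```

The last line is a matrix factorisation with a global bias and per-user and per-movie bias terms, which is why one-hot encoded user and movie ids are a natural input for this algorithm.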
  21. Data Preparation: One-Hot Encoding
      Input:    UserId  MovieId  Rating
                2       4        5
      Encoded:  Users        | Movies          | Rating
                0 1 0 0 0 0  | 0 0 0 1 0 0 0 0 | 1
  22. Data Preparation: One-Hot Encoding
      Input:    UserId  MovieId  Rating
                2       4        5
                2       7        2
      Encoded:  Users        | Movies          | Rating
                0 1 0 0 0 0  | 0 0 0 1 0 0 0 0 | 1
                0 1 0 0 0 0  | 0 0 0 0 0 0 1 0 | 0
  23. Data Preparation: One-Hot Encoding
      Input:    UserId  MovieId  Rating
                2       4        5
                2       7        2
                4       5        4
      Encoded:  Users        | Movies          | Rating
                0 1 0 0 0 0  | 0 0 0 1 0 0 0 0 | 1
                0 1 0 0 0 0  | 0 0 0 0 0 0 1 0 | 0
                0 0 0 1 0 0  | 0 0 0 0 1 0 0 0 | 1
  24. Sparse Data
      • One-hot encoding produces a 90,570 × 2,625 matrix for our training set (one row per rating, one column per feature)
      • This data set is 99.92% zeros
      • Use a memory-efficient data structure: scipy.sparse.lil_matrix
      (The slide repeats the encoded example table from slide 23.)
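A minimal sketch of the sparse one-hot encoding, using `scipy.sparse.lil_matrix` as the slide suggests. The `one_hot_encode` helper, the 1-based id convention and the small 6-user, 8-movie layout are assumptions taken from the slide's toy example, not from the real data set.

```python
import numpy as np
from scipy.sparse import lil_matrix

def one_hot_encode(pairs, n_users, n_movies):
    """Build a sparse one-hot feature matrix from (user_id, movie_id) pairs.

    Ids are assumed to be 1-based. Columns [0, n_users) encode the user and
    columns [n_users, n_users + n_movies) encode the movie.
    """
    pairs = list(pairs)
    X = lil_matrix((len(pairs), n_users + n_movies), dtype=np.float32)
    for row, (user_id, movie_id) in enumerate(pairs):
        X[row, user_id - 1] = 1
        X[row, n_users + movie_id - 1] = 1
    return X

# The three example ratings from the slides: 6 user columns, 8 movie columns.
X = one_hot_encode([(2, 4), (2, 7), (4, 5)], n_users=6, n_movies=8)

# Every row holds exactly two non-zero entries, so with 2,625 feature
# columns the fraction of zeros is 1 - 2/2625.
zero_fraction = 1 - 2 / 2625
```

The zero fraction works out to roughly 99.92%, which matches the figure on the slide and explains why a dense array would waste almost all of its memory.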
  25. Data Preparation
      Diagram labels: SageMaker Notebooks, SageMaker Training, Prepare Training Data, Amazon S3, Raw Data, Prepared Data
      1. Import data
      2. Identify and enrich features
      3. Format data for training
      4. Write out data for distributed training
  26. Problem Framing → Data Exploration & Preparation → Training → Optimisation → Deployment
  27. Factorisation Machines
      Click Prediction: 1 TB advertising dataset, m4.4xlarge machines, perfect scaling.
                        Log_loss  F1 Score  Seconds
      SageMaker         0.494     0.277     820
      Other (10 Iter)   0.516     0.190     650
      Other (20 Iter)   0.507     0.254     1300
      Other (50 Iter)   0.481     0.313     3250
      [Chart: Cost in Dollars against Billable Time in Hours, with series for 10, 20, 30, 40 and 50 machines]
  28. Training – Single Instance
  29. Training – Multiple Instances
  30. Evaluating The Model: F1 Score 73.8%
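For reference, the F1 score reported on slide 30 is the harmonic mean of precision and recall. The sketch below computes it for binary labels. The toy labels are invented for illustration and have nothing to do with the 73.8% result.

```python
def f1_score(y_true, y_pred):
    """Compute F1 for binary labels, treating 1 ("liked") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing was predicted correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels, invented for this example only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
score = f1_score(y_true, y_pred)
```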
  31. Model Training
      Diagram labels: SageMaker Notebooks, Training Algorithm, SageMaker Training, Prepare Training Data, Amazon S3, Train & Optimise, Raw Data, Prepared Data, Trained Model, Docker Container
  32. Problem Framing → Data Exploration & Preparation → Training → Optimisation → Deployment
  33. Optimisation Approaches
      • Add higher order features
      • Hyperparameter Optimisation
      • Hybrid Models
  34. Higher Order Features
      • Adding features can improve accuracy
      • Additional features help with cold-start suggestions
      • Select features by experimentation
      Users        | Movies          | Genres
      0 1 0 0 0 0  | 0 0 0 1 0 0 0 0 | 1 0 0 1 0 0
      0 1 0 0 0 0  | 0 0 0 0 0 0 1 0 | 0 1 0 0 1 0
      0 0 0 1 0 0  | 0 0 0 0 1 0 0 0 | 1 0 0 0 1 0
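Appending genre columns to the one-hot matrix can be sketched with `scipy.sparse.hstack`. The genre rows below are read from the flattened slide text on the assumption that it lists three rows of six genre columns, so the exact layout is uncertain. The variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# User/movie one-hot block for the three example ratings
# (6 user columns followed by 8 movie columns, ids 1-based).
rows = [0, 0, 1, 1, 2, 2]
cols = [1, 9, 1, 12, 3, 10]
base = csr_matrix(
    (np.ones(len(rows), dtype=np.float32), (rows, cols)), shape=(3, 14)
)

# Genre indicator block: one row per rating, six genre columns (assumed layout).
genres = csr_matrix(np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
], dtype=np.float32))

# Append the genre columns to the right of the user and movie columns.
features = hstack([base, genres], format="csr")
```

Because the genre columns describe the movie rather than the user, they can still carry signal for a movie that has few or no ratings, which is the cold-start benefit the slide mentions.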
  35. Hyperparameters
  36. Hyperparameter Optimisation: apply machine learning to optimise model training hyperparameters. F1 Score with optimised hyperparameters: 77.2% (+3.5%)
  37. Hybrid Models
      1. Cluster individual users into groups
      2. Sort prediction data sets based on genres
      3. Generate predictions using the clustered user and filtered prediction data set that aligns best to the application context
      Example labels: Horror Fans, Age 8-10, New Signups, Comedies, New Releases, Animation
  38. Problem Framing → Data Exploration & Preparation → Training → Optimisation → Deployment
  39. Deploying An Endpoint
      Diagram labels: SageMaker Notebooks, Training Algorithm, SageMaker Training, SageMaker Hosting, Prepare Training Data, Amazon S3, Train & Optimise, Deploy, Raw Data, Prepared Data, Algorithm Container, Trained Model, HPO
  40. Calling SageMaker Endpoints
      • Understand the inference request format for your algorithm
      • Factorisation Machines support JSON & protobuf
      • Sample JSON payload: (not captured in this transcript)
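The slide's own sample payload did not survive in this transcript. As a rough sketch, assuming the dense JSON request shape described for SageMaker built-in algorithms (an `instances` list whose entries each carry a `features` array), a request body could be built as below. The `build_dense_payload` helper is hypothetical, and the format should be checked against the algorithm's documentation before use.

```python
import json

def build_dense_payload(rows):
    """Serialise one-hot feature rows into a dense JSON request body.

    Assumed shape: {"instances": [{"features": [...]}, ...]}.
    """
    instances = [{"features": [float(v) for v in row]} for row in rows]
    return json.dumps({"instances": instances})

# One request row for user 2 and movie 4, using the slides' toy layout of
# 6 user columns followed by 8 movie columns.
row = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
payload = build_dense_payload([row])
```

For the real 2,625-column feature space a dense row is mostly zeros, so a sparse request format or protobuf is likely the more economical choice.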
  41. Integrating Endpoints With Applications: Client → API Gateway → Lambda Function → SageMaker Endpoint
  42. Endpoint Invocation
  43. Enriching The Endpoint Response: endpoint response enriched with MovieId
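The invocation and enrichment steps on slides 42 and 43 could be sketched as a Lambda function along these lines. The code on the slides is not preserved here, so everything below is an assumption: the endpoint name, the event shape, the dense JSON request body and the `predictions` response key (the shape documented for the built-in Factorisation Machines binary classifier). The only AWS call used is `invoke_endpoint` on the boto3 `sagemaker-runtime` client.

```python
import json

ENDPOINT_NAME = "movie-recommender-fm"  # hypothetical endpoint name

def score_movies(runtime_client, rows_by_movie, endpoint_name=ENDPOINT_NAME):
    """Invoke the endpoint for candidate movies and attach each MovieId.

    rows_by_movie: dict mapping MovieId -> one-hot feature row (list of floats).
    runtime_client: a boto3 "sagemaker-runtime" client, or any object that
    offers the same invoke_endpoint method.
    """
    movie_ids = list(rows_by_movie)
    body = json.dumps(
        {"instances": [{"features": rows_by_movie[m]} for m in movie_ids]}
    )
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=body,
    )
    predictions = json.loads(response["Body"].read())["predictions"]
    # Predictions are assumed to come back in request order, so zipping them
    # with the ids restores the MovieId that the raw response lacks.
    return [{"MovieId": m, **p} for m, p in zip(movie_ids, predictions)]

def lambda_handler(event, context):
    """Lambda entry point; expects event["candidates"] as {MovieId: feature row}."""
    import boto3  # imported lazily so score_movies can be tested without AWS
    client = boto3.client("sagemaker-runtime")
    return score_movies(client, event["candidates"])
```

Passing the client in as a parameter keeps the scoring logic testable with a stub, while API Gateway only needs to forward the request to `lambda_handler`.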
  44. Solution Architecture
      Diagram labels: SageMaker Notebooks, Training Algorithm, SageMaker Training, SageMaker Hosting, AWS Lambda, API Gateway, Prepare Training Data, Inference requests, Amazon S3, Train & Optimise, Deploy, Raw Data, Prepared Data, Algorithm Container, Trained Model, HPO, User Interactions
  45. Where To From Here?
      - Sample Code: https://medium.com/@julsimon/building-a-movie-recommender-with-factorization-machines-on-amazon-sagemaker-cedbfc8c93d8
      - SageMaker HPO Preview: https://pages.awscloud.com/amazon-sagemaker-hpo-preview.html
  46. Thank You
