Amid the increasingly competitive brewing industry, the ability of retailers and brewers to provide optimal product assortments for their consumers has become a key goal for business stakeholders. Consumer trends, regional heterogeneities and massive product portfolios combine to scale the complexity of assortment selection. At AB InBev, we approach this selection problem through a two-step method rooted in statistical learning techniques. First, regression models and collaborative filtering are used to predict product demand in partnering retailers. The second step involves robust optimization techniques to recommend a set of products that enhance business-specified performance indicators, including retailer revenue and product market share.
With the ultimate goal of scaling our approach to over 100k brick-and-mortar retailers across the United States and online platforms, we have implemented our algorithms in custom-built Python libraries using Apache Spark. We package and deploy production versions of Python wheels to a hosted repository for installation to production infrastructure.
To orchestrate the execution of these processes at scale, we use a combination of the Databricks API, Azure App Configuration, Azure Functions, Azure Event Grid and some custom-built utilities to deploy the production wheels to on-demand and interactive Databricks clusters. From there, we monitor execution with Azure Application Insights and log evaluation metrics to Databricks Delta tables on ADLS. To create a full-fledged product and deliver value to customers, we built a custom web application using React and GraphQL which allows users to request assortment recommendations in a self-service, ad-hoc fashion.
Building A Product Assortment Recommendation Engine
1. Building a Product Assortment Recommendation Engine for Brick-and-Mortar Retailers
Justin Morse, Staff Data Scientist, AB InBev
Ethan DuBois, Senior Software Engineer, AB InBev
2. Agenda
§ Introductions and overview
§ The problem: product assortment selection
§ The algorithmic solution
§ Deploying the solution
§ Lessons learned
Speakers: Justin Morse, Staff Data Scientist; Ethan DuBois, Senior Software Engineer
5. Pivoting towards a tech-oriented approach
Timeline, 2018–2021:
• LOLA team launched (5 employees)
• Incorporate Databricks into workflows
• Begin R&D partnership with Bud Lab & MIT
• National launch of first microservice
• Begin development of SKU-level recommendation engine
• BeerTech Organization launched (73 employees)
• Launch recommendation engine pilot
6. Which products should a retailer carry?
An average retailer has >10^100 ways to select their product assortment.
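The >10^100 figure is consistent with simple subset counting: with N candidate SKUs there are 2^N possible assortments, so a catalog of a few hundred products already clears that bound. A quick sanity check (the SKU count below is illustrative, not from the talk):

```python
import math

def assortment_count(n_skus: int) -> int:
    # Every subset of the candidate SKUs is a possible assortment.
    return 2 ** n_skus

# With roughly 333 candidate SKUs, the count already exceeds 10^100.
count = assortment_count(333)
digits = int(math.log10(count)) + 1  # number of decimal digits in the count
```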
7. How can we develop a quantitative approach to assortment planning that accounts for customer preferences, business priorities, and computational complexity?
8. Assortment Recommendation Pipeline
• Data Model: transform datasets into a format required for our pipeline
• Product Demand Prediction: make quantitative estimates of product demand for each partnering retailer
• Assortment Optimization: select the best product assortment given business requirements and estimated product demand
• Causal Analysis: measure the effects of our modeling interventions
9. Predicting demand for products in partnering retailers
• Custom-built library for a family of discrete choice models using PyTorch
• Executed on Databricks clusters with Azure Functions
• Next steps: scale training with Petastorm and Horovod
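The slide names a PyTorch library of discrete choice models; as a minimal, dependency-free sketch of the underlying idea (not the team's actual library), a multinomial logit model turns per-product utilities into choice probabilities via a softmax. The SKU names and utilities below are made up:

```python
import math

def choice_probabilities(utilities: dict) -> dict:
    """Multinomial logit: P(i) = exp(u_i) / sum_j exp(u_j).

    Subtracting the max utility first keeps exp() numerically stable.
    """
    u_max = max(utilities.values())
    exp_u = {sku: math.exp(u - u_max) for sku, u in utilities.items()}
    total = sum(exp_u.values())
    return {sku: e / total for sku, e in exp_u.items()}

# Illustrative utilities for three SKUs at one retailer.
probs = choice_probabilities({"lager": 1.2, "ipa": 0.4, "stout": -0.3})
```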
10. Optimizing retailer performance
• Use traditional numerical techniques to optimize a revenue objective function
• Include filters related to allowable business outcomes:
  - Size restrictions
  - Inventory restrictions
  - License restrictions
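As a toy illustration of the selection step (not the production optimizer, which relies on numerical techniques at scale), an exhaustive search over small assortments under a shelf-size restriction could look like the following; the SKU names and revenue estimates are illustrative:

```python
from itertools import combinations

def best_assortment(products, revenue, max_size):
    """Exhaustively search assortments of at most max_size products,
    maximizing total estimated revenue. Feasible only for tiny catalogs;
    with >10^100 candidates a real system needs numerical optimization.
    """
    best, best_value = (), 0.0
    for k in range(1, max_size + 1):
        for combo in combinations(products, k):
            value = sum(revenue[p] for p in combo)
            if value > best_value:
                best, best_value = combo, value
    return set(best), best_value

# Illustrative demand-weighted revenue estimates per SKU.
revenue = {"lager": 120.0, "ipa": 95.0, "stout": 40.0, "pilsner": 60.0}
assortment, value = best_assortment(list(revenue), revenue, max_size=2)
```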
11. Demonstrating value through small-scale pilots
• Recommendation engine launched in partnering retailers in the Ontario region
• Currently working with software engineering team to scale solution for North American and global launch
13. Scaling and deploying the solution
After a number of successful pilots, we needed to build a more robust solution that at minimum included:
• Production-quality code standards
• Best-practice code distribution
• Repository-based, version-controlled, automated CI/CD
• Flexible and lightweight configuration approach
• Decoupled communication between components
• Infrastructure-as-code
• Ability to scale infra up and down as necessary to meet demand
• API for integration with other applications
14. Scaling and deploying the solution
• Code
  ▪ Production-quality code standards
  ▪ Best-practice code distribution
  ▪ Repository-based, version-controlled, automated CI/CD
• Configuration
  ▪ Flexible/lightweight
  ▪ Decoupled from code
  ▪ Infrastructure-as-code
• Orchestration
  ▪ Decoupled communication between components
  ▪ Ability to automatically and programmatically scale infra up or down to meet demand
  ▪ API for integration with other applications
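The production pipeline decouples components with Azure Event Grid; as an illustrative, in-process sketch of that publish/subscribe pattern (the topic and payload names are hypothetical, and a real event grid is asynchronous and distributed):

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal in-process publish/subscribe bus.

    Publishers and subscribers share only topic names, so components stay
    decoupled, mirroring the role an event grid plays in the pipeline.
    """
    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)

# A hypothetical "demand model finished" event could trigger optimization.
bus = EventBus()
received = []
bus.subscribe("demand.scored", received.append)
bus.publish("demand.scored", {"retailer_id": "r-001", "status": "ok"})
```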
15. Scaling and deploying the solution: Technologies
Configuration, code, and orchestration are supported by Azure App Configuration, Azure Key Vault, Azure App Insights, Azure Event Grid, and Azure Function Apps.
16. Code: Refactoring ML Processes
Moving from chained notebooks to end-to-end pipelines in Python
• Chained notebooks
  ▪ Didn't provide the ease of maintenance and visibility that we wanted
  ▪ Easy to get lost; added complexity
  ▪ Difficult to standardize, scan, and control quality across workstreams
• Process-controlled Python pipelines
  ▪ Object-oriented approach
  ▪ Make use of shared tools and utilities
  ▪ Ability to package and distribute more easily
  ▪ CI/CD integration with GitHub workflows (code scanning, unit/integration tests, etc.)
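The object-oriented, process-controlled pipelines described above might be sketched as follows; the class and stage names are illustrative, not the team's actual library:

```python
from abc import ABC, abstractmethod

class PipelineStep(ABC):
    """One stage of an end-to-end pipeline; a shared interface lets steps
    be composed, unit-tested, and packaged independently."""

    @abstractmethod
    def run(self, data: dict) -> dict: ...

class DataModelStep(PipelineStep):
    def run(self, data: dict) -> dict:
        # Transform raw inputs into the format later stages expect.
        data["prepared"] = True
        return data

class DemandPredictionStep(PipelineStep):
    def run(self, data: dict) -> dict:
        # Placeholder demand estimate per SKU (a real step would call a model).
        data["demand"] = {sku: 1.0 for sku in data.get("skus", [])}
        return data

class Pipeline:
    def __init__(self, steps) -> None:
        self.steps = steps

    def run(self, data: dict) -> dict:
        for step in self.steps:
            data = step.run(data)
        return data

result = Pipeline([DataModelStep(), DemandPredictionStep()]).run({"skus": ["lager", "ipa"]})
```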
20. Code: Packaging and Deployment
Packaging and deploying code to an easily accessible repository for installation on production resources
• Custom Python wheels
  ▪ Object-oriented, following a best-practices approach
  ▪ Built and deployed in GitHub Workflows as part of CI/CD
• Distribution: JFrog Artifactory repository
  ▪ Organizational PyPI repo
  ▪ Available for installation on all clusters or machines
  ▪ Authentication set up with cluster init scripts stored in DBFS
• Roadmap: move to GitHub Packages once PyPI is supported :'(
33. Conclusion
• MVP released, in production
  ▪ Collecting initial user feedback in preparation for future releases
• Lessons learned
  ▪ Development process: db-connect vs. notebooks, pros and cons
  ▪ Configuration: moving configs out of code wherever possible
  ▪ pandas vs. PySpark: understanding the distinction and implications
• Future roadmap
  ▪ Increased parallelization/distribution for both model training and the optimization process
  ▪ Added intelligence throughout the service: job progress and ETAs, different demand-estimate universes
  ▪ Enhanced DevOps approach to cloud resource deployment and environment management
34. Team and acknowledgments (data science, software engineering, data engineering, and research partners): Emmanuel Doro, Justin Morse, Phillip Theron, Gui Neubern, Zi Wang, Senthil Murugappan, Ethan DuBois, Ravi Kolla, Sarosh Ahmad, Griffin Ansel, Ashish Baiju, Chris Stone, Nelson Kandeya, Emily Shapiro, Jessica Zou, Vivek Farias, Nikos Trichakis, Tianyi Peng, Patricio Foncea, Lucas Diffey