3. Agenda
Hop on the Uber ML Ride … destination please?
• Introduction: Who am I and what are we talking about today?
• Problem Space: Why does Uber need ML, and what are some of the problems we tackle?
• Tools of the Trade: What does Uber’s tech stack look like?
• Challenges & Opportunities: Challenges likely unique to Uber .. interesting opportunities
5. Who am I?
• Engineering Leader @ Uber
• Marketplace Data
• Realtime Data Processing
• Analytics
• Forecasting
• Previously: Microservices/Cloud Platform at Netflix
• Twitter: @stonse
6. Introduction: Uber
Marketplace: Uber’s logistics platform
• Driver Partners: our partners in the ride-sharing business
• Riders: folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool
• Merchants: restaurants or shops that have signed on to the Uber platform
8. ML Problems: Why do we need Machine Learning?
• Mapping (Routes, ETAs, …)
• Fraud and Security
• uberEATS Recommendations
• Marketplace Optimizations
• Forecasting
• Driver Positioning
• Health, Trends, Issues, ...
• And more …
Examples: ETA, route optimization, pickup points, Pool rider matches
9. Uber | Marketplace Mission
Build the platform, products, and algorithms responsible for the real-time execution and online optimization of Uber's marketplace.
We are building the brain of Uber, solving NP-hard algorithmic problems and economic optimization problems at scale.
16. Scale ..
For a fine-grained OLAP system, 1 day of data:
~400 (cities) × 10,000 (avg. number of hexagons per city) × 7 (vehicle types) × 1,440 (minutes per day) × 13 (trip states)
≈ 524 billion possible combinations
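The 524 billion figure is simply the product of the per-dimension cardinalities. A quick check using the slide's approximate numbers (the dictionary keys are our own labels, not names from Uber's system):

```python
from math import prod

# Approximate per-dimension cardinalities for one day of data, as given on the slide.
dimensions = {
    "cities": 400,
    "hexagons_per_city": 10_000,  # an average, so the total is an estimate
    "vehicle_types": 7,
    "minutes_per_day": 1_440,
    "trip_states": 13,
}

# Every combination of values is one potential cell in a fine-grained OLAP cube.
combinations = prod(dimensions.values())
print(f"{combinations:,}")  # 524,160,000,000 (about 524 billion)
```

Most of these cells will be empty in practice, which is exactly the sparsity problem discussed next.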
22. Spatial Granularity & Multiresolution Forecasting
Some small challenges:
• Sparsity at the hexagon level: many hexagons have little signal
• The more you aggregate (zoom out), the more trends emerge
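To make the sparsity point concrete, the sketch below uses small, entirely hypothetical per-minute counts for six hexagons and a made-up hexagon-to-cluster mapping. It measures the share of empty (cell, minute) entries before and after summing hexagons into clusters:

```python
# Hypothetical per-minute trip counts for six hexagons (illustrative data only).
hex_counts = {
    "hex_a": [0, 1, 0, 0, 2, 0],
    "hex_b": [0, 0, 0, 1, 0, 0],
    "hex_c": [1, 0, 0, 0, 0, 1],
    "hex_d": [0, 0, 1, 0, 0, 0],
    "hex_e": [0, 2, 0, 0, 1, 0],
    "hex_f": [0, 0, 0, 1, 0, 0],
}
# Hypothetical assignment of hexagons to coarser hex-clusters.
cluster_of = {"hex_a": "c1", "hex_b": "c1", "hex_c": "c1",
              "hex_d": "c2", "hex_e": "c2", "hex_f": "c2"}

def zero_fraction(series_by_key):
    """Share of (cell, minute) entries that recorded no activity at all."""
    values = [v for series in series_by_key.values() for v in series]
    return sum(1 for v in values if v == 0) / len(values)

def aggregate(series_by_key, parent_of):
    """Sum child time series into their parent cells, minute by minute."""
    totals = {}
    for key, series in series_by_key.items():
        acc = totals.setdefault(parent_of[key], [0] * len(series))
        for i, v in enumerate(series):
            acc[i] += v
    return totals

cluster_counts = aggregate(hex_counts, cluster_of)
print(zero_fraction(hex_counts))      # 0.75 at the hexagon level
print(zero_fraction(cluster_counts))  # 0.25 after zooming out to clusters
```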
23. Multiresolution Forecasting
Forecasting at different spatial granularities:
1. Forecast at the hex-cluster level.
2. Using past activity for a similar time window, apportion the hex-cluster's total forecast activity among its component hexagons.
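A minimal sketch of the two steps, under stated assumptions. The slide does not say which forecaster is used at the cluster level, so step 1 here is a stand-in (a seasonal-naive average of the same window on past days), and the even split for a window with no history is our own fallback. All function names and data are hypothetical.

```python
def seasonal_naive(history_same_window):
    """Step 1 stand-in: average the cluster's activity in the same window on past days."""
    return sum(history_same_window) / len(history_same_window)

def apportion(cluster_forecast, past_activity):
    """Step 2: split a cluster forecast across its hexagons in proportion to
    each hexagon's activity in a comparable past time window."""
    if not past_activity:
        return {}
    total_past = sum(past_activity.values())
    if total_past == 0:
        # No history for this window: fall back to an even split (our assumption).
        share = cluster_forecast / len(past_activity)
        return {hex_id: share for hex_id in past_activity}
    return {hex_id: cluster_forecast * count / total_past
            for hex_id, count in past_activity.items()}

# Hypothetical data: cluster totals for this window on three previous days,
# and per-hexagon activity in the same window last week.
forecast = seasonal_naive([110, 130, 120])  # 120.0
past = {"hex_a": 30, "hex_b": 10, "hex_c": 0, "hex_d": 20}
print(apportion(forecast, past))
# {'hex_a': 60.0, 'hex_b': 20.0, 'hex_c': 0.0, 'hex_d': 40.0}
```

Because the shares are proportional, the hexagon-level forecasts always sum back to the cluster forecast, so the two resolutions stay consistent.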
26. “ETR too much. I bail out ..”
Solution: Time Meter Banner
“Only about 20 minutes. I would wait!”
20 minutes wait to get a $40 trip, oh yeah!
27. Data Science Flow
A Typical Data Scientist Workflow
• Analyze/Prepare: data exploration, cleansing, transformations, etc.
• Feature Selection: evaluate the strength of various signals
• Model Fitting: use Python/R etc. to fit the model
• Evaluation: evaluate model performance
• Storage: store the model with versioning
• Serving/Dissemination: apply the model and serve predictions
• Monitoring: evaluate runtime performance
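The Model Fitting and Evaluation stages can be illustrated with a deliberately small example: a closed-form ordinary least squares fit on one feature, scored by mean absolute error on a held-out tail of the series. The data and the choice of OLS and MAE are our assumptions for illustration; the slide does not name a model or metric.

```python
from statistics import mean

def fit_ols(xs, ys):
    """Model Fitting: closed-form ordinary least squares for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def mean_absolute_error(model, xs, ys):
    """Evaluation: average absolute gap between predictions and observations."""
    a, b = model
    return mean(abs((a + b * x) - y) for x, y in zip(xs, ys))

# Hypothetical time-ordered series: x = open drivers, y = completed trips.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, 9, 11, 14]

# Hold out the most recent points rather than a random sample, since the data is ordered in time.
train_x, test_x = xs[:4], xs[4:]
train_y, test_y = ys[:4], ys[4:]

model = fit_ols(train_x, train_y)
print(model)                                        # (1.0, 2.0)
print(mean_absolute_error(model, test_x, test_y))   # 0.5
```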
28. Data Preparation
A Typical Data Scientist Workflow, step 1: Analyze/Prepare
Data exploration, cleansing, transformations, etc.
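As an illustration of cleansing and transformation on event data, the sketch below drops duplicate deliveries, records with no location, and invalid timestamps, then buckets the surviving events into (hexagon, minute) counts. The record layout `(event_id, unix_timestamp_seconds, hex_id)` and the validation rules are hypothetical.

```python
from collections import Counter

# Hypothetical raw trip-request events: (event_id, unix_timestamp_seconds, hex_id).
raw_events = [
    ("e1", 60, "hex_a"),
    ("e2", 75, "hex_a"),
    ("e2", 75, "hex_a"),   # duplicate delivery of the same event
    ("e3", 130, None),     # missing location
    ("e4", 150, "hex_b"),
    ("e5", -1, "hex_b"),   # invalid timestamp
]

def clean(events):
    """Cleansing: skip duplicates, records without a location, and bad timestamps."""
    seen = set()
    for event_id, ts, hex_id in events:
        if event_id in seen or hex_id is None or ts < 0:
            continue
        seen.add(event_id)
        yield event_id, ts, hex_id

def per_minute_counts(events):
    """Transformation: bucket events into (hex_id, minute index) counts."""
    return Counter((hex_id, ts // 60) for _, ts, hex_id in events)

counts = per_minute_counts(clean(raw_events))
print(sorted(counts.items()))  # [(('hex_a', 1), 2), (('hex_b', 2), 1)]
```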
29. Data Science Flow
A Typical Data Scientist Workflow
• Feature Selection: evaluate the strength of various signals
• Model Fitting: use Python/R etc. to fit the model
• Evaluation: evaluate model performance
• Storage: store the model with versioning
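One simple way to "evaluate the strength of various signals" is to rank candidate features by their absolute Pearson correlation with the target. This is our choice of technique, not necessarily the one used at Uber, and it only captures linear, one-feature-at-a-time relationships. The feature names echo the external feeds mentioned later in the deck, but the data is invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant series (no usable signal)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_signals(candidates, target):
    """Order candidate signals by absolute correlation with the target, strongest first."""
    scores = {name: pearson(series, target) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical target (e.g. requests per interval) and candidate signals.
target = [10, 12, 15, 20, 24]
candidates = {
    "rain_mm": [0, 1, 2, 4, 5],
    "concert_flag": [0, 0, 1, 0, 1],
    "random_noise": [3, 1, 4, 1, 5],
}
for name, r in rank_signals(candidates, target):
    print(f"{name}: {r:+.2f}")
```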
32. Overview
Streamline the forecasting process from conception to production:
• Streams with flexible geo-temporal resolution
• Valuable external data feeds
• Modular, reusable components at each stage
• Same code for offline model fitting and production, to enable fast model iteration
Diagram components: Streams; External Data (airport feed, weather feed, concerts feed); Operators & Computation DAGs; Feature Generation; Offline Model Fitting; Online Models; Predictions, Metrics & Visualizations
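The "same code for offline model fitting and production" idea can be sketched with a single feature operator that is called both in a batch loop over history (to build training rows) and once at the current instant (online). The operator, its 300-second window, and the timestamps are hypothetical.

```python
from bisect import bisect_right

def requests_in_window(timestamps, now, window_s=300):
    """Feature operator: count of requests in the half-open interval (now - window_s, now].
    `timestamps` must be sorted in ascending order."""
    lo = bisect_right(timestamps, now - window_s)
    hi = bisect_right(timestamps, now)
    return hi - lo

# Hypothetical sorted request timestamps (seconds).
history = [10, 50, 200, 320, 400, 610]

# Offline: evaluate the operator at many past instants to build training features.
offline_rows = [requests_in_window(history, t) for t in (300, 600, 900)]
print(offline_rows)  # [3, 2, 1]

# Online: the very same function, evaluated once for "now".
print(requests_in_window(history, now=620))  # 2
```

Sharing one implementation avoids training/serving skew, where a feature is computed slightly differently offline than in production.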
33. Realtime Models
Real-time spatiotemporal forecasting at a variable resolution of time and space
- Something happened at a time and a place. Now we will evaluate the DAG.
- The DAG is evaluated for a single instant in time.
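A minimal sketch of evaluating a computation DAG for a single instant: each operator declares its inputs, and the operators run once each in topological order (via the standard library's `graphlib`, Python 3.9+). The operator names, the event fields, and the toy demand formula are all hypothetical stand-ins for real models.

```python
from graphlib import TopologicalSorter

# Hypothetical operators: name -> (input operator names, function(context, *inputs)).
OPERATORS = {
    "requests": ((), lambda ctx: ctx["requests_last_5m"]),
    "drivers": ((), lambda ctx: ctx["open_drivers"]),
    "rain_mm": ((), lambda ctx: ctx["weather"]["rain_mm"]),
    # Toy formula: inflate demand by 10% per mm of rain (illustrative only).
    "demand_forecast": (("requests", "rain_mm"),
                        lambda ctx, req, rain: req * (1 + 0.1 * rain)),
    "supply_gap": (("demand_forecast", "drivers"),
                   lambda ctx, demand, drivers: demand - drivers),
}

def evaluate_dag(operators, context):
    """Run every operator once, dependencies first, for the instant described by `context`."""
    graph = {name: deps for name, (deps, _) in operators.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = operators[name]
        values[name] = fn(context, *(values[d] for d in deps))
    return values

# "Something happened at a time and a place": one event snapshot.
event = {"requests_last_5m": 40, "open_drivers": 35, "weather": {"rain_mm": 5}}
result = evaluate_dag(OPERATORS, event)
print(result["demand_forecast"], result["supply_gap"])
```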
35. Machine Learning as a Service
ML workflow at Uber
• Curated set of algorithms
• Model versioning
• Model performance & visualizations
• Automated deployment workflow
• …
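Model versioning and an automated deployment gate can be illustrated with a toy in-memory registry: versions are append-only, and a new version is promoted only if its recorded evaluation metric beats the currently deployed one. `ModelRegistry`, `ModelVersion`, and the promotion rule are hypothetical and do not describe Uber's actual service.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class ModelVersion:
    version: int
    params: dict
    mae: float  # evaluation metric recorded when the version was registered

@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: versions are append-only; at most one is deployed."""
    versions: list = field(default_factory=list)
    deployed: Optional[int] = None

    def register(self, params, mae):
        mv = ModelVersion(version=len(self.versions) + 1, params=dict(params), mae=mae)
        self.versions.append(mv)
        return mv

    def deployed_model(self):
        return None if self.deployed is None else self.versions[self.deployed - 1]

    def promote_if_better(self, version):
        """Automated deployment gate: promote only if the candidate beats what is live."""
        candidate = self.versions[version - 1]
        live = self.deployed_model()
        if live is None or candidate.mae < live.mae:
            self.deployed = candidate.version
            return True
        return False

registry = ModelRegistry()
v1 = registry.register({"a": 1.0, "b": 2.0}, mae=0.5)
print(registry.promote_if_better(v1.version))  # True: nothing was deployed yet
v2 = registry.register({"a": 0.8, "b": 2.1}, mae=0.7)
print(registry.promote_if_better(v2.version))  # False: worse than version 1
print(registry.deployed_model().version)       # 1
```

Keeping every version, rather than overwriting, makes rollback a matter of moving the `deployed` pointer.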
36. Open Source Technologies
Samza
• Well integrated with Kafka
• Built-in state management
• Built-in checkpointing
Spark Streaming
• Micro-batch based processing
• Good integration with HDFS & S3
• Exactly-once semantics
Distributed indexes & queries; versatile aggregations
Jupyter/IPython
• Great community support
• Data scientists familiar with Python
38. ML Problems: Challenges
• What’s the best model for integrating vast amounts of disparate kinds of information over space and time?
• What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable?
• About 100 or so more … :-)
39. Thank you!
Links
• Realtime Streaming at Uber: https://www.infoq.com/presentations/real-time-streaming-uber
• Spark at Uber: http://www.slideshare.net/databricks/spark-meetup-at-uber
• Careers at Uber: https://www.uber.com/careers/
• https://join.uber.com/marketplace
40. Q & A
Happy to discuss design/architecture. No product/business questions, please :-)
@stonse