Lyft Data Platform
Now and The Future
Mark Grover | @mark_grover
Deepak Tiwari | @_deepaktiwari_
Improve people’s lives with the world’s best transportation
● 30.7M riders in 2018
● 1.9M drivers in 2018
● 1B+ cumulative rides
● 300+ markets in US & Canada
Data is at the core of decisions at Lyft
Automated decisions
- What’s the price for the ride?
- What driver to match?
- What’s the ETA?
Analyzing business performance
- How are key business metrics trending?
- How do predicted ETAs compare to actual?
Human business decisions
- Which opportunities to invest in?
- Which path to take (via experimentation)?
Data platform users
● Users of the Data Platform: Data Modelers, Analysts, Data Scientists, General Managers, Engineers, Experimenters, PMs/Execs
● Use cases: Analytics, Biz ops, Building apps, Experimentation
By numbers...
● Millions of BI queries per week, doubling quarterly
● 5X increase in productivity of ML models in 2018
● 20X scaling of support of maps to users through streaming platform in 2018
Data as a platform to accelerate the business and reduce risk...
● Product Teams, Applied ML, Forecasting
● ML Platform
● Data Platform and Infra
Source: The AI Hierarchy of Needs, Monica Rogati (8/2017)
Guiding principles for the data platform team: Innovative, Impactful, Reliable
● Think ahead into the future (e.g. streaming, machine learning, security and privacy, visualization, discovery, etc.).
● Provide a step change (vs. incremental) in capability.
● Move fast.
● Create a competitive advantage.
● Focus on impact: develop jointly with application verticals.
● Build an enterprise-grade platform.
● Have a clearly defined contract with applications (e.g. SLAs).
● Give the product teams a serverless experience.
Use case #1
Unmet need for business metric observability
Business metric observability
What’s the health of the business?
Operational observability (Grafana)
What’s the health of the service?
● Is the service up?
● Is it throwing errors?
● In near real-time (< 1 min)
Requirements for biz metric observability
● Near real-time: see results within 1 - 30 minutes
● Be the source of truth: don’t widen the gap for reconciliation
● Impact on business metrics: derive business metrics from raw data (aka ETL)
Project F2 architecture
Diagram labels:
● Data Discovery app - Amundsen
● Operational Data stores (e.g. Dynamo)
● Apache Superset
● CDC
● Online flow
● Offline flow
Magic of CDC - Change Data Capture
Operational Data stores (e.g. Dynamo) → Analytical Data stores (e.g. Hive/Presto, BQ)
1. Tail the operational Data stores
2. Persist the raw change log
3. Upsert the change log to table periodically (~30 min)
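The three CDC steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lyft's implementation: the record fields (`key`, `revision`, `op`, `data`) and the function names are hypothetical, and an in-memory dict stands in for the analytical table.

```python
from typing import Any, Dict, Iterable

# Hypothetical change-log record: {"key", "revision", "op", "data"}.
ChangeRecord = Dict[str, Any]


def upsert_change_log(table: Dict[str, ChangeRecord],
                      change_log: Iterable[ChangeRecord]) -> Dict[str, ChangeRecord]:
    """Apply a batch of raw change records to a snapshot table.

    For each key only the highest revision wins, so replaying a batch
    or receiving records out of order does not corrupt the snapshot.
    """
    for record in change_log:
        current = table.get(record["key"])
        if current is not None and current["revision"] >= record["revision"]:
            continue  # stale or duplicate change, ignore it
        table[record["key"]] = record
    return table


def live_rows(table: Dict[str, ChangeRecord]) -> Dict[str, Any]:
    """Rows visible to queries: latest revision per key, minus deletes."""
    return {k: r["data"] for k, r in table.items() if r["op"] != "delete"}


# Example batch with an out-of-order update and a delete.
change_log = [
    {"key": "ride-1", "revision": 1, "op": "insert", "data": {"status": "requested"}},
    {"key": "ride-1", "revision": 3, "op": "update", "data": {"status": "completed"}},
    {"key": "ride-1", "revision": 2, "op": "update", "data": {"status": "accepted"}},
    {"key": "ride-2", "revision": 1, "op": "insert", "data": {"status": "requested"}},
    {"key": "ride-2", "revision": 2, "op": "delete", "data": None},
]
snapshot = upsert_change_log({}, change_log)
```

Deleted keys are kept in the table as tombstones so that a late, older revision cannot resurrect a deleted row; that is a design choice of this sketch, not something the slides state.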
Advantages of CDC
● Near real-time: see results within 30 minutes
● Source of truth: no need to reconcile; same data as operational DBs
● Data Engineer Productivity: no need to recreate ETL from events; easier primitives to build ETL on top of
Challenges of the architecture
● Measuring reliability
○ How to distinguish late-arriving data from missing data?
○ How do you trace a single missing revision through all moving parts?
● Lots of moving parts
○ Tailer, tied to implementation of operational DB
○ Ingest pipeline
○ Kafka, Kinesis
○ Analytic Database
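One possible way to approach the late-versus-missing question on this slide is sketched below. It assumes, hypothetically, that the operational store assigns consecutive revision numbers per key and that the pipeline records when each revision arrived; neither assumption comes from the slides, and `classify_gaps` is an illustrative name.

```python
from typing import Dict


def classify_gaps(arrivals: Dict[int, float], now: float,
                  allowed_lateness_s: float) -> Dict[int, str]:
    """Label every gap in a key's revision sequence as 'late' or 'missing'.

    arrivals maps revision number -> arrival time in epoch seconds.
    A gap becomes observable when the next higher revision arrives; if it
    is still open after allowed_lateness_s, it is treated as missing.
    """
    labels: Dict[int, str] = {}
    revisions = sorted(arrivals)
    for prev, nxt in zip(revisions, revisions[1:]):
        waited = now - arrivals[nxt]
        for gap in range(prev + 1, nxt):
            labels[gap] = "missing" if waited > allowed_lateness_s else "late"
    return labels


# Revisions 3 and 5 never arrived; only revision 3 has been absent long enough.
gaps = classify_gaps({1: 100.0, 2: 110.0, 4: 120.0, 6: 990.0},
                     now=1000.0, allowed_lateness_s=60.0)
```

The lateness allowance is a tuning knob: too small and slow data raises false alarms, too large and real loss goes unnoticed for longer.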
CDC + Streaming = Lots of business value
Use case #2
Data Science use cases - Driver app
Data Science use cases - Pricing
Requirements for streaming applications
In Streaming, just like in Batch:
● Quick and simple ways of cleaning data
● Prototype in a language of choice (Python, R, SQL)
Streaming architecture
Diagram labels:
● Services (e.g. ETA, Pricing)
● Models + Applications (e.g. ETA, Pricing)
● Flyte
Investments in Streaming
Dryft
Fully managed data processing engine, powering real-time features and events
- Needed for consistent feature generation
- Batch processing for bulk creation of features for training
- Stream processing for real-time creation of features for scoring
- Uses Flink SQL under the hood
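One reading of "consistent feature generation" is that a feature is defined once and evaluated in both batch and streaming modes. The sketch below illustrates that idea in plain Python; it is not Dryft's API or Flink SQL, and `rolling_mean` and the event shape are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (event_time_seconds, value)


def rolling_mean(events: Iterable[Event],
                 window_s: float) -> Iterator[Tuple[float, float]]:
    """Yield (event_time, mean of values in the trailing window) per event.

    Written as a generator so the same definition serves a finite batch
    (a list) and an unbounded stream (any iterator). Assumes events
    arrive in event-time order and window_s is positive.
    """
    window: Deque[Event] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the trailing window.
        while window[0][0] <= ts - window_s:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


events = [(0.0, 10.0), (30.0, 20.0), (90.0, 30.0)]

# Batch mode: bulk creation of training features from a stored list.
training_features = list(rolling_mean(events, window_s=60.0))

# Streaming mode: the same definition consumed one event at a time.
stream = rolling_mean(iter(events), window_s=60.0)
first_scoring_feature = next(stream)
```

Because both modes run the same function, the feature values used for training and for scoring cannot drift apart through duplicated logic.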
Apache Beam
Open source, unified, portable and extensible model for both batch and streaming use cases
- Enables streaming use cases for teams using non-JVM languages
- Uses Flink under the hood
Challenges of the architecture
● Things we find at scale
○ Intermittent AWS service errors
○ Can’t be naive about pub-sub consumption
● Integration
○ Things work in isolation, but …
○ Flink Kinesis Connector
■ Connectors that work at scale are hard
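Intermittent service errors are commonly absorbed with retries that back off exponentially and add jitter. The slides do not say how Lyft handles them, so the following is a generic sketch: `TransientError` is a hypothetical stand-in for whatever retryable exception a client raises, and the delay parameters are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable service error."""


def call_with_backoff(fn: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay_s: float = 0.1,
                      max_delay_s: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on TransientError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay,
            # so many consumers do not retry in lockstep.
            cap = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry policy can be tested without waiting in real time.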
Sharing your batch and streaming compute will pay off hugely
The whole shebang
Data Platform architecture
Diagram labels:
● Data Discovery app - Amundsen
● Services (e.g. ETA, Pricing)
● Operational Data stores (e.g. Dynamo)
● Models + Applications (e.g. ETA, Pricing)
● BI/Data Viz: Apache Superset
● Custom apps: Marketplace Operations app, ..., other custom apps
● Flyte
Kafka vs. Kinesis
Kinesis scaling limitations
• We require high throughput & high fan-out
• Default limit of 500 shards
• Resharding is expensive and slow
• Built a fan-out system to work around limitations
Kafka is better but …
• Has limitations around fan-in
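The slide does not describe how the fan-out system works. One generic shape, shown here purely as an assumption, is a single reader that consumes the source stream once and copies each record into a queue per downstream consumer, so adding consumers adds no readers on the source. `FanOut` and its methods are hypothetical names, and in-memory queues stand in for real streams.

```python
import queue
from typing import Dict, Iterable


class FanOut:
    """Copy every record from one source to many subscriber queues."""

    def __init__(self, maxsize: int = 1000) -> None:
        self._maxsize = maxsize
        self._subscribers: Dict[str, "queue.Queue[bytes]"] = {}

    def subscribe(self, name: str) -> "queue.Queue[bytes]":
        """Register a downstream consumer and return its private queue."""
        q: "queue.Queue[bytes]" = queue.Queue(maxsize=self._maxsize)
        self._subscribers[name] = q
        return q

    def pump(self, source: Iterable[bytes]) -> int:
        """Read the source once and deliver each record to all subscribers.

        Returns the number of records read from the source.
        """
        count = 0
        for record in source:
            for q in self._subscribers.values():
                # put() blocks when a subscriber's queue is full (backpressure).
                q.put(record)
            count += 1
        return count


fan = FanOut()
pricing = fan.subscribe("pricing")
eta = fan.subscribe("eta")
records_read = fan.pump([b"event-1", b"event-2"])
```

Blocking on a full queue means one slow consumer stalls the others; a real system would have to decide between that, dropping records, or buffering to durable storage.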
Streaming engines
● Apache Flink vs. Apache Spark vs. Apache Beam
● 2 dimensions of comparison
○ APIs (the kinds of applications you can write)
○ Operations (the kinds of applications you can support)
● Apache Beam for multi-language support (Python and Go)
● Spark Streaming - operations were hard, no state evolution, cumulative latencies with multi-stage graphs.
● Know when to put all your eggs in the same basket (Spark), and when not to.
Data Storage and processing
Interactive querying:
● Redshift
○ Historical but dying
● Druid
○ Interactive use cases
● Presto (on S3)
○ Super handy interactive query engine
○ Lacking real-time ingestion support
● BigQuery
○ Interactive query engine (like Presto)
○ Expensive, but great streaming support!
ETL:
● Hive (on S3)
○ Mostly for ETL and ad hoc queries that are too large to run on Presto
● Spark
○ Some ETL, potential for all ETL to be in Spark
Future of Interactive querying
Unified access layer
e.g. DAL, Genie, DALi Views
Future of ETL
- Easily schedule a SQL query, with its dependencies, as an ETL job
- Diagnose job failures with lineage and dashboards on data skew, etc.
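Scheduling SQL queries with dependencies comes down to running them in topological order. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+); the `SqlJob` type, the job names and the SQL text are hypothetical and do not come from the slides.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, NamedTuple, Tuple


class SqlJob(NamedTuple):
    sql: str
    depends_on: Tuple[str, ...] = ()


def run_order(jobs: Dict[str, SqlJob]) -> List[str]:
    """Return job names in an order that respects every dependency.

    Raises KeyError for a dependency that is not a known job, and
    graphlib.CycleError if the dependencies form a cycle.
    """
    for name, job in jobs.items():
        for dep in job.depends_on:
            if dep not in jobs:
                raise KeyError(f"{name} depends on unknown job {dep}")
    graph = {name: set(job.depends_on) for name, job in jobs.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical jobs: each SQL query becomes one ETL step.
jobs = {
    "revenue": SqlJob("SELECT day, SUM(fare) FROM daily_rides GROUP BY day",
                      depends_on=("daily_rides",)),
    "daily_rides": SqlJob("SELECT * FROM raw_rides WHERE status = 'completed'",
                          depends_on=("raw_rides",)),
    "raw_rides": SqlJob("SELECT * FROM ride_events"),
}
order = run_order(jobs)
```

The same dependency graph also gives lineage for free: walking it backwards from a failed job lists the upstream tables worth inspecting.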
Workflow engines
● Airflow
○ Most ETL jobs
○ Python-heavy DAGs
○ Really good community support
● Flyte
○ Focused on ML workflows
○ Built-in provenance
○ Intermediate caching, discovery of previously computed artifacts
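Intermediate caching with discovery of previously computed artifacts can be pictured as a lookup keyed on a task's identity and inputs. The sketch below shows that general technique only; it is not Flyte's API, and `ArtifactCache`, its methods and the example task are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ArtifactCache:
    """Reuse a task's output when name, version and inputs are unchanged."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def key(task_name: str, version: str, inputs: Dict[str, Any]) -> str:
        # Inputs must be JSON-serializable; sort_keys makes the key
        # independent of argument order.
        payload = json.dumps([task_name, version, inputs], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, task_name: str, version: str,
            fn: Callable[..., Any], **inputs: Any) -> Any:
        k = self.key(task_name, version, inputs)
        if k in self._store:
            self.hits += 1  # previously computed artifact discovered
            return self._store[k]
        result = fn(**inputs)
        self._store[k] = result
        return result


def featurize(day: str) -> str:
    """Hypothetical expensive step; returns where its output was written."""
    return f"features/{day}"


cache = ArtifactCache()
first = cache.run("featurize", "v1", featurize, day="2019-05-01")
second = cache.run("featurize", "v1", featurize, day="2019-05-01")  # cache hit
```

Including a version in the key means that changing the task's code invalidates old artifacts instead of silently reusing them.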
Conclusion
● We think about data as a platform and a competitive advantage.
● Our data and platform usage is growing very fast.
● We support Data Science, Ops, Analytics, Experimentation and other use cases.
● We have seen tremendous benefit from CDC data + Streaming frameworks to deliver business metric observability.
● We have learned and gained a lot in operational excellence by sharing our batch and stream compute frameworks.
● We are investing in Data Discovery, Streaming, and Machine Learning.
Attend Streaming at Lyft session tomorrow!
Attend Meetup at Level39 tonight!
Thank you
go.lyft.com/lyftdataplatform
May 2nd, 2019
Mark Grover | @mark_grover
Deepak Tiwari | @_deepaktiwari_

Lyft data Platform - 2019 slides

  • 1. Now and The Future Lyft Data Platform Mark Grover | @mark_grover Deepak Tiwari | @_deepaktiwari_
  • 2. Improve people’s lives with the world’s best transportation ● 30.7M riders in 2018 ● 1.9M drivers in 2018 ● 1B+ cumulative rides ● 300+ markets in US & Canada
  • 3. Data is at the core of decisions at Lyft Automated decisions - What’s the price for the ride? - What driver to match? - What’s the ETA? Analyzing business performance - How are key business metrics trending? - How do predicted ETAs compare to actual? Human business decisions - Which opportunities to invest in? - Which path to take (via experimentation)?
  • 4. Data platform users 4 Data Modelers Analysts Data Scientists General Managers Data Platform Engineers ExperimentersPMs/Execs Analytics Biz ops Building apps Experimentation
  • 5. By numbers... ● Millions of BI queries per week doubling quarterly ● 5X increase in productivity of ML models in 2018 ● 20X scaling of support of maps to users through streaming platform in 2018
  • 6. Product Teams, Applied ML, Forecasting ML Platform Data Platform and Infra Source: The AI Hierarchy of Needs, Monica Rogati (8/2017) Data as a platform to accelerate the business and reduce risk...
  • 7. ● Think ahead in the future (e.g. streaming, machine learning, security and privacy, visualization, discovery, etc.). ● Provide a step change (vs incremental) in the capability. ● Move fast. ● Create a competitive advantage. ● Focus on impact: Develop jointly with application verticals. ● Build enterprise grade platform. ● Have a clearly defined contract with applications (e.g. SLAs). ● Give a serverless application for the product teams. Guiding principles for the data platform team... Innovative Impactful Reliable
  • 9. Unmet need for business metric observability Business metric observability What’s the health of the business? Grafana Operational observability What’s the health of the service? ● Is the service up? ● Is it throwing errors? ● In near real-time (< 1 min)
  • 10. Requirements for biz metric observability See results within 1 - 30 minutes Be the source of truthNear real-time Impact on business metrics Derive business metrics from raw data (aka ETL) Don’t widen the gap for reconciliation
  • 11. 11 Project F2 architecture Data Discovery app - Amundsen Operational Data stores (e.g. Dynamo) Apache Superset CDC Online flow Offline flow
  • 12. Magic of CDC - Change Data Capture Operational Data stores (e.g. Dynamo) Analytical Data stores (e.g. Hive/Presto, BQ) 1. Tail the operational Data stores 2. Persist the raw change log 3. Upsert the change log to table periodically (~30 m)
  • 13. Advantages of CDC Data Engineer Productivity See results within 30 minutes Near real-time Source of truth No need to reconcile Same data as operational DBs No need to recreate ETL from events Easier primitives to build ETL on top of
  • 14. ● Measuring reliability ○ How to distinguish late arriving data from missing data? ○ How do you trace a single missing revision through all moving parts? ● Lots of moving parts ○ Tailer, tied to implementation of operational DB ○ Ingest pipeline ○ Kafka, Kinesis ○ Analytic Database Challenges of the architecture
  • 15. CDC + Streaming = Lots of business value
  • 17. Data Science use cases - Driver app
  • 18. Data Science use cases - Pricing
  • 19. Requirements for streaming applications: In Streaming, just like in Batch
      ● Quick and simple ways of cleaning data
      ● Prototype in a language of choice (Python, R, SQL)
  • 20. Streaming architecture (diagram labels): Services (e.g. ETA, Pricing); Models + Applications (e.g. ETA, Pricing); Flyte
  • 21. Investments in Streaming
      ● Dryft: Fully managed data processing engine, powering real-time features and events
        ○ Needed for consistent feature generation
        ○ Batch processing for bulk creation of features for training
        ○ Stream processing for real-time creation of features for scoring
        ○ Uses Flink SQL under the hood
      ● Apache Beam: Open source unified, portable and extensible model for both batch and streaming use-cases
        ○ Enables streaming use cases for teams using non-JVM languages
        ○ Uses Flink under the hood
  • 22. Challenges of the architecture
      ● Things we find at scale
        ○ Intermittent AWS service errors
        ○ Can’t be naive about pub-sub consumption
      ● Integration
        ○ Things work in isolation, but …
        ○ Flink Kinesis Connector
          ■ Connectors that work at scale are hard
  • 23. Sharing your batch and streaming compute will pay huge benefits
  • 25. Data Platform architecture (diagram labels): Data Discovery app - Amundsen; Services (e.g. ETA, Pricing); Operational Data stores (e.g. Dynamo); Models + Applications (e.g. ETA, Pricing); Apache Superset; BI/Data Viz; Marketplace Operations app; ... Other custom apps; Custom apps; Flyte
  • 26. Kafka vs. Kinesis
      ● Kafka is better but …
        ○ Has limitations around fan-in
      ● Kinesis scaling limitations
        ○ We require high throughput & high fan-out
        ○ Default limit of 500 shards
        ○ Resharding is expensive and slow
        ○ Built a fan-out system to work around limitations
  • 27. Streaming engines
      ● Apache Flink vs. Apache Spark vs. Apache Beam
      ● 2 dimensions of comparison
        ○ APIs (the kinds of applications you can write)
        ○ Operations (the kinds of applications you can support)
      ● Apache Beam for multi-language support (Python and Go)
      ● Spark Streaming - operations were hard, no state evolution, cumulative latencies with multi-stage graphs.
      ● Know when to put all your eggs in the same basket (Spark), and when not to.
  • 28. Data Storage and processing
      Interactive querying:
      ● Redshift
        ○ Historical but dying
      ● Druid
        ○ Interactive use-cases
      ● Presto (on S3)
        ○ Super handy interactive query engine
        ○ Lacking real-time ingestion support
      ● BigQuery
        ○ Interactive query engine (like Presto)
        ○ Expensive, but great streaming support!
      ETL:
      ● Hive (on S3)
        ○ Mostly for ETL and ad hoc queries that are too large to run on Presto
      ● Spark
        ○ Some ETL, potential for all ETL to be in Spark
      Future of Interactive querying:
      ● Unified access layer e.g. DAL, Genie, DALi Views
      Future of ETL:
      ● Easily schedule with dependencies, a SQL query to be an ETL job
      ● Diagnose job failures with lineage and dashboards on data skew, etc.
  • 29. Workflow engines
      ● Airflow
        ○ Most ETL jobs
        ○ Python heavy DAGs
        ○ Really good community to support
      ● Flyte
        ○ Focussed on ML workflows
        ○ Built-in provenance
        ○ Intermediate caching, discovery of previously computed artifacts
  • 31. Conclusion
      ● We think about data as a platform and a competitive advantage.
      ● Our data and platform usage is growing really, really fast.
      ● We support Data Science, Ops, Analytics, Experimentation and other use cases.
      ● We have seen tremendous benefit from CDC data + Streaming frameworks to deliver business metric observability.
      ● We have learned and gained a lot in operational excellence by sharing our batch and stream compute frameworks.
      ● We are investing in Data Discovery, Streaming, and Machine Learning.
  • 32. Attend Streaming at Lyft session tomorrow!
  • 33. Attend Meetup at Level39 tonight!
  • 34. Thank you. go.lyft.com/lyftdataplatform. May 2nd, 2019. Mark Grover | @mark_grover, Deepak Tiwari | @_deepaktiwari_
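The three CDC steps on slide 12 (tail the operational store, persist the raw change log, upsert the log into a table periodically) can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the implementation described in the talk: the `Change` record, its `revision` field, and the `apply_change_log` and `live_rows` helpers are hypothetical names, and a real pipeline would read the log from a stream and write to an analytical store.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One entry in the raw change log (hypothetical shape)."""
    key: str             # primary key of the row in the operational store
    revision: int        # assumed to increase with every write to the key
    row: Optional[dict]  # new row contents, or None if the row was deleted


def apply_change_log(table: Dict[str, Change], log: Iterable[Change]) -> Dict[str, Change]:
    """Upsert a batch of changes, keeping only the latest revision per key."""
    for change in log:
        current = table.get(change.key)
        # Ignore stale or duplicate entries so replaying a batch is harmless.
        if current is not None and current.revision >= change.revision:
            continue
        table[change.key] = change
    return table


def live_rows(table: Dict[str, Change]) -> Dict[str, dict]:
    """Rows visible to queries: keys whose latest change is not a delete."""
    return {k: c.row for k, c in table.items() if c.row is not None}


if __name__ == "__main__":
    table: Dict[str, Change] = {}
    batch = [
        Change("ride-1", 1, {"status": "requested"}),
        Change("ride-2", 1, {"status": "requested"}),
        Change("ride-1", 2, {"status": "completed"}),
        Change("ride-2", 2, None),                      # delete
        Change("ride-1", 1, {"status": "requested"}),   # late duplicate
    ]
    apply_change_log(table, batch)
    print(live_rows(table))
```

Deletes are kept in the table as tombstones rather than removed, so that an older revision arriving late cannot resurrect a deleted row. That choice is one way to approach the late-versus-missing data question raised on slide 14, and it is an assumption of this sketch.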