Real-Time Data Pipeline @ Uber
Mingmin Chen
George Teo
Seattle Apache Kafka Meetup
Jan 18, 2018
Agenda
● Use Cases & Current Scale
● Data Infrastructure @ Uber
● Kafka @ Uber
○ Rest Proxy & Clients
○ Local Agent
○ uReplicator (MirrorMaker)
○ Offset Sync Service
○ Chaperone (Auditing)
○ Cluster Balancing
● Future Work
Use Cases
Real-time Driver-Rider Matching
[Figure: App Views, Vehicle information, KAFKA, Stream Processing (Driver-Rider Match, ETA)]
UberEATS - Real-Time ETAs
A bunch more...
● Fraud Detection
● Share My ETA
● Driver & Rider Signups
● Etc.
Kafka - Use Cases
● General Pub-Sub
● Stream Processing
○ AthenaX - Self-Serve Platform (Samza, Flink)
● Database Changelog Transport
○ Schemaless, Cassandra, MySQL
● Ingestion
○ HDFS, S3
● Logging
Scale
* obligatory show-off slide
● Messages/Day: Trillion+
● Data Volume: ~PBs
● Topics: Tens of Thousands
(excluding replication)
Data Infrastructure @ Uber
Apache Kafka is Uber’s Data Hub
PRODUCERS
● Rider App
● Driver App
● API / Services
● Mobile App
● (Internal) Services
● DATABASES: Cassandra, Schemaless, MySQL
● Etc.
Kafka
CONSUMERS
● Real-time Analytics, Alerts, Dashboards
● Samza / Flink
● Applications
● Data Science
● Analytics
● Reporting
● Vertica / Hive
● Ad-hoc Exploration
● ELK
● Debugging
● Hadoop
● Surge
● AWS S3
Kafka @ Uber
Requirements
● Scale Horizontally
● API Latency (<5ms typically)
● Availability -> 99.99%
● Durability -> 99.99%; 100% -> Critical Customers
● Multi-DC Replication
● Multi-Language Support
○ Java, Go, Python, Node.js, C++
● Auditing
Kafka Clusters
● Running Kafka 0.10.2
● Use Case-based
○ Logging
○ Database Changelogs
○ Highly Isolated & Reliable e.g. Surge
○ High Value Data (e.g. Signups)
● Fallback Secondary Clusters
● Global Aggregates
○ Offset Sync Service
Kafka Ecosystem @ Uber
[Figure: two data centers (DC1, DC2). Components shown: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, Local Agent, Secondary Kafka, uReplicator, Aggregate Kafka, and an Offset Sync Service.]
Producer Libraries
● High Throughput (average case)
○ Non-blocking, async, batched
● At-least-once (critical use case)
○ Blocking, sync
● Topic Discovery
○ Discovers the Kafka cluster a topic belongs to
○ Able to multiplex to different Kafka clusters
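A minimal sketch of the two produce modes and of topic discovery, under stated assumptions: `FakeCluster`, `ProxyClient`, the topic names, and the batch size are all hypothetical stand-ins for illustration, not Uber's client API.

```python
from collections import defaultdict


class FakeCluster:
    """In-memory stand-in for a Kafka cluster (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.log = defaultdict(list)  # topic -> list of messages

    def append(self, topic, messages):
        self.log[topic].extend(messages)


class ProxyClient:
    """Hypothetical client: async batched by default, sync for critical topics."""

    def __init__(self, topic_to_cluster, batch_size=3):
        self.topic_to_cluster = topic_to_cluster  # topic discovery table (assumed)
        self.batch_size = batch_size
        self.pending = defaultdict(list)  # topic -> buffered messages

    def _cluster_for(self, topic):
        # Topic discovery: look up which cluster owns the topic.
        return self.topic_to_cluster[topic]

    def send_async(self, topic, message):
        # High-throughput path: buffer and return immediately.
        self.pending[topic].append(message)
        if len(self.pending[topic]) >= self.batch_size:
            self.flush(topic)

    def send_sync(self, topic, message):
        # At-least-once path: write through before returning to the caller.
        self._cluster_for(topic).append(topic, [message])
        return True

    def flush(self, topic):
        batch, self.pending[topic] = self.pending[topic], []
        if batch:
            self._cluster_for(topic).append(topic, batch)


logging_cluster = FakeCluster("logging")
surge_cluster = FakeCluster("surge")
client = ProxyClient({"app-logs": logging_cluster, "surge-events": surge_cluster})

for i in range(4):
    client.send_async("app-logs", f"log-{i}")  # first three go out as one batch
client.send_sync("surge-events", "surge-0")    # written before the call returns
```

The one client multiplexes to two clusters through the lookup table. The fourth async message stays buffered until the next flush, which is the window in which the high-throughput path can lose data.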
Kafka Local Agent
● Producer side persistence
○ Local storage
● Isolates clients from downstream outages, backpressure
● Controlled backfill upon recovery
○ Prevents overwhelming a recovering cluster
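The buffering and controlled backfill behaviour might look roughly like the sketch below. `LocalAgent`, `max_per_tick`, and the in-memory deque (standing in for local storage) are assumptions made for illustration; the slides do not describe the agent's implementation.

```python
from collections import deque


class LocalAgent:
    """Hypothetical local agent: buffers on the producer host, drains at a capped rate."""

    def __init__(self, send, max_per_tick=2):
        self.send = send                  # callable(message) -> bool (True on success)
        self.max_per_tick = max_per_tick  # backfill cap per drain cycle (assumed knob)
        self.buffer = deque()             # stands in for local disk storage

    def produce(self, message):
        # Always accept from the application; never block on downstream.
        self.buffer.append(message)

    def drain_once(self):
        """Forward at most max_per_tick buffered messages; stop at first failure."""
        sent = 0
        while self.buffer and sent < self.max_per_tick:
            if not self.send(self.buffer[0]):
                break  # downstream still unavailable; keep the message
            self.buffer.popleft()
            sent += 1
        return sent


cluster_up = False
delivered = []


def send_to_cluster(message):
    # Simulated downstream cluster controlled by the cluster_up flag.
    if cluster_up:
        delivered.append(message)
    return cluster_up


agent = LocalAgent(send_to_cluster, max_per_tick=2)
for i in range(5):
    agent.produce(i)

sent_during_outage = agent.drain_once()  # outage: nothing leaves the buffer
cluster_up = True
sent_first_tick = agent.drain_once()     # recovery: capped at two per tick
```

Capping each drain cycle is what keeps a backlog from hitting the recovering cluster all at once.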
Local Agent in Action
[Figure not included in the published slides]
Kafka REST Proxy
Why Kafka REST Proxy?
● Simplified Client API
○ Multi-lang Support
● Decouple Clients from Kafka Brokers
○ Thin Clients = Operational Ease
○ Easier Kafka Upgrades
● Enhanced Reliability
○ Quota Management
○ Primary & Secondary Clusters
Kafka REST Proxy: Internals
● Based on Confluent’s open-source REST Proxy
● Performance enhancements
○ Simple HTTP servlets on Jetty instead of Jersey
○ Optimized for binary payloads
○ Performance increase from 7K* to 45K QPS/box
● Caching of topic metadata
● Reliability improvements*
○ Support for Fallback cluster
○ Support for multiple producers (SLA-based segregation)
● Plan to contribute back to community
*Based on benchmarking & analysis done in Jun 2015
Kafka Secondary Cluster
● High availability on regional cluster failure
● REST Proxy produces to the Secondary Cluster on Regional Cluster failure
● uReplicator/MirrorMaker backfills data to the Regional Cluster on recovery
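The failover-then-backfill sequence can be sketched as follows. This is an in-memory illustration only: `Cluster`, `produce_with_fallback`, and `backfill` are hypothetical names, and the real backfill is done by uReplicator/MirrorMaker rather than by a loop like this one.

```python
class Cluster:
    """In-memory stand-in for a Kafka cluster (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.messages = []

    def produce(self, message):
        if not self.up:
            raise ConnectionError(f"{self.name} unavailable")
        self.messages.append(message)


def produce_with_fallback(regional, secondary, message):
    """Try the regional cluster first; fall back to the secondary on failure."""
    try:
        regional.produce(message)
        return regional.name
    except ConnectionError:
        secondary.produce(message)
        return secondary.name


def backfill(secondary, regional):
    """Copy everything parked on the secondary back to the recovered regional cluster."""
    moved = 0
    while secondary.messages:
        regional.produce(secondary.messages[0])  # remove only after a successful copy
        secondary.messages.pop(0)
        moved += 1
    return moved


regional, secondary = Cluster("regional"), Cluster("secondary")
produce_with_fallback(regional, secondary, "m1")
regional.up = False
target = produce_with_fallback(regional, secondary, "m2")  # lands on the secondary
regional.up = True
moved = backfill(secondary, regional)
```

After the backfill, consumers of the regional cluster see both messages, with the message produced during the outage arriving late.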
uReplicator
● In-house Intercluster Replication Solution
○ Apache Helix-based
○ Mirror all traffic between & within DCs
○ Lower rebalance latencies
● Running in Production ~2 Years
● Open Sourced: https://github.com/uber/uReplicator
● Uber Engineering Blog: https://eng.uber.com/ureplicator/
Cluster Balancing
● No Auto Rebalancing
● Manual Placement is Hard
● Auto Plan Generation
○ And execution!
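The slides do not say how plans are generated. The sketch below is one generic greedy approach, offered only as an illustration: it moves replicas from the most-loaded broker to the least-loaded one and emits the result in the JSON shape accepted by Kafka's `kafka-reassign-partitions` tool. The function name, topic, and broker ids are hypothetical.

```python
import json
from collections import Counter


def generate_plan(assignment, brokers):
    """Greedy sketch: move replicas from the most- to the least-loaded broker
    until replica counts differ by at most one. Returns only changed partitions."""
    assignment = {tp: list(replicas) for tp, replicas in assignment.items()}
    changed = set()
    while True:
        load = Counter({b: 0 for b in brokers})
        for replicas in assignment.values():
            load.update(replicas)
        hot = max(brokers, key=lambda b: load[b])
        cold = min(brokers, key=lambda b: load[b])
        if load[hot] - load[cold] <= 1:
            break
        # Pick a partition on the hot broker that the cold broker does not host.
        movable = [tp for tp, r in sorted(assignment.items())
                   if hot in r and cold not in r]
        if not movable:
            break
        tp = movable[0]
        assignment[tp] = [cold if b == hot else b for b in assignment[tp]]
        changed.add(tp)
    return {
        "version": 1,
        "partitions": [
            {"topic": t, "partition": p, "replicas": assignment[(t, p)]}
            for t, p in sorted(changed)
        ],
    }


current = {("trips", 0): [1, 2], ("trips", 1): [1, 2], ("trips", 2): [1, 3]}
plan = generate_plan(current, brokers=[1, 2, 3])
plan_json = json.dumps(plan, indent=2)  # could be written to a reassignment file
```

A production planner would also weigh disk usage, traffic, rack placement, and leader distribution. This sketch balances replica counts only.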
At-Least-Once
[Figure: pipeline stages Application Process (ProxyClient), Kafka Proxy Server, Regional Kafka, uReplicator, and Aggregate Kafka, annotated with numbered steps 1-8]
● Most of the infrastructure is tuned for high throughput
○ Batching at each stage
○ Ack before being persisted (ack’ed != committed)
● Single node failure in any stage leads to data loss
● Need a reliable pipeline for High Value Data e.g. Payments
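To make "ack'ed != committed" concrete, here is a toy model of a single pipeline stage. The `Stage` class and its `ack_after_persist` flag are hypothetical; the point is only the ordering of acknowledgment and persistence.

```python
class Stage:
    """Hypothetical pipeline stage with an in-memory buffer and a durable store."""

    def __init__(self, ack_after_persist):
        self.ack_after_persist = ack_after_persist
        self.buffer = []   # lost on crash
        self.durable = []  # survives a crash

    def receive(self, message):
        """Return True when the message is acknowledged to the sender."""
        if self.ack_after_persist:
            self.durable.append(message)  # persist first, then ack
        else:
            self.buffer.append(message)   # ack now, persist later in a batch
        return True

    def flush(self):
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def crash(self):
        self.buffer.clear()  # anything not yet persisted is gone


def acked_but_lost(stage, messages):
    """Messages the sender believes were delivered but that did not survive a crash."""
    acked = [m for m in messages if stage.receive(m)]
    stage.crash()  # node fails before the next batch flush
    return [m for m in acked if m not in stage.durable]


lost_fast = acked_but_lost(Stage(ack_after_persist=False), ["a", "b"])
lost_safe = acked_but_lost(Stage(ack_after_persist=True), ["a", "b"])
```

In the throughput-tuned mode both acknowledged messages vanish with the node. In the persist-then-ack mode nothing acknowledged is lost, at the cost of waiting for the write on every message.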
At-least-once Kafka: Data Flow
[Figure: pipeline stages Application Process (ProxyClient), Kafka Proxy Server, Regional Kafka, uReplicator, and Aggregate Kafka, annotated with steps 1-8 in the at-least-once ordering]
Consumer
[Figure: DC1 data path (Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, uReplicator, Aggregate Kafka) with two consumers: a Consumer Application and a Consumer Application (Global View)]
Offset Sync Service
● Used for syncing offsets between aggregate clusters on failover
● MirrorMaker periodically snapshots the regional-to-aggregate offset map to an external datastore
● The offset map is used to recover a safe consumer offset to resume from in the passive DC
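One way the offset map could be used on failover is sketched below. The snapshot layout (sorted `(regional_offset, aggregate_offset)` pairs per regional topic-partition), the function name, and the numbers are assumptions for illustration; the slides do not specify the data model.

```python
from bisect import bisect_right


def safe_resume_offset(active_map, passive_map, committed_active_offset):
    """Translate a committed offset in the active aggregate cluster into a
    conservative offset in the passive aggregate cluster.

    Each map is a list of (regional_offset, aggregate_offset) snapshots,
    sorted ascending, for the same regional topic-partition (assumed layout).
    """
    # 1. Latest snapshot whose aggregate offset is <= the committed offset.
    active_agg = [agg for _, agg in active_map]
    i = bisect_right(active_agg, committed_active_offset) - 1
    if i < 0:
        return 0  # nothing known to be consumed; replay from the start
    regional_offset = active_map[i][0]

    # 2. Latest passive snapshot at or before that regional offset.
    passive_reg = [reg for reg, _ in passive_map]
    j = bisect_right(passive_reg, regional_offset) - 1
    if j < 0:
        return 0
    return passive_map[j][1]


# Snapshots as (regional offset, aggregate offset); illustrative numbers only.
dc1_map = [(0, 0), (100, 250), (200, 480)]
dc2_map = [(0, 0), (90, 300), (190, 610)]

resume_at = safe_resume_offset(dc1_map, dc2_map, committed_active_offset=400)
```

Rounding down at both steps means the consumer may re-read some messages but should not skip any, which matches at-least-once semantics. With several regional sources feeding one aggregate topic, a conservative choice would be the minimum of the per-source results.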
Auditing - Chaperone
Chaperone - Track Counts
[Screenshot not included in the published slides]
Chaperone - Track Latency
[Screenshot not included in the published slides]
Chaperone - End to End Auditing
● In-house Auditing Solution for Kafka
● Running in Production for ~2 Years
○ Audit 20k+ topics for 99.99% completeness
● Open Sourced: https://github.com/uber/chaperone
● Uber Engineering Blog: https://eng.uber.com/chaperone/
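Count-based auditing can be illustrated by bucketing messages into time windows at two tiers and comparing the counts. This is a simplified sketch, not Chaperone's implementation: the window size, tier names, and timestamps are assumptions, and only the 99.99% target comes from the slide.

```python
from collections import Counter

WINDOW_SECONDS = 600  # assumed bucket size; not stated in the slides


def count_by_window(event_timestamps):
    """Count messages per tumbling window, keyed by window start time."""
    return Counter(ts - ts % WINDOW_SECONDS for ts in event_timestamps)


def completeness(upstream, downstream):
    """Per-window ratio of downstream to upstream counts."""
    return {w: downstream.get(w, 0) / n for w, n in upstream.items()}


produced = [5, 30, 590, 601, 650, 1199]  # timestamps seen at an upstream tier
replicated = [5, 30, 590, 601, 1199]     # timestamps seen at a downstream tier

report = completeness(count_by_window(produced), count_by_window(replicated))
incomplete = sorted(w for w, ratio in report.items() if ratio < 0.9999)
```

Here the window starting at 600 is flagged because one of its three messages never reached the downstream tier.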
Future Work
● Richer consumer semantics for service owners
○ DLQ
○ Per partition competing consumer
● Multi-zone Clusters
○ Durability during DC wide outages
● Chargebacks
● Efficiency Enhancements
○ Intelligent aggregates, automated topic GC, etc.
● uReplicator 2.0
● Open Source
Thank you
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or
utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage
or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity
to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under
applicable law. All recipients of this document are notified that the information contained herein includes proprietary and
confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of
the enclosed information to any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.
More open-source projects at eng.uber.com

Introduction and different types of Ethernet.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Kafka Practices @ Uber - Seattle Apache Kafka meetup

  • 1. Real-Time Data Pipeline @ Uber Mingmin Chen George Teo Seattle Apache Kafka Meetup Jan 18, 2018
  • 2. Agenda ● Use Cases & Current Scale ● Data Infrastructure @ Uber ● Kafka @ Uber ○ Rest Proxy & Clients ○ Local Agent ○ uReplicator (Mirrormaker) ○ Offset Sync Service ○ Chaperone (Auditing) ○ Cluster Balancing ● Future Work
  • 4. Real-time Driver-Rider Matching Stream Processing - Driver-Rider Match - ETA App Views Vehicle information KAFKA
  • 6. A bunch more... ● Fraud Detection ● Share My ETA ● Driver & Rider Signups ● Etc.
  • 7. Kafka - Use Cases ● General Pub-Sub ● Stream Processing ○ AthenaX - Self-Serve Platform (Samza, Flink) ● Database Changelog Transport ○ Schemaless, Cassandra, MySQL ● Ingestion ○ HDFS, S3 ● Logging
  • 9. Scale (excluding replication): Trillion+ Messages/Day, ~PBs Data Volume, Tens of Thousands of Topics
  • 11. Apache Kafka is Uber’s Data Hub
  • 12. PRODUCERS CONSUMERS Real-time Analytics, Alerts, Dashboards Samza / Flink Applications Data Science Analytics Reporting Kafka Vertica / Hive Rider App Driver App API / Services Etc. Ad-hoc Exploration ELK Data Infrastructure @ Uber Debugging Hadoop Surge Mobile App Cassandra Schemaless MySQL DATABASES AWS S3 (Internal) Services
  • 14. Requirements ● Scale Horizontally ● API Latency (<5ms typically) ● Availability -> 99.99% ● Durability -> 99.99%; 100% -> Critical Customers ● Multi-DC Replication ● Multi-Language Support ○ Java, Go, Python, Node.js, C++ ● Auditing
  • 15. Kafka Clusters ● Running Kafka 0.10.2 ● Use Case-based ○ Logging ○ Database Changelogs ○ Highly Isolated & Reliable e.g. Surge ○ High Value Data (e.g. Signups) ● Fallback Secondary Clusters ● Global Aggregates ○ Offset Sync Service
  • 16. Kafka Ecosystem @ Uber (architecture diagram spanning DC1 and DC2: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, Local Agent, Secondary Kafka, uReplicator, Aggregate Kafka, Offset Sync Service)
  • 17. Kafka Ecosystem @ Uber (architecture diagram spanning DC1 and DC2: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, Local Agent, Secondary Kafka, uReplicator, Aggregate Kafka, Offset Sync Service)
  • 18. Producer Libraries ● High Throughput (average case) ○ Non-blocking, async, batched ● At-least-once (critical use case) ○ Blocking, sync ● Topic Discovery ○ Discovers the Kafka cluster a topic belongs to ○ Able to multiplex to different Kafka clusters
  • 19. Kafka Local Agent (architecture diagram spanning DC1 and DC2: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, Local Agent, Secondary Kafka, uReplicator, Aggregate Kafka, Offset Sync Service)
  • 20. Kafka Local Agent ● Producer-side persistence ○ Local storage ● Isolates clients from downstream outages and backpressure ● Controlled backfill upon recovery ○ Prevents overwhelming a recovering cluster
  • 21. Local Agent in Action Add Figure
  • 22. Kafka Rest Proxy (architecture diagram spanning DC1 and DC2: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, Local Agent, Secondary Kafka, uReplicator, Aggregate Kafka, Offset Sync Service)
  • 23. Why Kafka Rest Proxy ? ● Simplified Client API ○ Multi-lang Support ● Decouple Client With Kafka broker ○ Thin Clients = Operational Ease ○ Easier Kafka Upgrades ● Enhanced Reliability ○ Quota Management ○ Primary & Secondary Clusters
  • 24. Kafka Rest Proxy: Internals ● Based on Confluent’s open sourced Rest Proxy ● Performance enhancements ○ Simple HTTP servlets on jetty instead of Jersey ○ Optimized for binary payloads. ○ Performance increase from 7K* to 45K QPS/box ● Caching of topic metadata ● Reliability improvements* ○ Support for Fallback cluster ○ Support for multiple producers (SLA-based segregation) ● Plan to contribute back to community *Based on benchmarking & analysis done in Jun ’2015
  • 25. Kafka Secondary Cluster (architecture diagram spanning DC1 and DC2: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, Local Agent, Secondary Kafka, uReplicator, Aggregate Kafka, Offset Sync Service)
  • 26. Kafka Secondary Cluster ● High availability on regional cluster failure ● Rest proxy produces to the Secondary Cluster on Regional Cluster failure ● uReplicator/Mirrormaker backfills data to the regional cluster on recovery
  • 28. uReplicator ● In-house Intercluster Replication Solution ○ Apache Helix-based ○ Mirror all traffic between & within DCs ○ Lower rebalance latencies ● Running in Production ~2 Years ● Open Sourced: https://github.com/uber/uReplicator ● Uber Engineering Blog: https://eng.uber.com/ureplicator/
  • 29. Cluster Balancing ● No Auto Rebalancing ● Manual Placement is Hard ● Auto Plan Generation ○ And execution!
  • 31. At-Least-Once (diagram with steps 1-8: Application Process, ProxyClient, Kafka Proxy Server, Regional Kafka, uReplicator, Aggregate Kafka) ● Most of the infrastructure is tuned for high throughput ○ Batching at each stage ○ Ack before being persisted (ack’ed != committed) ● Single node failure in any stage leads to data loss ● Need a reliable pipeline for High Value Data e.g. Payments
  • 32. At-least-once Kafka: Data Flow (diagram with steps 1-8: Application Process, ProxyClient, Kafka Proxy Server, Regional Kafka, uReplicator, Aggregate Kafka)
  • 34. Offset Sync Service (architecture diagram spanning DC1 and DC2: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, Local Agent, Secondary Kafka, uReplicator, Aggregate Kafka, Offset Sync Service)
  • 35. Offset Sync Service ● Used for syncing offsets between aggregate clusters on failover ● Mirrormaker periodically snapshots the regional-offset-to-aggregate-offset map to an external datastore ● The offset map is used to recover a safe consumer offset to resume from in the passive DC
  • 37. Chaperone - Track Counts (screenshot not included)
  • 38. Chaperone - Track Latency (screenshot not included)
  • 39. Chaperone - End to End Auditing ● In-house Auditing Solution for Kafka ● Running in Production for ~2 Years ○ Audit 20k+ topics for 99.99% completeness ● Open Sourced: https://github.com/uber/chaperone ● Uber Engineering Blog: https://eng.uber.com/chaperone/
  • 41. Future Work ● Richer consumer semantics for service owners ○ DLQ ○ Per partition competing consumer ● Multi-zone Clusters ○ Durability during DC-wide outages ● Chargebacks ● Efficiency Enhancements ○ Intelligent aggregates, automated topic GC, etc. ● uReplicator 2.0 ● Open Source
  • 42. Thank you Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber. More open-source projects at eng.uber.com
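
As a rough illustration of the Offset Sync Service idea on slide 35, the Python sketch below maps a consumer's committed offset in the active aggregate cluster to a safe resume offset in the passive one. It is a simplification and not Uber's implementation: the `Snapshot` tuple, the `safe_resume_offset` function, and the model of one regional partition feeding one aggregate partition per DC are all assumptions made for illustration. A real aggregate partition interleaves data from several regional clusters.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical snapshot record: the regional message at `regional_offset`
# was written to the aggregate cluster at `aggregate_offset`.
Snapshot = Tuple[int, int]  # (regional_offset, aggregate_offset)


def safe_resume_offset(
    active: List[Snapshot],
    passive: List[Snapshot],
    committed_active_offset: int,
) -> Optional[int]:
    """Return an aggregate offset in the passive DC that is safe to resume from.

    Both snapshot lists must be sorted by offset. "Safe" means no message is
    skipped; some messages may be re-read, which is at-least-once behavior.
    Returns None when no snapshot is old enough to be used.
    """
    # Step 1: newest active snapshot at or below the committed offset.
    active_agg = [agg for _, agg in active]
    i = bisect_right(active_agg, committed_active_offset) - 1
    if i < 0:
        return None
    regional_safe = active[i][0]  # regional data before this is fully consumed

    # Step 2: newest passive snapshot at or below that regional offset.
    passive_regional = [reg for reg, _ in passive]
    j = bisect_right(passive_regional, regional_safe) - 1
    if j < 0:
        return None
    return passive[j][1]


if __name__ == "__main__":
    dc1 = [(0, 0), (100, 250), (200, 520)]
    dc2 = [(0, 0), (90, 180), (150, 310), (210, 400)]
    print(safe_resume_offset(dc1, dc2, committed_active_offset=300))
```

Rounding down at both steps trades some duplicate reads after failover for the guarantee that nothing is skipped, so more frequent snapshots reduce the amount of re-read data.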

Editor's Notes

  1. Introductions
  2. [George]
  3. [George] Uber as a product is the real-time movement of people and things. As a result, Kafka (stream processing) is a critical component of many real-time systems at Uber.
  4. [George] The rider app sends information to our servers, which is fed to Kafka. The driver app sends information to our servers, which is also fed to Kafka. This info is passed to the stream processing framework, which does useful calculations. The results are then passed back to the user in the form of a match, routing info, and an ETA.
  5. Promote UberEATS. ETAs change based on timings. We need historical input on all trips, i.e. submission time, preparation time, pickup time, etc. This is more complex than the rider app because there is an offline component.
  6. [George] Of course, this is just the tip of a very large iceberg
  7. [George] General pub-sub between services. Kafka is the basis of all stream processing systems at Uber. AthenaX (our self-serve platform) is built on top of Kafka and uses Samza / Flink. All data that needs to be ingested is written to Kafka. Changelog transport is slightly different from the above use cases because of its ordering and durability guarantees. Logging is used to feed ELK.
  8. [George] We are one of the largest users of Kafka.
  9. [George] Excluding replication
  10. [George]
  11. [George]
  12. [George] Kafka is the hub of Uber’s data infrastructure. On the left side are many kinds of applications and services. They generate data or logs and send them to Kafka. On the other side, we have stream processing engines, batch processing engines, and various services that process the data. Now, let’s look a bit deeper into the Kafka box. Highlight Surge as an important use case to maintain marketplace health? For example, Surge adjusts prices based on demand/supply statistics, which are derived from data generated by the rider and driver apps. ELK indexes log messages for troubleshooting. Samza and Flink are general stream processing engines, used to find insights in the dataset in real time, while Hadoop represents the set of tools that process the data in batches. Meanwhile, data in Kafka is copied to HDFS and S3 for long-term backup.
  13. [George]
  14. [George]
  15. [George] We do not run a single giant Kafka cluster per data center, since Kafka itself does not have good support for multi-tenancy and resource isolation. Instead, we have set up multiple clusters to support specific use cases. For example, we have a dedicated cluster for Surge, which is super critical for Uber’s business, and a cluster for logging topics, which need very high throughput. In addition, we have a secondary cluster in each data center, which accepts data from the REST proxy if the primary Kafka cluster goes down.
  16. [George] This is a high-level overview of the Kafka architecture at Uber across multiple DCs: Producer -> REST Proxy -> DC-local Regional Cluster -> Mirrormaker/uReplicator -> Aggregate Cluster (global view of data).
  17. [George] The next half of the presentation covers some of the components we’ve added to scale Kafka at Uber: Producer Library/Local Agent [Mingmin], REST Proxy [Mingmin], Secondary [Mingmin], uReplicator [Mingmin], Offset Sync Service [George]. Transition: Mingmin will discuss the producer-side components.
  18. [Mingmin] Essentially, the client libraries are HTTP clients, but we use many techniques inside them, like non-blocking/async calls and batching, to achieve high throughput and low produce latency. Produce latency is how long it takes for a call to produce() to return. End-to-end latency is how long it takes for consumers to see the data. As mentioned, we have multiple Kafka clusters, so the client library needs to discover which cluster a topic belongs to and send messages there. What’s more, the client library integrates with LocalAgent to ensure data reliability. We’re going to talk about this in the following section.
  19. [Mingmin]
  20. [Mingmin] LocalAgent is deployed on every host and has come in handy in production on several occasions. It is designed to use minimal resources so that it won’t affect services on that host. When the REST proxy (RP) fails, data from the client fails over to LocalAgent, which keeps the data until RP comes back. When RP is back, the backfill rate is controlled to avoid overloading it. Data stored on disk uses the Kafka ‘Log’.
  21. [Mingmin]
  22. [Mingmin] We built this pipeline to address those requirements. In each data center there is a regional Kafka cluster. In front of it we set up the Kafka REST proxy, which is essentially a web service. Applications use the proxy client to publish data to Kafka. At the other end, we have the aggregate Kafka cluster. uReplicator copies data from multiple regional clusters into the aggregate cluster. In addition, LocalAgent and Secondary Kafka are used for fault tolerance.
  23. [Mingmin] So why build it? Why not publish to Kafka directly? First of all, it simplifies the implementation of the client library and therefore makes it feasible to support multiple languages. The Kafka protocol is not well documented and is hard to implement, but with the REST proxy the client library is essentially an HTTP client. Secondly, it decouples clients from the Kafka cluster. This makes Kafka maintenance easier to conduct and transparent to end users. What’s more, the number of connections to Kafka brokers is reduced a lot. Finally, we have built quota management into the REST proxy to ensure that an abnormal producer won’t affect the normal ones.
  24. [Mingmin]
  25. [Mingmin] The regional clusters are just regular Kafka clusters, but we also have a secondary cluster in each DC, which guarantees HA when the regional cluster is unavailable.
  26. [Mingmin]
  27. [Mingmin] uReplicator copies data from multiple regional clusters into the aggregate cluster. It is a replacement for the open source Mirrormaker.
  28. [Mingmin] Copies thousands of topics between clusters. Why did we build it? Long rebalance times, up to 20 minutes. Apache Helix lets us embed customized balancing logic in case certain workers are heavily loaded.
  29. [Mingmin]
  30. [Mingmin]
  31. [Mingmin] Most of our Kafka clusters are tuned for high throughput using batching and async techniques. By tuning the configuration and patching a few parts of the pipeline, the data can be shipped without any loss.
  32. [Mingmin]
  33. [George] Consumers may consume from two different places: the regional Kafka clusters, or the global aggregate cluster to see a global view of the data.
  34. [George]
  35. [George]
  36. [George] Chaperone is embedded in or deployed alongside all the components along the pipeline to count every message flowing through them. The audit results are stored in Cassandra so that users can query them to check whether there is message loss or delay. In Chaperone, the different kinds of components are called tiers, like Rest_proxy_tier, regional_tier, or aggregate tier. The REST proxy and client libraries publish counts to the Chaperone Web Service. Chaperone then consumes from the Kafka tiers and finally generates a per-topic report on the amount of data in each tier during a given 10-minute window. If counts during a window differ by more than 0.01% (i.e. completeness falls below 99.99%), an alert is triggered.
  37. [George] If there is no loss, the message count should be the same at each tier. If there is loss, the gap in the figure highlights when the loss happened and by how much. (For example, 10 messages are generated between 11:00am and 11:10am. When those 10 messages arrive at the regional broker, an audit message saying that 10 messages generated in this 10-minute window have arrived at the regional broker can be generated and stored in the database. We can then check whether those 10 messages have reached all components.)
  38. [George] In addition, Chaperone tracks message latency and message rate.
  39. [George]
  40. [George]
  41. [George]
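
The per-tier audit described in notes 36 and 37 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Chaperone's actual code: the `TierAuditor` class, its method names, and the in-memory dictionary (standing in for Cassandra) are hypothetical, and the 0.01% threshold is interpreted here as relative to the upstream tier's count.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SECONDS = 600          # 10-minute audit window, as in the notes
MAX_DIFF_PER_10K = 1          # 0.01% expressed as 1 message per 10,000


def window_start(timestamp: float) -> int:
    """Align a message creation time (epoch seconds) to its 10-minute window."""
    t = int(timestamp)
    return t - t % WINDOW_SECONDS


class TierAuditor:
    """Hypothetical in-memory audit store keyed by (topic, tier, window)."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def record(self, topic: str, tier: str, timestamp: float, n: int = 1) -> None:
        # Messages are bucketed by creation time, so every tier counts the
        # same message in the same window regardless of when it arrives.
        self._counts[(topic, tier, window_start(timestamp))] += n

    def count(self, topic: str, tier: str, window: int) -> int:
        return self._counts.get((topic, tier, window), 0)

    def alerts(self, topic: str, upstream: str, downstream: str) -> List[Tuple[int, int, int]]:
        """Return (window, upstream_count, downstream_count) for windows whose
        counts differ by more than 0.01% of the upstream count."""
        windows = sorted(
            {w for (t, tier, w) in self._counts if t == topic and tier == upstream}
        )
        flagged = []
        for w in windows:
            up = self.count(topic, upstream, w)
            down = self.count(topic, downstream, w)
            # Integer comparison avoids floating-point rounding at the threshold.
            if abs(up - down) * 10_000 > up * MAX_DIFF_PER_10K:
                flagged.append((w, up, down))
        return flagged


if __name__ == "__main__":
    auditor = TierAuditor()
    auditor.record("trips", "rest_proxy_tier", 30, n=100_000)
    auditor.record("trips", "regional_tier", 45, n=99_980)   # 0.02% missing
    auditor.record("trips", "rest_proxy_tier", 630, n=100_000)
    auditor.record("trips", "regional_tier", 700, n=99_995)  # 0.005% missing
    print(auditor.alerts("trips", "rest_proxy_tier", "regional_tier"))
```

Bucketing by creation time rather than arrival time is what lets counts from different tiers be compared window by window, since a delayed message still lands in the window in which it was generated.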