
Flink at Netflix - PayPal Speaker Series

Over 100 million subscribers from over 190 countries enjoy the Netflix service. This leads to over a trillion events flowing through the Keystone stream processing infrastructure to help glean business insights and improve customer experience. The self-serve infrastructure enables users to focus on extracting insights, and not worry about building out scalable infrastructure. I'll share our experience building this platform with Flink, and lessons learnt.



  1. 1. Stream Processing with Flink at Netflix Monal Daxini Stream Processing Lead 7/5/2018 @monaldax
  2. 2. • Worked on stream processing platform for business analytics for 4 years • Including vision, roadmap, and implementation • Introduced Flink to Netflix 2 years ago, and drove adoption Profile
  3. 3. @monaldax Point & Click Keystone Pipeline Routing, Filtering, Projection At-least-once Streaming Jobs Analytics, Applications, Platforms How do we leverage Flink? Current Using Flink 1.4 Streaming SQL (Future)
  4. 4. @monaldax & Streaming Jobs (Platform Overview)
  5. 5. @monaldax Event Producers Sinks Ingest Pipelines Are The Backbone Of Real-time Data Infrastructure SERVERLESS Turnkey 100% in AWS Fully Managed
  6. 6. @monaldax Configure 1 Data Stream, A Filter, & 3 Sinks
  7. 7. @monaldax Chainable & Optional Filter & Projection
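A minimal sketch of what a chainable filter plus projection looks like in the Flink DataStream API (Java); the Event and ProjectedEvent types, field names, and the predicate are hypothetical stand-ins, not the Keystone implementation:

    // Hypothetical Event / ProjectedEvent types; shows an optional filter chained into a projection.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<Event> events = env.addSource(kafkaSource);            // source configured elsewhere

    DataStream<ProjectedEvent> routed = events
        .filter(e -> "playback".equals(e.getType()))                   // optional filter stage
        .map(e -> new ProjectedEvent(e.getDeviceId(), e.getTitle()));  // optional projection stage

    routed.addSink(elasticsearchSink);                                 // one of the configured sinks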
  8. 8. @monaldax Keystone Self-serve – Kafka Sink Partition Key Support
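One way to express a Kafka sink partition key with the stock Flink Kafka connector is a KeyedSerializationSchema handed to the producer sink; a hedged sketch, where the Event type and the choice of device id as key are assumptions rather than Keystone's actual code:

    // Partition key support via KeyedSerializationSchema (Event type and key choice are illustrative).
    Properties producerProps = new Properties();
    producerProps.setProperty("bootstrap.servers", "consumer-kafka:9092");

    FlinkKafkaProducer011<Event> kafkaSink = new FlinkKafkaProducer011<>(
        "consumer-topic",
        new KeyedSerializationSchema<Event>() {
            @Override public byte[] serializeKey(Event e)   { return e.getDeviceId().getBytes(); }  // Kafka key
            @Override public byte[] serializeValue(Event e) { return e.toJsonBytes(); }
            @Override public String getTargetTopic(Event e) { return null; }  // fall back to default topic
        },
        producerProps);

    routed.addSink(kafkaSink);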
  9. 9. Event Producer Created 2 Kafka Topics, And Started Three Separate Jobs SPaaS Router Fronting Kafka KSGateway Consumer Kafka KCW Elasticsearch 3 Jobs 1 Topic Keystone Management 1 Topic @monaldax
  10. 10. Traffic And Cost Per Stream
  11. 11. @monaldax Keystone Router Admin Tooling and Netflix Integration
  12. 12. Dashboard Generated For Provisioned Streams
  13. 13. Searchable Router Job Logs
  14. 14. Flink Job Web UI @monaldax
  15. 15. Event Flow: Producer Uses Kafka Client Wrapper Or Proxy SPaaS Router Fronting Kafka Event Producer KSGateway Consumer Kafka Keystone Management KCW Elasticsearch @monaldax
  16. 16. Event Flow: Events Queued In Kafka SPaaS Router Fronting Kafka Event Producer KSGateway Consumer Kafka KCW Elasticsearch 3 instances Keystone Management @monaldax
  17. 17. Event Flow: Each Router Reads From Source, Optionally Applies Filter & Projection SPaaS Router Fronting Kafka Event Producer KSGateway Consumer Kafka KCW Elasticsearch 3 Jobs Keystone Management @monaldax
  18. 18. Event Flow: Each Router Writes To Their Respective Sinks SPaaS Router Fronting Kafka Event Producer KSGateway Consumer Kafka KCW Elasticsearch 3 Jobs Non-Keyed Keyed Supported Keystone Management @monaldax
  19. 19. Flink RouterFronting Kafka Event Producer Keystone Management KSGateway Consumer Kafka Stream Consumers HTTP / gRPC
  20. 20. Keystone Router Scale ● ~2,000 routing jobs, ~10,000 containers ● 3,000 m4.4xl (8 cores, 64 GB, 2G network) ● ~3 trillion events processed /day, 600B - 1T uniques ○ Peak: 12M events in / sec & 36 GB in / sec
  21. 21. @monaldax Streaming Jobs
  22. 22. Event Producer Stream Processing Platform Router Fronting Kafka KSGateway Consumer Kafka Keystone Management KCW Elasticsearch Flink Streaming Job @monaldax
  23. 23. Generate Streaming Job From Template @monaldax
  24. 24. Generated Jenkins Build
  25. 25. Run And Debug Locally In The IDE @monaldax
  26. 26. Source: Stephan Ewen
  27. 27. (1) (2) (3) Developer Ergonomics To Work With Sources & Sinks (in red)
  28. 28. Deploying A Streaming Job In Test @monaldax
  29. 29. @monaldax Flink
  30. 30. @monaldax Anatomy Of a Flink Job
  31. 31. Router Flink Job - Standalone Cluster Mode SPaaS Router Fronting Kafka Event Producer KSGateway Consumer Kafka KCW Elasticsearch 3 instances Keystone Management @monaldax Flink Job
  32. 32. @monaldax Router Flink Job In HA Mode With 1 Job Manager Isolation - 1 Flink Cluster Runs One Routing Job Zookeeper Job Manager Leader (WebUI) Task Manager Task Manager Task Manager One dedicated Zookeeper cluster for all streaming Jobs
  33. 33. @monaldax Flink Job Cluster In HA Mode With 2 Job Managers Zookeeper Job Manager Leader (WebUI) Task Manager Task Manager Task Manager Job Manager (WebUI) One dedicated Zookeeper cluster for all streaming Jobs
  34. 34. Stream Processing Platform - Layered Cake Amazon EC2 Titus Container Runtime Stream Processing Platform (Flink Streaming Engine, Config Management) Reusable Components Source & Sink Connectors, Filtering, Projection, etc. Routers (Streaming Job) Streaming Jobs Management Service & UI Metrics & Monitoring Streaming Job Development Dashboards @monaldax
  35. 35. @monaldax Types of Streaming Jobs
  36. 36. Broadly, Two Categories Of Streaming Jobs • Stateless • No state maintained across events (except Kafka offsets) • Stateful • State maintained across events @monaldax
  37. 37. Image adapted from: Stephan Ewen Stateless Stream Processor – No Internal State @monaldax
  38. 38. Stateless Stream Processor – External State Image adapted from: Stephan Ewen @monaldax
  39. 39. Keystone Routing Jobs ● Stateless (except Kafka offsets) ● Embarrassingly parallel SPaaS Router
  40. 40. @monaldax Stateless Streaming Job Use Case: High Level Architecture Enriching And Identifying Certain Plays Playback History Service Video Metadata Streaming Job Play Logs Live Service Lookup Data
  41. 41. Stateful Stream Processing Image adapted from: Stephan Ewen @monaldax
  42. 42. Search Sessionization – Custom Windowing On Out-of-order Events (timeline diagram: start/end markers for Session 1 and Session 2, separated by hours) @monaldax
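The deck describes custom windowing; for orientation, a minimal sketch of the vanilla Flink analogue, event-time session windows over out-of-order events, with hypothetical types and illustrative gap/lateness values:

    // Vanilla event-time session windows; Netflix's jobs use custom windowing, so this is only an analogue.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

    DataStream<SearchEvent> searches = env
        .addSource(searchEventSource)
        .assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<SearchEvent>(Time.minutes(5)) {
                @Override
                public long extractTimestamp(SearchEvent e) {
                    return e.getEventTimeMillis();   // tolerate up to 5 minutes of out-of-order arrival
                }
            });

    searches
        .keyBy("userId")                                            // SearchEvent POJO field
        .window(EventTimeSessionWindows.withGap(Time.minutes(30)))  // new session after 30 min of inactivity
        .apply(new SessionSummaryFunction())                        // hypothetical WindowFunction
        .addSink(sessionSink);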
  43. 43. Streaming Application Flink Engine Local State Stateful Streaming Application With Local State, Checkpoints, And Savepoints Sinks Savepoints (Explicitly Triggered) Ext. Checkpoints (Automatic) Sources @monaldax
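In job code, the automatic checkpoints in this picture correspond to enabling periodic, externally retained checkpoints; savepoints are triggered explicitly from the CLI. A minimal sketch with illustrative values:

    // Automatic, externalized checkpoints; savepoints are taken out-of-band.
    env.enableCheckpointing(15 * 60 * 1000L);   // e.g. every 15 minutes
    env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

    // Explicitly triggered savepoint (CLI, not job code):
    //   flink savepoint <jobId> s3://bucket/savepoints/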
  44. 44. Checkpoint Ack With Metadata Job Manager ACK (metadata) S3 State Snapshot barriers
  45. 45. Checkpoint Metadata File After All Acks Job Manager S3 Checkpoint Metadata File
  46. 46. Flink State Backends Available ● Memory ● File system ● RocksDB (support incremental checkpoint)
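Selecting the RocksDB backend with incremental checkpointing enabled is a one-liner in the job; the S3 path below is a placeholder:

    // RocksDB state backend; the second argument turns on incremental checkpoints.
    env.setStateBackend(new RocksDBStateBackend("s3://bucket/checkpoints", true));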
  47. 47. State ● State sharding - Key Groups ○ Cannot change once set for a job ○ Total scale up limited by # of Key Groups ○ Lightweight - can have 1000s of Key Groups ● Scale up only from a savepoint, not from a checkpoint
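The number of key groups is what the API calls max parallelism; since it cannot change once the job's state exists, it is worth setting well above the current parallelism. Illustrative values:

    // Max parallelism = number of key groups; fixed for the lifetime of the job's state.
    env.setMaxParallelism(4096);   // generous headroom; key groups are lightweight
    env.setParallelism(256);       // current parallelism, later scalable up to 4096 via a savepoint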
  48. 48. @monaldax Why Flink?
  49. 49. ● Easy to use Data-flow programming model (VLDB, 2015) ○ Tools for reasoning about time - Event-time, watermarks, etc. ● Support for all messaging semantics including Exactly-once ● Exactly-once (semantics) fault tolerant state management (large state) ○ Enables Kappa Architecture ● Multiple connectors - sources and sinks Why Flink?
  50. 50. ● Multi stage jobs with exactly once processing semantics ○ Without the need for an external queue ○ Parallelism can be set independently for each operator in the DAG ● Backpressure support ● Support for Scala & Java ● Open source and active Community Why Flink?
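One of the points above, per-operator parallelism, looks roughly like this in the DataStream API (function names and numbers are illustrative):

    // Parallelism can be set independently for each operator in the DAG.
    events
        .map(new EnrichFunction()).setParallelism(128)            // CPU-heavy enrichment gets more slots
        .keyBy("titleId")
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .reduce(new CountReducer()).setParallelism(32)
        .addSink(outputSink).setParallelism(8);                    // sink sized to what the store can absorb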
  51. 51. @monaldax Scaling Flink Jobs
  52. 52. Scaling S3 Based Flink Checkpoint Store ● S3 automatically shards buckets internally with steady growth in RPS ● Lack of entropy in prefix path hinders scalability @monaldax
  53. 53. Adding Entropy @monaldax
  54. 54. Inject Dynamic Entropy In Checkpoint Path [FLINK-9061] (4-char random hex) state.backend.fs.checkpointdir.injectEntropy.enabled = true state.backend.fs.checkpointdir.injectEntropy.key = __ENTROPY_KEY__ state.backend.fs.checkpointdir = s3://b1/checkpoints/__ENTROPY_KEY__/b3ee-1527840560075/1439
  55. 55. Reducing Snapshots Uploaded To S3 For stateless, or jobs with very small state ● Avoid S3 writes from Task Manager, aggregate state in Job Manager state.backend.fs.memory-threshold = 1048576
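With state.backend.fs.memory-threshold set to 1048576 (1 MiB), any state snapshot below that size travels inline in the acknowledgement to the Job Manager instead of being written to S3 by each Task Manager; the Job Manager then persists a single checkpoint file, as the next slide shows. A hedged code equivalent, assuming the FsStateBackend constructor that takes a file-state size threshold:

    // Assumed equivalent: state smaller than 1 MiB stays inline in the checkpoint metadata.
    env.setStateBackend(new FsStateBackend(new URI("s3://bucket/checkpoints"), 1048576));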
  56. 56. Checkpoint Metadata File After All Acks Job Manager Amazon S3 2. Checkpoint Data & Metadata 1. Snapshot data & ACK @monaldax
  57. 57. High Frequency Checkpoints ● For checkpointing intervals < 5 sec and large Flink clusters, consider a store other than S3.
  58. 58. Too Many HEAD Requests, Despite Zero S3 Writes • Create CheckpointStreamFactory only once during operator initialization (FLINK-5800) • Fixed since 1.3
  59. 59. Fine Grained Recovery Job DAG With 3 Operators, Each With Parallelism 3 A1 A2 A3 B1 B2 B3 C1 C2 C3
  60. 60. Fine Grained Recovery - 1 Operator Instance Fails A1 A2 A3 B1 B2 B3 C1 C2 C3
  61. 61. Fine Grained Recovery - Restart Greyed Out Operator Instances A1 A2 A3 B1 B2 B3 C1 C2 C3
  62. 62. Fine Grained Recovery - Recovered A1 A2 A3 B1 B2 B3 C1 C2 C3
  63. 63. Life without fine grained recovery Each kill (every 10 minutes) caused ~2x spikes
  64. 64. Sometimes Revert To Full Restart (chart: full restart vs. fine grained recovery)
  65. 65. Current implementation issue (FLINK-8042) ● Revert to full restart immediately if replacement container didn’t come back in time (FLINK-8042) ● Issue will be addressed in FLIP-6
  66. 66. Workaround For Flink-8042 - +1 Standby Container Zookeeper Job Manager Leader (WebUI) Task Manager Task Manager Task Manager Task Manager +1
  67. 67. Fine Grained Recovery With Standby Taskmanager
  68. 68. Challenges Of Stateful Job With Large State 1. Cannot avoid S3 writes from task managers a. Each task manager has large state 2. Fine grained recovery (+1 standby) does not help a. Connected job graph 3. Long recovery times
  69. 69. Stateful Jobs Often Have Data Shuffle Operation A1 A2 A3 B1 B2 B3 C1 C2 C3 keyBy source window sink S3 HDD HDD HDD
  70. 70. A1 A2 A3 B1 B2 B3 C1 C2 C3 One Task Failure Leads To Full Job Restart - Connected Graph S3 HDD HDD HDD
  71. 71. Copy Checkpointed Data To Local HDD From S3, Restart Job A1 A2 A3 B1 B2 B3 C1 C2 C3 TM #1 TM #2 TM #3 S3 HDD HDD HDD
  72. 72. With Task local recovery (FLINK-8360), Flink 1.5+ Snapshot loaded from Local HDD if available. A1 A2 A3 B1 B2 B3 C1 C2 C3 TM #1 TM #2 TM #3 S3 HDD HDD HDD
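Task-local recovery in Flink 1.5+ is turned on through configuration; a hedged sketch in the same style as the earlier config snippets (the local path is a placeholder):

    state.backend.local-recovery = true
    taskmanager.state.local.root-dirs = /mnt/flink-local-state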
  73. 73. Explore: Snapshot Store And Recovery From EBS A1 A2 A3 B1 B2 B3 C1 C2 C3 TM #1 TM #2 TM #3 EBS EBS EBS
  74. 74. Explore: Replicate State To Other Task Managers, And Move Computation With Optional State Backup to S3 A1 A2 A3 B1 B2 B3 C1 C2 C3 TM #1 TM #2 TM #3 S3 HDD HDD HDD
  75. 75. Savepoint Metrics ● Size: 21 TB ● Time to take: 27 mins ● Recovery time: 6 mins Incremental Checkpoint Metrics ● Checkpoint interval: 15 mins ● Size (avg): 950 GB ● Duration (avg): 2.5 minutes ● 200 Containers / 16 vCPUs, 54 GB memory, 108 GB SSD-backed EBS ● Parallelism 3200
  77. 77. @monaldax Backfill / Rewind, Schema, SideInput
  78. 78. Flink Job Requires Reprocessing Option 1: Rewind To A Checkpoint In The Past With Kafka Source Time Checkpoint y Checkpoint x outage period Checkpoint x+1 Now
  79. 79. Option 1: Rewind To A Checkpoint In The Past With Kafka Source Time Checkpoint y Checkpoint x outage period Checkpoint x+1 Now
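Operationally, Option 1 amounts to resubmitting the job from a retained checkpoint or savepoint taken before the outage; the Kafka offsets stored in that snapshot make the source rewind. A hedged CLI sketch with placeholder paths:

    # Resume from a retained checkpoint/savepoint taken before the outage (paths are placeholders)
    flink run -s s3://bucket/checkpoints/<job-id>/chk-1234 -d routing-job.jar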
  80. 80. Option 2: Reprocess From Backfill Source In Parallel Time Now -> outage period
  81. 81. Stateless Job - Works! Stateful Job Challenge - Partially Ordered But Bounded Source and State Warm-up Time outage period Warm-up period Emit output No output Partially Ordered
  82. 82. Complex Case of Reprocessing Events What Should Job 2 do? Time outage period 1pm 4pm
  83. 83. For Reprocessing, We Are Leaning Towards Option 1 ● Messaging system with Tiered Storage ○ Kafka holds recent data (past 24 hours), the rest flows to a secondary storage like S3. Producers and Consumers would be able to use this seamlessly, OR ○ Use another product that fulfils this need.
  84. 84. Not a lambda architecture ● Single streaming code base ● Just switch source from Kafka to Hive
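A hedged sketch of "just switch source": the same pipeline-building code accepts whichever source a flag selects; HiveBackfillSource and EventDeserializer are hypothetical stand-ins for whatever bounded backfill source and schema are actually used:

    // Single streaming code base; only the source differs between live and backfill runs.
    SourceFunction<Event> source = backfillMode
        ? new HiveBackfillSource(backfillStart, backfillEnd)                          // hypothetical bounded source
        : new FlinkKafkaConsumer011<>("play-logs", new EventDeserializer(), kafkaProps);

    DataStream<Event> events = env.addSource(source);
    buildPipeline(events);   // identical downstream operators for both modes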
  85. 85. Misc • Hive Side Input for small reference tables • Structured Data Infrastructure (In Progress…) • Schema registration, validation, and discovery infrastructure • Notifications of end-to-end breaking schema changes • Providing Schematized Keystone Pipeline Streams • Managed State for Materialized shuffle between two jobs
  86. 86. @monaldax Road Ahead
  87. 87. Road Ahead • Reprocess Data - Rewind or Backfill • Schema support • A few out-of-the-box watermark heuristics • Infrastructure upgrades with low SPaaS user impact • Auto scaling • Easy resource provisioning estimates • More reusable blocks, like data hygiene • Rate limiter for external service calls
  88. 88. @monaldax Q & A
