Netflix Keystone Pipeline: processing 600 billion events a day. A detailed look at how Samza was modified and used for real-time routing of events, including running it in Docker.
25. Mind bender - Sink Isolation
● Multiple Samza jobs for one Kafka source topic
● Each job processes messages for one sink
○ E.g. a separate job for each S3 and ElasticSearch cluster sink
● Tradeoff
○ Sink isolation at the cost of extra load on the source Kafka topic's cluster
● Initial release
○ Each job processes partitions only from one topic
26. Samza Job Details
● Use window function to implement health check
○ task.window.ms=30000
● Batch requests to sinks
● Explicit offset commits only
○ automatic commits disabled - task.commit.ms=-1
31. One checkpoint topic per Kafka cluster, sink, and source topic
● Change the number of samza jobs for a topic
● Easily redistribute the partitions across jobs
● Add new partitions seamlessly
● Our naming scheme facilitates migrating topics to other clusters
32. Job Startup Delays Reading Checkpoint
Reading the checkpoint topic at startup can take long enough to trigger health check failures (5-minute timeout)
What to do?
34. Checkpoint topic Samza Job Configuration
Replication factor is hard-coded to 3
task.checkpoint.system=checkpoint
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.segment.bytes=3145728
35. Additional Checkpoint Information
● About 300 bytes per offset commit
● Changelog topic information is logged into the same checkpoint offset topic
○ Even if the changelog is not enabled, a one-time large message keyed by
system-stream-partition is inserted into the same checkpoint offset topic
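At roughly 300 bytes per offset commit and the 3145728-byte segment size from the job configuration above, a back-of-envelope sketch of how many commits fit into one checkpoint topic segment:

```python
# Back-of-envelope: offset commits per checkpoint topic segment.
# 300 bytes/commit and segment.bytes=3145728 are the figures from the deck.
BYTES_PER_COMMIT = 300
SEGMENT_BYTES = 3_145_728  # task.checkpoint.segment.bytes

commits_per_segment = SEGMENT_BYTES // BYTES_PER_COMMIT
print(commits_per_segment)  # 10485
```

So a single segment absorbs on the order of ten thousand commits before rolling.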
42. SAMZA-41 - static partition range assignment
job.systemstreampartition.matcher.class=
org.apache.samza.system.RegexSystemStreamPartitionMatcher
● To cover the partition range [8-10] with this matcher, you need the regex
○ job.systemstreampartition.matcher.config.regex=^8$|^9$|^10$
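A quick illustration of why the explicit alternation is needed (a minimal sketch; the partition count of 16 is arbitrary):

```python
import re

# Regex from the slide: selects partitions 8, 9, and 10 exactly.
# A bracket expression like [8-10] would NOT work here -- inside [...]
# the pattern describes single characters, not the numbers 8 through 10.
PARTITION_REGEX = re.compile(r"^8$|^9$|^10$")

def matched_partitions(regex, num_partitions):
    """Return the partition ids whose decimal form matches the regex."""
    return [p for p in range(num_partitions) if regex.match(str(p))]

print(matched_partitions(PARTITION_REGEX, 16))  # [8, 9, 10]
```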
44. Prefetch Buffer - When is it going to OOM?
● Default count based per Samza container
○ (50,000 / # partitions) per topic
○ systems.source.samza.fetch.threshold=50000
● Hard to get right and avoid OOM
○ message sizes change over time
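To see why a pure message-count threshold is fragile, here is a minimal sketch (the partition count and message sizes are hypothetical): the same 50,000-message default maps to very different heap footprints as message size changes, and the config cannot see that.

```python
# The count-based default buffers (50,000 / #partitions) messages per
# partition, so total heap scales with message size.
# (Partition count and message sizes below are hypothetical.)
FETCH_THRESHOLD = 50_000  # systems.source.samza.fetch.threshold

def prefetch_heap_bytes(num_partitions, avg_message_bytes):
    per_partition = FETCH_THRESHOLD // num_partitions  # messages buffered per partition
    return per_partition * num_partitions * avg_message_bytes

print(prefetch_heap_bytes(100, 1_000))   # 50000000  (~50 MB)
print(prefetch_heap_bytes(100, 10_000))  # 500000000 (~500 MB) - same config, 10x heap
```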
45. SAMZA-775- size based Prefetch buffer
● How much of heap should I use for prefetching?
○ systems.source.samza.fetch.threshold.bytes=200000000 (200MB)
○ per system / stream / partition
○ if > 0, takes precedence over systems.source.samza.fetch.threshold
46. SAMZA-775- size based Prefetch buffer
● systems.source.samza.fetch.threshold.bytes is a soft limit
○ actual usage can reach the byte limit plus the size of the last (potentially
max-sized) message per stream
● I don’t get it, where is the example?
47. SAMZA-775- size based Prefetch buffer
● systems.source.samza.fetch.threshold.bytes=100000 (100K)
● 50 SystemStreamPartitions
● per system-stream-partition threshold is (100000 / 2) / 50 = 1000 bytes.
● Enforced limit would be
○ 1000 bytes + size of last message from the partition
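The slide's arithmetic, spelled out (all values are from the slide; the halving is part of the formula as given):

```python
# Per-SSP threshold from the slide: (threshold_bytes / 2) / num_ssps.
threshold_bytes = 100_000   # systems.source.samza.fetch.threshold.bytes
num_ssps = 50               # SystemStreamPartitions in the container

per_ssp_threshold = (threshold_bytes // 2) // num_ssps
print(per_ssp_threshold)  # 1000

# The enforced (soft) limit per partition is then
# per_ssp_threshold + size of the last message read from that partition.
```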
48. SAMZA-775- size based Prefetch buffer
● Value of systems.source.samza.fetch.threshold.bytes based on
○ Incoming traffic Bps into source Kafka
○ 60 seconds of buffer with region failover traffic
○ Samza in memory data structures (2 x message size)
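The three sizing inputs above can be folded into one formula. A hedged sketch, not the actual Netflix tooling: the traffic figure is hypothetical, and the 2x factor mirrors the in-memory data structure overhead from the slide.

```python
def fetch_threshold_bytes(incoming_bytes_per_sec, buffer_seconds=60, copies=2):
    """Size the prefetch buffer for ~60s of (failover-inflated) incoming
    traffic, doubled for Samza's in-memory copies of each message."""
    return incoming_bytes_per_sec * buffer_seconds * copies

# e.g. ~1.6 MB/s of incoming traffic during a region failover:
print(fetch_threshold_bytes(1_600_000))  # 192000000 (~192 MB)
```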
49. SAMZA-775- size based Prefetch buffer
● How does it perform?
○ Per message overhead within 0.02% of computed heuristics in the patch
○ Actual footprint exceeds systems.source.samza.fetch.threshold.bytes by
10-15% at most in the worst case
■ Example: if set to 200MB, the worst case observed was 230MB
50. SAMZA-775- size based Prefetch buffer
● Con
○ The implementation enforcing systems.source.samza.fetch.threshold.bytes is very
dependent on the Samza version
○ Hence, higher maintenance when the code changes. However,
Well Worth It! Ergonomic Config! Adds Stability!
51. SAMZA-655 & SAMZA-540
● Backported from 0.10
○ environment variable configuration rewriter
■ Pass config from RDS to executor to Docker to Samza Job
○ expose latency related metrics in OffsetManager
■ checkpointed offset gauge
65. End to End metrics
● Producer to Router latency
○ Avg. about 2.5 seconds
○ 90th percentile of topics under 2 sec
● Kafka to Router consumer lag (estimated time to catch up)
○ 65th percentile under 500ms
○ 90th percentile under 5 seconds
● Producer event timestamp to Samza job router avg latency - 6 seconds
68. Wait there’s more in the pipeline...
● Self service tools
● Multi-tenant Stream Processing as a Service - SPaaS
○ probably adding Spark Streaming to the mix
● Event traceability - on demand and sampled
● As the number of jobs increases, the checkpoint topic may give way to Cassandra
● Optimization & Automation
72. Fronting Kafka Instances
● 2700 d2.xl AWS instances across 3 regions for regular & failover traffic
● d2.xl
○ Large disk (6TB) - 450-475MB/s of sequential I/O throughput
○ 30GB memory, 700 Mbps medium network capability
○ Replication starts lagging above 18MB/second per broker with thousands of partitions
○ con: multiple instances on the same physical host increase correlated failures
73. Kafka Capacity Planning
1. Stay under 20k partitions per cluster (currently at 14K)
2. Leave ≅ 40% free disk space on each broker for growth & movement
3. Throughput per partition based on 1, 2, the number of brokers, and the retention period
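One way to make rule 3 concrete, as a sketch under stated assumptions: the 6TB disk figure comes from the d2.xl slide, the 40% free-space headroom from rule 2; the partitions-per-broker and retention values are hypothetical, and the helper function is illustrative, not a Netflix tool.

```python
# Derive sustainable write throughput per partition from disk size,
# the free-space rule, partition density, and retention.
DISK_BYTES = 6 * 10**12   # 6 TB per d2.xl broker (from the deck)
FREE_FRACTION = 0.40      # rule 2: leave ~40% free for growth & movement

def max_bytes_per_sec_per_partition(partitions_per_broker, retention_hours):
    usable = DISK_BYTES * (1 - FREE_FRACTION)          # bytes available for logs
    per_partition_bytes = usable / partitions_per_broker
    return per_partition_bytes / (retention_hours * 3600)

# e.g. 500 partitions on a broker, 24h retention:
print(round(max_bytes_per_sec_per_partition(500, 24)))  # 83333 bytes/s
```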
74. Partition Assignment
● All assignments Zone / Rack aware
● Strategy 1 - Multiple of brokers
● Strategy 2 - Stateful Round Robin
75. Kafka Auditor as a Service
● Broker monitoring
● Consumer monitoring
● Heart-beat & Continuous message latency
● On-demand Broker performance testing
● Built as a service deployable on single or multiple instances