5. Challenges of Big Data Streams
❖ High Throughput: millions of TPS
❖ I/O Bandwidth: Network & Disk
❖ Storage Cost
6. Kafka Overview
❖ Open sourced in early 2011, graduated from the Apache
Incubator on 23 October 2012
❖ Log Aggregation, Messaging, Stream Processing, Event
Sourcing
❖ Widely used in big data processing; integrates with
Storm, Spark, Flink, Samza, Hadoop, Flume, etc.
20. Structural Compression
❖ Assume each msg has N bytes, each batch has B msgs
❖ Size of 0.10: (34 + N) * B
❖ Size of 0.11: 61 + (7 + N) * B
❖ For N <= 100, saves roughly 20%~50% of storage
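The two size formulas above can be checked with a short calculation. The sketch below is illustrative only: the constants 34, 61, and 7 are taken directly from the slide, while the function names (size_v010, size_v011, saving) are hypothetical and not part of any Kafka API.

```python
def size_v010(n: int, b: int) -> int:
    """Batch size in bytes under the 0.10 formula: 34 bytes of overhead per message."""
    return (34 + n) * b


def size_v011(n: int, b: int) -> int:
    """Batch size in bytes under the 0.11 formula: 61 bytes per batch, 7 per message."""
    return 61 + (7 + n) * b


def saving(n: int, b: int) -> float:
    """Fraction of storage saved by the 0.11 formula relative to 0.10 (negative = larger)."""
    old = size_v010(n, b)
    return (old - size_v011(n, b)) / old


if __name__ == "__main__":
    # Compare a few message sizes N at a fixed batch size B.
    for n in (20, 50, 100):
        print(f"N={n:3d} B=1000 saving={saving(n, 1000):.1%}")
```

With B = 1000, the saving is about 50% for N = 20 and about 20% for N = 100, which matches the range quoted on the slide. The saving depends on batching: the fixed 61-byte batch header pays for itself only once 27 * B > 61, that is for B >= 3, and for B of 1 or 2 the 0.11 formula yields a larger size than 0.10.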
21. Content Map
❖ Challenges of Big Data Streams
❖ Kafka Overview
❖ Batch Through
❖ Compression Through
❖ Structural Compression
❖ BigData Eco-system