This document discusses optimizing an Apache Pulsar cluster to handle 10 PB of data per day for a financial customer. Initial estimates showed the cluster would need over 1,000 VMs using HDD storage. Various optimizations were implemented, including eliminating the journal, using direct I/O, compression, and C++ client optimizations. These reduced the estimate to roughly 200 VMs using local SSD (L-SSD) storage. The optimized cluster meets the customer's requirements: processing 10 PB of data per day with 3 hours of retention and protection against zone failure.
Scaling Apache Pulsar to 10 Petabytes/Day
1. Scaling Apache Pulsar to 10 PB/day
Karthik Ramasamy
Senior Director of Engineering at Splunk
2. Karthik Ramasamy
Senior Director of Engineering
@karthikz
streaming @splunk | ex-CEO of @streamlio | co-creator of @heronstreaming | ex @Twitter | Ph.D
5. Splunk Data Stream Processor
[Diagram: DSP turns raw data into high-value information (filter, enhance, aggregate, format, normalize, transform), detects data patterns or conditions, protects and masks sensitive data, and distributes data to Splunk or other destinations such as a data warehouse, public cloud, or message bus.]
A real-time stream processing solution that collects, processes, and delivers data to Splunk and other destinations in milliseconds.
6. DSP - Bird's Eye View
[Diagram: data sources (HEC, S2S, batch, REST client, forwarders) feed Apache Pulsar, which feeds the stream processing engine; results are delivered to Splunk indexers and external systems.]
Apache Pulsar is at the core of DSP
8. Use Cases
■ Marquee customer is in finance and payments
■ Microservices and applications emit logs
■ Logs contain rich information
■ Process these logs and extract monitoring & tracing information
■ Filter these logs based on volume and whether their value justifies retention
■ Compute real-time business metrics
9. Data Requirements
■ Environment - Google Cloud Platform
■ Use of n1-standard-32 VMs
■ Raw data ingestion of 10 PB/day, which translates to ~120 GB/s
■ Data retention of 3 hours
■ Need to handle the entire traffic load when a zone fails
11. DSP Deployment
■ Separation of ingestion and computation
■ Pipeline isolation and no noisy-neighbor issues
■ Troubleshooting a single pipeline gets easier
■ Might not need overprovisioning beyond peak load plus a fudge factor (as compared to deploying a single cluster)
12. VM Configuration - n1-standard-32
■ 32 vCPUs
■ 120 GB of memory
■ Max number of PDs (EBS equivalent) - 128
■ Max total PD size - 257 TB
■ Max egress network bandwidth - 32 Gbps (4 GB/s)
■ Max of 24 local SSDs (L-SSDs) for a total of 9 TB
15. Apache Pulsar Requirements
■ Replica factor of 3
■ Need to handle 120 GB/s of raw traffic
■ Need to handle 360 GB/s of storage write bandwidth
■ With the journal, required write bandwidth is 720 GB/s
■ Total storage required for retention - 3.9 PB
■ Total ingress network bandwidth - 480 GB/s
■ Total egress network bandwidth - 1200 GB/s
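For reference, the arithmetic behind these figures: 10 PB/day ÷ 86,400 s/day ≈ 120 GB/s of raw traffic; a replica factor of 3 turns that into 120 × 3 = 360 GB/s of storage writes; and because each bookie writes every entry twice (once to the journal, once to the entry log), the journal doubles it to 720 GB/s. Retention needs 120 GB/s × 3 h × 3,600 s/h × 3 replicas ≈ 3.9 PB, and ingress is raw traffic plus replication writes: 120 + 360 = 480 GB/s.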
16. Pulsar Cluster Size Estimation
■ The size of a Pulsar cluster for a given workload depends on three parameters:
■ Storage Density - aggregate storage capacity needed in the cluster, proportional to the data retention
■ Storage Bandwidth - aggregate write and read throughput needed for data ingestion and consumption; heavily dependent on the storage media
■ Network Bandwidth - aggregate network bandwidth available in the cluster for input and output traffic
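Put together, the estimate is the binding constraint: required VMs ≈ max(total storage ÷ per-VM storage, total write bandwidth ÷ per-VM write throughput, total network traffic ÷ per-VM network bandwidth). The next three slides evaluate this for three storage media, and a short code sketch of the calculation follows them.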
17. Estimating VMs using P-HDD
■ Max of 200 MB/s write throughput per VM
■ Max of 9 TB per instance
■ Max of 4 GB/s egress and ingress bandwidth
■ Dominated by storage bandwidth
18. Estimating VMs using P-SSD
■ Max of 400 MB/s write throughput per VM
■ Max of 9 TB per instance
■ Max of 4 GB/s egress and ingress bandwidth
■ Dominated by storage bandwidth
19. Estimating VMs using L-SSD
■ Max of 850 MB/s write throughput per VM
■ Max of 9 TB per instance
■ Max of 4 GB/s egress and ingress bandwidth
■ Dominated by storage bandwidth
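The three estimates above all reduce to the same max-of-three-constraints calculation. Below is a minimal C++ sketch of it, using the n1-standard-32 limits from slide 12 and the workload figures from slide 15; the counts it prints are pre-optimization (journal included, no compression), and a real deployment adds headroom for zone failure and peak load.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Back-of-the-envelope cluster sizing: the VM count is whichever of the
// three constraints (storage bandwidth, storage density, network) binds
// first. Per-VM limits are the n1-standard-32 figures quoted earlier.
int vmsNeeded(double writeGBps, double retainedPB, double networkGBps,
              double perVmWriteGBps) {
    const double perVmStorageTB = 9.0;   // 24 L-SSDs per VM
    const double perVmNetGBps   = 4.0;   // 32 Gbps egress/ingress cap
    double byBandwidth = writeGBps / perVmWriteGBps;
    double byStorage   = retainedPB * 1000.0 / perVmStorageTB;
    double byNetwork   = networkGBps / perVmNetGBps;
    return static_cast<int>(
        std::ceil(std::max({byBandwidth, byStorage, byNetwork})));
}

int main() {
    // Pre-optimization workload from slide 15: 720 GB/s of storage writes
    // (the journal doubles the 360 GB/s replicated stream), 3.9 PB retained,
    // and 1200 GB/s as the larger of the two network figures.
    printf("P-HDD (200 MB/s/VM): %d VMs\n", vmsNeeded(720, 3.9, 1200, 0.20));
    printf("P-SSD (400 MB/s/VM): %d VMs\n", vmsNeeded(720, 3.9, 1200, 0.40));
    printf("L-SSD (850 MB/s/VM): %d VMs\n", vmsNeeded(720, 3.9, 1200, 0.85));
    // Storage bandwidth dominates in every case, which is the point of the
    // three slides above; eliminating the journal and compressing the data
    // shrink exactly that term.
    return 0;
}
```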
22. Optimization #1 - Eliminating Journal
Since all the data is machine logs, we implemented replicated durability.
■ Different types of durability:
■ Persistent Durability - No data loss in the presence of node failures or entire cluster failure
■ Replicated Durability - No data loss in the presence of a limited number of node failures
■ Transient Durability - Data loss in the presence of failures
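The implementation here removes the journal from the bookie write path entirely. As a rough, illustrative approximation in stock Apache BookKeeper (not the change described in the talk), the journal's fsync-per-ack can be relaxed so that acknowledgment relies on the replica set rather than on stable storage; verify the flag against your BookKeeper version:

```
# bookkeeper.conf sketch (illustrative): acknowledge writes without
# fsyncing the journal, relying on the 3-replica write quorum instead.
journalSyncData=false
```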
24. Optimization #2 - Direct I/O
■ Overhead of the page cache in a container environment is quite high
■ The kernel needs to keep track of the page-cache usage quota per container
■ This translates into maintaining additional data structures and lookups (older kernels had n^2 lookup time for getting pages in & out)
■ Bypassed the page cache for the BookKeeper entry log, using JNI:
■ We already have in-memory caches (write and read-ahead)
■ We have better control over what to cache and when to evict
■ Avoids double buffering
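As an illustration of the mechanism (not the actual BookKeeper patch), here is a minimal Linux C++ sketch of a direct-I/O write, the kind of call the JNI layer ends up issuing. O_DIRECT requires the buffer address, transfer size, and file offset to be block-aligned, hence posix_memalign:

```cpp
#include <fcntl.h>      // open, O_DIRECT (g++ defines _GNU_SOURCE)
#include <unistd.h>     // write, close
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const size_t kAlign = 4096;     // logical block size on most devices
    const size_t kChunk = 1 << 20;  // write entry-log data in 1 MiB chunks

    // O_DIRECT rejects unaligned buffers, so allocate with posix_memalign.
    void* buf = nullptr;
    if (posix_memalign(&buf, kAlign, kChunk) != 0) return 1;
    std::memset(buf, 0, kChunk);

    // Bypass the page cache entirely: no per-container quota tracking,
    // no double buffering with the application-level write cache.
    int fd = open("entrylog.0", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); std::free(buf); return 1; }

    if (write(fd, buf, kChunk) != static_cast<ssize_t>(kChunk))
        perror("write");

    close(fd);
    std::free(buf);
    return 0;
}
```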
30. Surviving Zone Failure
[Diagram: segments are replicated by brokers across storage bookies in Zone A, Zone B, and Zone C, so every segment has copies in multiple zones.]
■ Zone/Rack Failures
■ Bookies provide rack awareness
■ Brokers replicate data to different racks/zones
■ In the presence of a zone/rack failure, data is available in other zones
■ One zone failure means the two remaining zones must be capable of handling the entire traffic
■ Requires 50% additional VMs
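Rack awareness is driven from the broker-side BookKeeper client. A minimal broker.conf sketch, treating each GCP zone as a "rack" (bookies are then mapped to zones, e.g. with pulsar-admin bookies set-bookie-rack):

```
# broker.conf sketch: place the replicas of each segment in different
# zones by enabling the rack-aware placement policy.
bookkeeperClientRackawarePolicyEnabled=true
```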
32. Optimization #4 - C++ Client CPU & Memory Usage
■ Better round robin across partitions - maximizing the batch size per partition
■ Having bigger batches reduces CPU usage for the client, brokers, and bookies
■ Increases the compression factor
■ Reduced client memory usage
■ Optimizations to minimize memory allocation overhead
■ Implemented a memory limit in the C++ producer
■ Simplifies the user configuration - one single setting instead of multiple queue sizes and complex math
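As a sketch of what these knobs look like in the Pulsar C++ client API (service URL and topic are placeholders; setMemoryLimit is the single memory setting referred to above and is available in newer client releases):

```cpp
#include <pulsar/Client.h>

using namespace pulsar;

int main() {
    // One client-wide memory cap instead of sizing per-partition queues.
    ClientConfiguration clientConf;
    clientConf.setMemoryLimit(64 * 1024 * 1024);  // 64 MiB for all producers

    Client client("pulsar://localhost:6650", clientConf);

    // Bigger batches amortize per-message CPU on client, broker and bookies,
    // and give the compressor more redundancy to exploit.
    ProducerConfiguration producerConf;
    producerConf.setBatchingEnabled(true);
    producerConf.setBatchingMaxMessages(1000);
    producerConf.setBatchingMaxPublishDelayMs(10);
    producerConf.setCompressionType(CompressionLZ4);

    Producer producer;
    if (client.createProducer("persistent://public/default/logs",
                              producerConf, producer) != ResultOk)
        return 1;

    producer.send(MessageBuilder().setContent("sample log line").build());

    producer.close();
    client.close();
    return 0;
}
```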