Fallout is an open-source testing framework based on Jepsen. In this talk we will see how distributed testing works and how to use these tools to verify Pulsar quality: how to easily deploy a reproducible Pulsar cluster on Kubernetes, and how to use ChaosMesh to inject failures. We will also cover the integrated metrics reporting tools, which are very useful for verifying the behaviour of the system across Pulsar versions and environments, especially during maintenance operations (rollout restarts/upgrades) and unexpected failures.
Distributed Tests on Pulsar with Fallout - Pulsar Summit NA 2021
1. Distributed Tests on Pulsar with
Fallout
Enrico Olivelli
DataStax - Luna Streaming Team
Apache Pulsar Committer
Member of Apache BookKeeper and Apache ZooKeeper PMC
Apache Curator VP
2. Agenda
● Testing Distributed Messaging Systems
● Introduction to Fallout
● Fallout architecture
● NoSQLBench
● Anatomy of a Fallout test
● Fallout and Pulsar
● Live Demo
● Future works
3. Testing Distributed Messaging Systems
Classic types of tests: Unit tests, Integration tests, System tests
Distributed System Tests:
● Launch N machines (or clusters!)
● Deploy applications (Helm, Unzip tarballs…)
● Run clients
● Inject failures
● Perform system wide assertions
● Create reports (performance, failures…)
● Compare reports (regression tests)
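The last step, comparing reports, can be automated. A minimal sketch (the report format here is purely hypothetical, not Fallout's actual output) that flags latency regressions between two runs:

```python
# Hypothetical sketch: flag latency regressions between two test runs.
# Assumes lower-is-better metrics (latencies); the report format is
# illustrative, not Fallout's actual output.

def find_regressions(baseline, candidate, tolerance=0.10):
    """Return metrics whose candidate value exceeds baseline by more than tolerance."""
    regressions = {}
    for metric, base_value in baseline.items():
        cand_value = candidate.get(metric)
        if cand_value is not None and cand_value > base_value * (1 + tolerance):
            regressions[metric] = (base_value, cand_value)
    return regressions

baseline = {"p50_ms": 4.0, "p99_ms": 25.0}
candidate = {"p50_ms": 4.1, "p99_ms": 31.0}
print(find_regressions(baseline, candidate))  # {'p99_ms': (25.0, 31.0)}
```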
4. Fallout
Open-source project (ASLv2 licensed), created at DataStax
https://github.com/datastax/fallout
Initially created for Apache Cassandra®; now it is a general-purpose tool.
A layer on top of Jepsen.io https://jepsen.io/
Design Repeatable Experiments with real clusters:
- Declarative language (YAML)
- Deterministic Setup-Run-Check loop
- Supports k8s natively (helm, kubectl, k8s jobs...)
- Integrated with GKE
- Monitor longevity tests
- Aggregate logs from all nodes/pods
5. NoSQLBench
Open-source project (ASLv2 licensed)
https://github.com/nosqlbench/nosqlbench
Allows you to exercise your system:
- Load generator
- Performance measurement
- Distributed execution
Supports many drivers:
- Apache Cassandra, MongoDB...
- JDBC
- Generic HTTP based services
- Messaging: Kafka, Pulsar, JMS 2.0
For every driver it tracks basic metrics (throughput, latency) as well as driver-specific metrics (like transaction commit time for Pulsar).
Integrated with Dropwizard metrics and with Graphite
6. Fallout Architecture
Key components:
- Provisioners: where to run the test
- Configuration Managers: what to run
- Providers: access to the services and information
Workload:
- Modules: actions
- Phases: execution model: concurrent, sequential
- Artifact checkers: summarize metrics, produce charts, verify logs
7. Anatomy of a Fallout test - Provisioner and ConfigurationManager
# Parameters
image:
  name: datastax/pulsar
  version: 2.6.2_1.0.0
...........
---
ensemble:
  server:
    node.count: {{cluster.numNodes}}
    provisioner:
      name: gke
    configuration_manager:
      - name: helm
        properties:
          helm.install.values.file: <<file:pulsar-values.yaml>>
          helm.repo.name: {{helmchart.reponame}}
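The values file referenced by helm.install.values.file carries the Pulsar chart configuration. A minimal excerpt might look like the following; the key names follow the Apache Pulsar Helm chart and are an assumption here, so verify them against the chart version you deploy:

```yaml
# Hypothetical excerpt of pulsar-values.yaml -- key names follow the
# Apache Pulsar Helm chart; check them against the chart version you use.
broker:
  replicaCount: 1
  configData:
    # 2-2-2 replication: ensemble size / write quorum / ack quorum
    managedLedgerDefaultEnsembleSize: "2"
    managedLedgerDefaultWriteQuorum: "2"
    managedLedgerDefaultAckQuorum: "2"
bookkeeper:
  replicaCount: 3
proxy:
  replicaCount: 1
```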
8. Anatomy of a Fallout test - Workload and Checkers
workload:
  phases:
    - create-topic:
        module: kubernetes_job
        properties:
          manifest: <<file:createtopic.yaml>>
    - produce_messages:
        module: nosqlbench
        properties:
          cycles: {{producer.nummessages}}
      consume_messages:
        module: nosqlbench
        properties:
  checkers:
    nofail:
      checker: nofail
  artifact_checkers:
    generate_chart:
      artifact_checker: hdrtool
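The createtopic.yaml manifest referenced above is an ordinary Kubernetes Job. A minimal sketch could run pulsar-admin against the cluster; the image tag, admin URL, and topic name below are illustrative assumptions, and the service name depends on the Helm release:

```yaml
# Hypothetical createtopic.yaml sketch: a Kubernetes Job that creates
# a partitioned topic via pulsar-admin. URLs and names are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: create-topic
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: create-topic
          image: apachepulsar/pulsar:2.7.2
          command:
            - bin/pulsar-admin
            - --admin-url
            - http://pulsar-broker:8080   # service name depends on the Helm release
            - topics
            - create-partitioned-topic
            - persistent://public/default/test-topic
            - --partitions
            - "4"
```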
9. Testing Pulsar with Fallout
Release stability validation:
- Test cluster-wide features in k8s environments (like k8s functions)
- Longevity tests
- Simulate failures: BookKeeper, ZooKeeper, Broker, Proxy
- Simulate rollout restarts
- Simulate upgrades
Benchmarks:
- Hunt for performance regressions, running tests against current ‘master’ branch
- Compare different releases (Apache Pulsar, Luna Streaming …)
- Measure a given setup (configuration + cluster size + machines), in a reproducible way
- Reproduce performance issues
10. Simulating Bookie failure with ChaosMesh
Sample scenario:
- Start a 6-node cluster on GKE
- Deploy Apache Pulsar 2.7.2 using Helm
- 1 broker
- 3 bookies
- 1 proxy
- Replication parameters: 2-2-2 (2 copies)
- Deploy a NoSQLBench pod
- Deploy ChaosMesh (using Helm)
- Create a partitioned topic
- Produce and Consume messages
- Simulate Bookie pod failure (one bookie at a time)
- Track time series for latency
- No error must be reported by Producers and Consumers
Template: https://github.com/datastax/pulsar-fallout/blob/master/benchmarks/template.yaml
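The bookie failure itself is injected by applying a ChaosMesh manifest to the cluster. A minimal pod-kill sketch could look like the following; the namespace and label selector are assumptions that depend on the Pulsar Helm chart you use:

```yaml
# Hypothetical ChaosMesh PodChaos sketch: kill one bookie pod at random.
# Namespace and labels depend on how the Pulsar Helm chart labels its pods.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: kill-one-bookie
spec:
  action: pod-kill
  mode: one                     # kill a single randomly selected matching pod
  selector:
    namespaces:
      - pulsar
    labelSelectors:
      component: bookkeeper     # label depends on the Pulsar Helm chart
```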
11. Simulating Bookie failure with ChaosMesh
Live demo
Key points:
- Fallout UI
- Template
- Parameters
- System wide log aggregation
- Verify Bookie failures in the logs of the Broker
- Show latency generated graph
12. Pulsar Release Validation toolkit
Repository with sample files for basic release validation and NoSQLBench based testing:
https://github.com/datastax/pulsar-fallout
Examples for:
- Deploying Pulsar, from 2.7.0 up to your own custom Docker image
- Using the Apache Pulsar Helm Chart and the Luna Streaming Helm Chart
- Running NoSQLBench
- Using ChaosMesh for failure injection
- Creating custom configurations of Pulsar
- Running client tools (pulsar-perf, pulsar-client, pulsar-admin)
13. Future works
At DataStax we are already using Fallout for Apache Cassandra and Apache Pulsar.
Useful follow ups for the community:
- Contribute the corpus of tests to the Apache repo
- Give the community an easy way to test Apache Pulsar with real distributed system tests
- Integrate Fallout based validation for pre-release validation or PR validation
- Use Fallout Docker images to run tests on GitHub actions
Fallout and NoSQLBench are public open-source projects; everyone can contribute to and enhance these powerful tools.
14. Wrapping up
Fallout:
- Distributed system tests are hard to design and to deploy
- Manually testing a complex project is error-prone
- Fallout is a brand-new framework that makes it easy to write distributed system tests
- Reproducible
- Easy to use (YAML based, declarative style)
- NoSQLBench is the perfect companion for Fallout (but you are not required to use it)
Filling in the gaps in Pulsar testing:
- Systematically test and verify performance
- Ensure that Pulsar runs well on real-world clusters (k8s as a first-class citizen)
- Be able to reproduce real-world workloads in the lab
15. References
LinkedIn - https://www.linkedin.com/in/enrico-olivelli-984b7874/
Twitter: @eolivelli
Apache Pulsar Community: http://pulsar.apache.org/en/contact/ (Slack, ML…)
References:
Fallout - https://github.com/datastax/fallout
Pulsar Templates - https://github.com/datastax/pulsar-fallout
NoSQLBench - https://github.com/nosqlbench/nosqlbench
Great tutorial about Fallout - https://www.youtube.com/watch?v=45iTmTBjU0M (DataStax Fallout - Testing Scalable Distributed Systems with Sean McCarthy)