"Kafka often finds itself as the backbone of a company’s systems, but the failure modes and signals leading to those failures are not always well understood. Chaos Engineering espouses empiricism, experimentation over testing, and verification over validation. We can prime a Kafka cluster as a Chaos Experiment by putting it under a controlled load test called a ‘squeeze test’. This session gives attendees confidence in the steps needed to build experiments to prove Kafka cluster(s) can fulfill the needs of the business. We start by demonstrating how to build a “steady state” hypothesis based on cluster sizing, best practices and expected usage, monitoring configuration, and perfunctory performance testing. We then develop an example hypothesis that as the load on the Kafka cluster increases towards the tipping point we will receive monitoring alerts/signals for key metrics. Attendees learn in detail how real world events were varied for the experiment, including design goals, hard trade-offs, and safety mechanisms necessary for the load tool to adhere to Chaos Engineering principles. We show how the results were analyzed to support or debunk the hypothesis. Finally, we lay out the next steps for attendees’ Chaos Engineering journey."