At Databricks, we manage Apache Spark clusters for customers to run various production workloads. In this talk, we share our experiences in building a real-time monitoring system for thousands of Spark nodes, including the lessons we learned and the value we’ve seen from our efforts so far. The was part of the talk presented at #monitorSF Meetup held at Databricks HQ in SF.