SlideShare a Scribd company logo
1 of 27
Exhibitor
Netflix’s ZooKeeper Management System




                           Jordan Zimmerman
                                   Senior Platform Engineer
                                                 Netflix, Inc.
                                 jzimmerman@netflix.com
                                                  @rangalt
The Problem
• ZooKeeper is statically configured
• Limited tools for managing the ensemble
• Backup/restore is sometimes needed
• Visualization is desperately needed
• Prior to 3.4.x, periodic cleanup needed
The Goal
Chaos Monkey-able
• See http://techblog.netflix.com/2011/07/
   netflix-simian-army.html
 A tool that randomly disables our production instances to make sure we can
 survive common types of failure without any customer impact.


• Completely unmanned
• Bringing up a new ensemble should be
   turn-key/push-button
Features
Instance Monitoring
Each Exhibitor instance monitors the
ZooKeeper server running on the same
server. If ZooKeeper is not running, Exhibitor
will write the zoo.cfg file, etc. and start it. If
ZooKeeper crashes for some reason,
Exhibitor will restart it.
Log Cleanup
In versions prior to ZooKeeper 3.4.x, log file
maintenance is necessary. Exhibitor will
periodically do this maintenance.
Backup/Restore
Backups in a ZooKeeper ensemble are more complicated
than for a traditional data store (e.g. aRDBMS). Generally,
most of the data in ZooKeeper is ephemeral. It would be
harmful to blindly restore an entire ZooKeeper data set.
What is needed is selective restoration to prevent accidental
damage to a subset of the data set. Exhibitor enables this.
Exhibitor will periodically backup the ZooKeeper transaction
files. Once backed up, you can index any of these transaction
files. Once indexed, you can search for individual transactions
and “replay” them to restore a given ZNode to ZooKeeper.
Cluster-wide
        Configuration
Exhibitor presents a single console for your
entire ZooKeeper ensemble. Configuration
changes made in Exhibitor will be applied to
the entire ensemble.
Rolling Ensemble
          Changes
Exhibitor can update the servers in the
ensemble in a rolling fashion so that the
ZooKeeper ensemble can stay up and in
quorum while the changes are being made.
Visualizer
Exhibitor provides a graphical tree view of
the ZooKeeper ZNode hierarchy.
ZooKeeper Data
       Mutation
When enabled, Exhibitor can create/update/
delete nodes in the ZooKeeper hierarchy.
Curator Integration
Exhibitor and Curator (Cur/Ex!) can be
configured to work together so that Curator
instances are updated for changes in the
ensemble.
           Exhibitor             Exhibitor             Exhibitor
              A                     B                     ...




           Round Robin - periodic query for servers list




                       Curator Clients
                          Curator Clients
                              Curator Clients
How it Works
Shared Configuration
                             • S3
                             • File System
                 Shared      • Etc.
                 Config




     Exhibitor   Exhibitor         Exhibitor
        A           B                 ...
Coming Soon...
• Auto-register new instances
• Auto-remove old instances
• Alerting
• ???
Using / Integration
• Stand alone application
                   - or -


• Library/JAR
REST API
• https://github.com/Netflix/exhibitor/wiki/
  REST-Introduction
Demos

Q&A

More Related Content

What's hot

Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
DataStax
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Spark Summit
 

What's hot (20)

Kafka Streams Rebalances and Assignments: The Whole Story with Alieh Saeedi &...
Kafka Streams Rebalances and Assignments: The Whole Story with Alieh Saeedi &...Kafka Streams Rebalances and Assignments: The Whole Story with Alieh Saeedi &...
Kafka Streams Rebalances and Assignments: The Whole Story with Alieh Saeedi &...
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
 
Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...Managing multiple event types in a single topic with Schema Registry | Bill B...
Managing multiple event types in a single topic with Schema Registry | Bill B...
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Designing Scalable Data Warehouse Using MySQL
Designing Scalable Data Warehouse Using MySQLDesigning Scalable Data Warehouse Using MySQL
Designing Scalable Data Warehouse Using MySQL
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Apache flink
Apache flinkApache flink
Apache flink
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 

Viewers also liked (6)

Curator intro
Curator introCurator intro
Curator intro
 
Apache Curator: Past, Present and Future
Apache Curator: Past, Present and FutureApache Curator: Past, Present and Future
Apache Curator: Past, Present and Future
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Semaphores
SemaphoresSemaphores
Semaphores
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 

Similar to Exhibitor Introduction

ITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name DenaliITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
Lucidworks
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1
CIVEL Benoit
 

Similar to Exhibitor Introduction (20)

So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier
 
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab
 
Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501
 
Eclipse e4
Eclipse e4Eclipse e4
Eclipse e4
 
Meetup on Apache Zookeeper
Meetup on Apache ZookeeperMeetup on Apache Zookeeper
Meetup on Apache Zookeeper
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Architecting for Microservices Part 2
Architecting for Microservices Part 2Architecting for Microservices Part 2
Architecting for Microservices Part 2
 
Alfresco tuning part1
Alfresco tuning part1Alfresco tuning part1
Alfresco tuning part1
 
Alfresco tuning part1
Alfresco tuning part1Alfresco tuning part1
Alfresco tuning part1
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talks
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
ITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name DenaliITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
 
Docker presentasjon java bin
Docker presentasjon java binDocker presentasjon java bin
Docker presentasjon java bin
 
Store Beyond Glorp
Store Beyond GlorpStore Beyond Glorp
Store Beyond Glorp
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
 
Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Multi-threading in the modern era: Vertx Akka and Quasar
Multi-threading in the modern era: Vertx Akka and QuasarMulti-threading in the modern era: Vertx Akka and Quasar
Multi-threading in the modern era: Vertx Akka and Quasar
 
Redux Tech Talk
Redux Tech TalkRedux Tech Talk
Redux Tech Talk
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Exhibitor Introduction

  • 1. Exhibitor Netflix’s ZooKeeper Management System Jordan Zimmerman Senior Platform Engineer Netflix, Inc. jzimmerman@netflix.com @rangalt
  • 3. • ZooKeeper is statically configured • Limited tools for managing the ensemble • Backup/restore is sometimes needed • Visualization is desperately needed • Prior to 3.4.x, periodic cleanup needed
  • 5. Chaos Monkey-able • See http://techblog.netflix.com/2011/07/ netflix-simian-army.html A tool that randomly disables our production instances to make sure we can survive common types of failure without any customer impact. • Completely unmanned • Bringing up a new ensemble should be turn-key/push-button
  • 7. Instance Monitoring Each Exhibitor instance monitors the ZooKeeper server running on the same server. If ZooKeeper is not running, Exhibitor will write the zoo.cfg file, etc. and start it. If ZooKeeper crashes for some reason, Exhibitor will restart it.
  • 8.
  • 9. Log Cleanup In versions prior to ZooKeeper 3.4.x, log file maintenance is necessary. Exhibitor will periodically do this maintenance.
  • 10. Backup/Restore Backups in a ZooKeeper ensemble are more complicated than for a traditional data store (e.g. aRDBMS). Generally, most of the data in ZooKeeper is ephemeral. It would be harmful to blindly restore an entire ZooKeeper data set. What is needed is selective restoration to prevent accidental damage to a subset of the data set. Exhibitor enables this. Exhibitor will periodically backup the ZooKeeper transaction files. Once backed up, you can index any of these transaction files. Once indexed, you can search for individual transactions and “replay” them to restore a given ZNode to ZooKeeper.
  • 11. Cluster-wide Configuration Exhibitor presents a single console for your entire ZooKeeper ensemble. Configuration changes made in Exhibitor will be applied to the entire ensemble.
  • 12.
  • 13. Rolling Ensemble Changes Exhibitor can update the servers in the ensemble in a rolling fashion so that the ZooKeeper ensemble can stay up and in quorum while the changes are being made.
  • 14. Visualizer Exhibitor provides a graphical tree view of the ZooKeeper ZNode hierarchy.
  • 15.
  • 16. ZooKeeper Data Mutation When enabled, Exhibitor can create/update/ delete nodes in the ZooKeeper hierarchy.
  • 17.
  • 18. Curator Integration Exhibitor and Curator (Cur/Ex!) can be configured to work together so that Curator instances are updated for changes in the ensemble. Exhibitor Exhibitor Exhibitor A B ... Round Robin - periodic query for servers list Curator Clients Curator Clients Curator Clients
  • 20. Shared Configuration • S3 • File System Shared • Etc. Config Exhibitor Exhibitor Exhibitor A B ...
  • 22. • Auto-register new instances • Auto-remove old instances • Alerting • ???
  • 24. • Stand alone application - or - • Library/JAR

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n