In Cassandra Lunch #107, Dipan Shah discusses how Guardrails work in Apache Cassandra.
Accompanying YouTube: https://youtu.be/DEVKqJeKfSw
Apache Cassandra Lunch #107: Guardrails
1. Version 1.0
Guardrails - Apache Cassandra
Dipan Shah
Engineer @ Anant
2. Topics
● What are guardrails?
● What is the purpose of guardrails?
● How do they work?
● Guardrails options for production
● Guardrails demo
● Upcoming developments
● Q&A
3. Guardrails in Apache Cassandra
● A framework that allows operators to restrict certain functionalities in Cassandra
● Available since Apache Cassandra 4.1
● Has been available in some form in DataStax Enterprise since 6.8
4. What problems does it solve?
● Operators have faced cluster stability problems due to improper usage of Cassandra
● Best practices and Anti-patterns are difficult to communicate and enforce
● Guardrails allow operators to restrict some of this usage
● Options like the following have been available in earlier versions, but the guardrails framework is different:
○ Tombstone limits
○ Batch limits
○ Materialized views usage
5. How does it work?
The new framework allows operators to restrict how Cassandra is used by:
● Disabling certain features
● Disallowing some specific values
● Defining soft and hard limits on certain database magnitudes (a soft limit warns, a hard limit rejects the operation)
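The three mechanisms above can be modeled in a few lines. The following is an illustrative Python sketch, not Cassandra's implementation: every name in it is hypothetical, and both the strict "greater than" comparison and the use of -1 to mean "disabled" are assumptions made for the example.

```python
from enum import Enum

DISABLED = -1  # assumption: -1 turns a threshold off


class Outcome(Enum):
    OK = "ok"
    WARN = "warn"  # soft limit: the operation proceeds with a warning
    FAIL = "fail"  # hard limit: the operation is rejected


def check_enabled(feature_enabled):
    """Feature guardrail: using a disabled feature is rejected."""
    return Outcome.OK if feature_enabled else Outcome.FAIL


def check_value(value, warned=frozenset(), disallowed=frozenset()):
    """Value guardrail: some values only warn, others are rejected."""
    if value in disallowed:
        return Outcome.FAIL
    if value in warned:
        return Outcome.WARN
    return Outcome.OK


def check_threshold(value, warn=DISABLED, fail=DISABLED):
    """Threshold guardrail: compare a magnitude against soft and hard limits."""
    if fail != DISABLED and value > fail:
        return Outcome.FAIL
    if warn != DISABLED and value > warn:
        return Outcome.WARN
    return Outcome.OK
```

With a soft limit of 5 and a hard limit of 10, a value of 6 would produce a warning and a value of 11 would be rejected.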
10. Monitoring Guardrail events
● Triggering a guardrail emits a diagnostic log entry that contains the guardrail event
● Check logs for WARN and ERROR messages related to guardrails
● Set alerts for such messages in log aggregation tools like Splunk, ELK stack, Graylog, etc.
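As a rough illustration of the log check described above, the Python sketch below filters WARN and ERROR lines that mention a guardrail. The log layout is an assumption (real layouts depend on the logging configuration), and the sample lines are invented for the example rather than copied from Cassandra output.

```python
import re

# Assumed shape: an upper-case level token somewhere before the word
# "guardrail" (matched case-insensitively). Adjust to the real log layout.
GUARDRAIL_LINE = re.compile(r"\b(WARN|ERROR)\b.*(?i:guardrail)")


def guardrail_events(lines):
    """Yield (level, line) for WARN/ERROR lines that mention a guardrail."""
    for line in lines:
        match = GUARDRAIL_LINE.search(line)
        if match:
            yield match.group(1), line.rstrip("\n")


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "INFO  [main] startup complete",
    "WARN  [request-1] Guardrail soft limit exceeded (made-up message)",
    "ERROR [request-2] Guardrail hard limit exceeded (made-up message)",
]

if __name__ == "__main__":
    for level, line in guardrail_events(SAMPLE):
        print(level, "|", line)
```

The same pattern could serve as a starting point for an alert query in a log aggregation tool.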
11. More Guardrail examples
● Secondary Indexes
○ secondary_indexes_enabled: true
○ secondary_indexes_per_table_warn_threshold: 5
○ secondary_indexes_per_table_fail_threshold: 10
● Number of fields in a UDT
○ fields_per_udt_warn_threshold: -1
○ fields_per_udt_fail_threshold: -1
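A warn threshold set above its fail threshold could never fire, because the hard limit would trigger first. The Python sketch below parses the settings shown on this slide and flags such pairs. The helper names are hypothetical, the parser only handles the simple "key: value" lines used here, and treating -1 as "disabled" is an assumption.

```python
def parse_settings(text):
    """Parse simple 'key: value' lines into a dict of booleans and integers."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value in ("true", "false"):
            settings[key.strip()] = value == "true"
        else:
            settings[key.strip()] = int(value)
    return settings


def inconsistent_thresholds(settings):
    """Return base names whose warn threshold exceeds the fail threshold."""
    bad = []
    for key, warn in settings.items():
        if not key.endswith("_warn_threshold"):
            continue
        base = key[: -len("_warn_threshold")]
        fail = settings.get(base + "_fail_threshold", -1)
        if warn != -1 and fail != -1 and warn > fail:  # -1 assumed to mean disabled
            bad.append(base)
    return bad


# The values shown on the slide above.
SLIDE_SETTINGS = """
secondary_indexes_enabled: true
secondary_indexes_per_table_warn_threshold: 5
secondary_indexes_per_table_fail_threshold: 10
fields_per_udt_warn_threshold: -1
fields_per_udt_fail_threshold: -1
"""

if __name__ == "__main__":
    print(inconsistent_thresholds(parse_settings(SLIDE_SETTINGS)))
```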
12. Background Guardrails
● Some guardrails are checked in the background
● They are not associated with any specific query
● This avoids a costly read-before-write operation on every query
● Examples:
○ Disk space usage
○ Number of items in a non-frozen collection
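To make the idea of a background check concrete, here is a minimal Python sketch of a disk usage check that a periodic task could run independently of any query. It is not how Cassandra implements the guardrail: the function names and the percentage thresholds are hypothetical.

```python
import shutil


def disk_usage_percentage(path):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify_disk_usage(percent, warn_percent, fail_percent):
    """Map a usage percentage to 'ok', 'warn' or 'fail' (hypothetical limits)."""
    if percent > fail_percent:
        return "fail"
    if percent > warn_percent:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # A scheduler would call this periodically and cache the result, so
    # that individual writes never have to measure the disk themselves.
    current = disk_usage_percentage(".")
    print(round(current, 1), classify_disk_usage(current, 70, 90))
```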
13. Additional Guardrails for Production
● Replication factor
○ minimum_replication_factor_warn_threshold
○ minimum_replication_factor_fail_threshold
● Read and write consistency levels
○ read_consistency_levels_warned: []
○ read_consistency_levels_disallowed: []
○ write_consistency_levels_warned: []
○ write_consistency_levels_disallowed: []
14. Additional Guardrails for Production
● IN restrictions
○ partition_keys_in_select_warn_threshold
○ partition_keys_in_select_fail_threshold
● Materialized views
○ materialized_views_per_table_warn_threshold
○ materialized_views_per_table_fail_threshold
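For the IN restriction guardrail, it helps to see how quickly the number of selected partitions grows. The Python sketch below assumes the guardrail counts the combinations of IN values across the partition key columns. That counting rule, the function names, and the example column values are assumptions of the sketch.

```python
from math import prod


def partitions_selected(in_values_per_column):
    """Count partition key combinations implied by IN lists, one list per column."""
    return prod(len(values) for values in in_values_per_column)


def check_in_restriction(in_values_per_column, warn, fail):
    """Return 'ok', 'warn' or 'fail' for hypothetical warn/fail thresholds."""
    count = partitions_selected(in_values_per_column)
    if count > fail:
        return "fail"
    if count > warn:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Hypothetical query: WHERE region IN ('us', 'eu') AND day IN (1, 2, 3)
    restriction = [["us", "eu"], [1, 2, 3]]
    print(partitions_selected(restriction), check_in_restriction(restriction, 4, 10))
```

Two IN lists of modest size already multiply into six partitions here, which is why a limit on this magnitude can protect a cluster from queries that fan out to many nodes.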
15. Exceptions
● Guardrails are only applied to the operations of regular users
● They are checked neither for superuser queries nor for internal queries
● The configuration for guardrails is an extensible API
● Third-party alternative implementations could provide different guardrail configurations depending on the user, or on some other factors
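The exemption rule and the idea of per-user configuration can be sketched as follows. This is illustrative Python only: the Client type, the provider function, and the threshold dictionaries are hypothetical and do not reflect Cassandra's actual extension API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    name: str
    is_superuser: bool = False
    is_internal: bool = False


def guardrails_apply(client):
    """Guardrails are checked only for regular users."""
    return not (client.is_superuser or client.is_internal)


def config_for(client, per_user, default):
    """Hypothetical provider: choose a guardrail configuration by user name."""
    return per_user.get(client.name, default)


if __name__ == "__main__":
    default = {"tables_fail_threshold": 100}             # hypothetical values
    per_user = {"etl": {"tables_fail_threshold": 500}}   # hypothetical override
    for client in (Client("alice"), Client("etl"), Client("admin", is_superuser=True)):
        print(client.name, guardrails_apply(client), config_for(client, per_user, default))
```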
17. Upcoming developments
● New features are being developed
● Can be tracked at: https://issues.apache.org/jira/browse/CASSANDRA-17189?jql=project%20%3D%20CASSANDRA%20AND%20text%20~%20%22guardrail%22
Editor's Notes
https://cassandra.apache.org/_/blog/Apache-Cassandra-4.1-Features-Guardrails-Framework.html
For example, on the schema side, users can create too many tables or secondary indexes, leading to excessive use of resources. On the query side, users can run queries touching too many partitions that might involve all nodes in the cluster. Even worse, they can simply run a query using costly replica-side filtering, potentially reading all the table contents into memory on all nodes across the cluster.
https://docs.datastax.com/en/dse/6.8/dse-dev/datastax_enterprise/config/configCassandra_yaml.html#configCassandra_yaml__guardrailsYaml