This document discusses using Apache Cassandra to solve the problem of distributed data across hybrid cloud environments. It begins by describing the distributed data problem, including having applications and data in multiple geographic regions. It then discusses using Cassandra on-premises, in public clouds, and across multiple public clouds. The document provides an example of defining Cassandra keyspaces that replicate across data centers in different cloud regions. It concludes by discussing how to configure Cassandra to recognize multiple logical data centers spanning on-premises and cloud environments.
4. The Distributed Data Problem
● Many applications and services.
● Multiple geographic regions.
● Data needs to be:
  ● Close to the application
  ● Highly available
  ● Consistent (...well, eventually)
[Diagram: Mobile App, Web App]
5. % whoami_
Aaron Ploetz
● Product Manager at DataStax
● MVP for Apache Cassandra (2014-2017)
● DB Engineer in retail space (former)
● Author:
6. How to solve?
Apache Cassandra
● Distributed Row Store
● Top-level Apache Project
● Current GA version 4.0.1
7. Why Apache Cassandra?
Features
● High Availability
● Data center awareness
● Nodes are peers
● Designed to handle failures
22. Keyspace definition
CREATE KEYSPACE pricing WITH REPLICATION =
{'class': 'NetworkTopologyStrategy',
'us-west': '3', 'us-east': '3',
'milwaukee-dc': '3', 'omaha-dc': '3'};
Data center names here must match names in
cassandra-rackdc.properties!
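The name-matching rule above can be sketched as a small check. This is only an illustration, not part of the original talk: the sample cassandra-rackdc.properties contents (including the rack name rack1) and the helper functions parse_properties, node_dc_is_replicated, and build_create_keyspace are hypothetical. The sketch assumes each node declares its data center with a dc= line in that file, which is how GossipingPropertyFileSnitch reads it.

```python
# Hypothetical contents of cassandra-rackdc.properties on one node.
# The dc value comes from the keyspace definition; the rack name is an assumption.
RACKDC_PROPERTIES = """\
# cassandra-rackdc.properties
dc=us-west
rack=rack1
"""

# Replication settings copied from the keyspace definition on this slide.
REPLICATION = {
    "us-west": 3,
    "us-east": 3,
    "milwaukee-dc": 3,
    "omaha-dc": 3,
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def node_dc_is_replicated(properties_text, replication):
    """Return True if the node's dc name is a key in the replication map.

    The comparison is exact, so 'US-West' does not match 'us-west'.
    """
    dc = parse_properties(properties_text).get("dc")
    return dc in replication


def build_create_keyspace(name, replication):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (
        f"CREATE KEYSPACE {name} WITH REPLICATION = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    print(build_create_keyspace("pricing", REPLICATION))
    print("dc matches keyspace:", node_dc_is_replicated(RACKDC_PROPERTIES, REPLICATION))
```

Running a check like this against each node's properties file before creating the keyspace is one way to catch a misspelled data center name early, since a mismatch would leave that data center without replicas for the keyspace.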
23. Different ways to run Cassandra
● Apache Cassandra
  ● Free and open source
  ● Runs on commodity hardware
● K8ssandra
  ● Free and open source
  ● Cassandra designed to run on K8s
● DataStax Astra DB
  ● Serverless, cloud native
  ● No painful DevOps