Learn how to optimize your NoSQL database on AWS for cost, efficiency, and scale. NoSQL databases are a good fit for modern datasets that need a simple design, a mix of structured and unstructured data, horizontal scaling, and finer control over availability. With AWS, you can run NoSQL on Amazon EC2 with Amazon EBS or use Amazon DynamoDB. This webinar dives deep into best practices and architectural considerations for designing and managing NoSQL databases like Cassandra, MongoDB, CouchDB, and Aerospike on EC2 and EBS. We will share best practices for instance and volume selection, provide performance tuning hints, and describe cost optimization techniques.
Learning Objectives:
• Learn about common NoSQL database options and use cases for Cassandra, MongoDB, CouchDB, and Aerospike
• Review best practices around architecting on AWS for different NoSQL databases
• Understand the cost vs. performance of different Amazon EC2 instances and Amazon EBS volumes
6. What is it? (Cassandra)
• Dynamo-model database + CQL
• Horizontally scalable
• No single point of failure
• Data is immutable and stored in collections
• JVM-based
• A lot of management work is done in the background
• Relies on the gossip protocol
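Gossip is how Cassandra nodes learn about each other's state without a central coordinator. The following is a simplified sketch, not Cassandra's implementation. It is loosely modeled on the generation/version heartbeat that gossip exchanges. When two nodes compare notes, each keeps the newest state per endpoint: the higher generation wins, and within the same generation the higher version wins. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatState:
    generation: int  # increases when the node restarts
    version: int     # increases with every local heartbeat


def is_newer(a: HeartbeatState, b: HeartbeatState) -> bool:
    """True if state `a` supersedes state `b` (generation first, then version)."""
    return (a.generation, a.version) > (b.generation, b.version)


def merge_views(local: dict[str, HeartbeatState],
                remote: dict[str, HeartbeatState]) -> dict[str, HeartbeatState]:
    """Merge a peer's view of the cluster into ours, keeping the newest state per endpoint."""
    merged = dict(local)
    for endpoint, state in remote.items():
        current = merged.get(endpoint)
        if current is None or is_newer(state, current):
            merged[endpoint] = state
    return merged


# Usage: node A learns a fresher heartbeat for 10.0.0.1 and a new peer 10.0.0.2 from node B.
view_a = {"10.0.0.1": HeartbeatState(1, 5)}
view_b = {"10.0.0.1": HeartbeatState(1, 7), "10.0.0.2": HeartbeatState(3, 2)}
merged = merge_views(view_a, view_b)
print(merged["10.0.0.1"])  # HeartbeatState(generation=1, version=7)
```

Because the merge only ever keeps the newer state, repeated random exchanges between pairs of nodes converge on the same cluster view.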
7. Main concerns of the customers
• Schema & usage pattern
• Geo distribution
• Background routines & specific optimizations
9. Choosing instance & storage capacity: 80% writes
• For most workloads (especially with a 50/50 read/write ratio), M4s with EBS are the best option
• For write-heavy workloads with high RPS requirements, C4s with EBS should be considered
• When performance requirements are high and the dataset is relatively small, you can use I2s with ephemeral storage
10. Choosing instance & storage capacity: 80% reads
• For most workloads, M4s with EBS are a good choice
• When performance requirements are high and the dataset is relatively small, you can use I2s with ephemeral storage
• When performance requirements are high and the dataset is large, the best option is R4s with different EBS volume types
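The rules of thumb on slides 9 and 10 can be written down as a small decision helper. This is one possible reading of those slides, not an official sizing tool. The `Workload` fields are hypothetical and reduce "high performance" and "large dataset" to booleans. The 0.8 and 0.2 thresholds come from the slides' "80% writes" and "80% reads" titles.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    write_ratio: float      # fraction of operations that are writes, 0.0 to 1.0
    high_performance: bool  # hypothetical flag: "performance requirements are high"
    large_dataset: bool     # hypothetical flag: dataset does not fit local instance storage


def suggest_deployment(w: Workload) -> str:
    """One possible reading of the slide 9 and 10 guidance (thresholds assumed)."""
    if not w.high_performance:
        return "M4 + EBS"                # default for most workloads, e.g. 50/50 read/write
    if not w.large_dataset:
        return "I2 + ephemeral storage"  # high performance, relatively small dataset
    if w.write_ratio >= 0.8:
        return "C4 + EBS"                # write-heavy with high RPS requirements
    if w.write_ratio <= 0.2:
        return "R4 + EBS"                # read-heavy with a large dataset
    return "M4 + EBS"


print(suggest_deployment(Workload(write_ratio=0.5, high_performance=False, large_dataset=True)))
# M4 + EBS
```

The helper checks dataset size before the read/write mix because both slides recommend I2 with ephemeral storage whenever performance matters and the data is small.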
11. FAQ: 2AZ cluster architecture
Hint: configure a RetryPolicy in the Cassandra driver
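Why a retry policy matters with two AZs: Cassandra's Ec2Snitch treats each AZ as a rack. With a replication factor of 3 spread over two racks, one AZ must hold two of the three replicas. The sketch below only does the arithmetic for which consistency levels still succeed after one AZ is lost, in a single datacenter. Deciding whether to retry, downgrade, or fail is what the driver's RetryPolicy is for. The function names are hypothetical and are not part of any driver API.

```python
def replicas_required(consistency: str, rf: int) -> int:
    """Acknowledgements a coordinator needs for ONE, QUORUM or ALL in a single datacenter."""
    required = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return required[consistency]


def survives_az_loss(replicas_per_az: dict[str, int], failed_az: str, consistency: str) -> bool:
    """True if enough replicas remain after `failed_az` goes down to serve `consistency`."""
    rf = sum(replicas_per_az.values())
    alive = rf - replicas_per_az.get(failed_az, 0)
    return alive >= replicas_required(consistency, rf)


# RF=3 over two AZs: one AZ necessarily holds two of the three replicas.
placement = {"az-a": 2, "az-b": 1}
for level in ("ONE", "QUORUM", "ALL"):
    print(level, survives_az_loss(placement, "az-a", level), survives_az_loss(placement, "az-b", level))
# ONE True True
# QUORUM False True
# ALL False False
```

Losing the AZ that holds two replicas breaks QUORUM. Application requests at that level will fail until the driver's retry behavior or the application handles it.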
12. FAQ
• Cassandra backup / restore
  - The restore procedure for the whole cluster can be complicated
  - A single node can be restored with EBS snapshots
• Auto Scaling of Cassandra clusters
  - Auto Scaling puts unpredictable pressure on the cluster
  - Scaling up is simple, but scaling down is extremely complicated
• Cassandra in containers
  - Makes sense only for test / dev environments
15. What is it? (MongoDB)
• Document-oriented database
• Horizontally scalable
• HA is based on master / slave replication
• Geo-distributed
• A lot of management work is done in the background
16. Main concerns of the customers
• Schema & usage pattern
• Geo distribution and performance
• Data consistency & partition tolerance
18. Choosing instance & storage
• MongoDB needs a lot of memory and very fast disks, so unless your dataset is quite big, the best option is either R3 or I2 (depending on the size of the dataset)
• If the dataset is big, consider using R4 with different EBS volume types
• For hidden nodes, use M4 with EBS, because EBS snapshots make it easy to back up data
19. FAQ: 2AZ cluster architecture
Best option: a replica set in one AZ and a hidden member in the other.
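Slide 19's layout can be expressed as a replica set configuration document: data-bearing members in the first AZ and one hidden member in the second. The field names (`_id`, `members`, `host`, `priority`, `hidden`, `votes`) are MongoDB's replica set configuration fields. A document like this is what `rs.initiate()` in mongosh, or the `replSetInitiate` admin command, accepts. The host names are hypothetical. Setting `votes: 0` on the hidden member and requiring an odd number of first-AZ members are this sketch's own assumptions, not something the slide prescribes.

```python
def replica_set_config(name: str, az1_hosts: list[str], hidden_host: str) -> dict:
    """Build a replica set config: data-bearing members in AZ 1, a hidden member in AZ 2."""
    if len(az1_hosts) % 2 == 0:
        raise ValueError("use an odd number of voting members in AZ 1")
    members = [{"_id": i, "host": host, "priority": 1} for i, host in enumerate(az1_hosts)]
    members.append({
        "_id": len(members),
        "host": hidden_host,
        "priority": 0,   # hidden members must have priority 0, so they never become primary
        "hidden": True,  # invisible to client applications; used here for EBS snapshot backups
        "votes": 0,      # assumption: keep the hidden member out of elections
    })
    return {"_id": name, "members": members}


# Hypothetical hosts: a1-a3 in the first AZ, b1 in the second.
config = replica_set_config("rs0", ["a1:27017", "a2:27017", "a3:27017"], "b1:27017")
print(config["members"][-1])
```

If the first AZ fails, the hidden member cannot become primary. It serves as an up-to-date copy for recovery rather than as automatic failover.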
20. FAQ
• MongoDB backup / restore
  - Hidden nodes with EBS and EBS snapshot backups
• Querying large amounts of data
  - Design the schema properly
  - Avoid running MapReduce on the master
• MongoDB consistency
  - Lots of improvements were made, but there are still some edge cases
23. What is it? (CouchDB)
• Document-oriented database built on the Dynamo model
• Supports a RESTful API
• Eventual consistency
• Lockless, optimistic concurrency with conflict resolution
• Horizontally scalable (with constraints)
• Offline-first database
• MapReduce to build views
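CouchDB resolves replication conflicts without locks. Every replica picks the same winning revision deterministically, and the losing revisions are kept as conflicts for the application to resolve. The sketch below is a simplified version of that documented rule. Non-deleted leaves are preferred. The revision with the longest history wins, approximated here by the generation prefix of `_rev`. Ties are broken by comparing the revision strings, highest first. The function names and revision hashes are hypothetical.

```python
def parse_rev(rev: str) -> tuple[int, str]:
    """Split a CouchDB revision id such as '3-917fa23' into (generation, hash)."""
    generation, _, rev_hash = rev.partition("-")
    return int(generation), rev_hash


def winning_revision(leaves: list[tuple[str, bool]]) -> str:
    """Pick the winner among conflicting leaf revisions given as (rev, deleted) pairs.

    Non-deleted leaves beat deleted ones; then the highest generation wins,
    and ties are broken by comparing the hash strings.
    """
    live = [rev for rev, deleted in leaves if not deleted]
    candidates = live or [rev for rev, _ in leaves]
    return max(candidates, key=parse_rev)


# Two replicas edited the same document offline and then synced.
print(winning_revision([("2-7051cbe5", False), ("2-b91bb807", False)]))  # 2-b91bb807
```

The generation is compared as an integer, so "10-..." beats "9-...". A plain string comparison would get that case wrong.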
26. FAQ: 2AZ cluster architecture
• You have to plan the replication schema on your own, so it is your responsibility to check how it will behave in case of a DR event
29. What is it? (Aerospike)
• In-memory key-value database
• High and constant performance
• Shared-nothing architecture
• Geo-distributed (hash partitions)
• Master-slave replication
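Aerospike splits each namespace into 4096 partitions. A key's digest determines its partition, and each partition has one master copy plus replicas on other nodes. This lets clients route a request straight to the owning node. The sketch below illustrates that idea under stated assumptions. SHA-1 stands in for Aerospike's RIPEMD-160 digest. The round-robin placement is a simplification, not Aerospike's actual partition-map algorithm. All function names are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike splits each namespace into 4096 partitions


def partition_id(key: bytes) -> int:
    """Map a key to a partition id using 12 bits of a digest of the key."""
    # Assumption: SHA-1 stands in for Aerospike's RIPEMD-160 digest, which
    # hashlib may not provide on every OpenSSL build.
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def build_partition_map(nodes: list[str], replication_factor: int) -> list[list[str]]:
    """Assign a master and replicas to each partition (simplified round-robin, not Aerospike's algorithm)."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    return [
        [nodes[(pid + r) % len(nodes)] for r in range(replication_factor)]
        for pid in range(N_PARTITIONS)
    ]


def owners(key: bytes, partition_map: list[list[str]]) -> list[str]:
    """Return [master, replica, ...] for the partition that holds `key`."""
    return partition_map[partition_id(key)]


pmap = build_partition_map(["node-a", "node-b", "node-c"], replication_factor=2)
print(owners(b"user:42", pmap))  # master first, then its replica
```

Because partitions, not keys, are the unit of ownership, adding a node only moves whole partitions. That is what lets the cluster rebalance without rehashing every record.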
31. Choosing instance & storage
• Aerospike is used when performance requirements are extreme. It needs a lot of memory and very fast disks, which is why EC2 with ephemeral storage is the first choice for Aerospike deployments.
32. FAQ: 2AZ cluster architecture
• If one AZ goes down, depending on your replication factor, you will still have a copy of the data
• Aerospike can add more nodes and replicate data to them without putting much pressure on the existing nodes
• Replicating the data takes time
33. FAQ
• Aerospike backup / restore
  - The restore procedure for the whole cluster can be complicated
  - A single node can be restored with EBS snapshots
• Auto Scaling of Aerospike clusters
  - Auto Scaling puts unpredictable pressure on the cluster
  - Scaling up is simple, but scaling down is complicated
• Aerospike in containers
  - Does not make sense
36. What is it? (Neo4j)
• Graph database
• JVM-based
• Provides a REST API
• Two clustering modes: HA cluster & Causal Cluster
• Two types of nodes: Core nodes & Read replicas (Raft protocol)
• Uses the Cypher query language
Neo4j Causal Clustering
39. FAQ: 2AZ cluster architecture
• If an AZ fails and the master node was in it, a new master election is initiated
• Core nodes in Causal Cluster mode vote by simple majority
• If the majority is unavailable, the cluster becomes read-only
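The majority rule on slide 39 has a direct consequence for two-AZ layouts. One of the two AZs always holds at least half of the core nodes, so losing that AZ leaves fewer than a majority alive and the cluster becomes read-only. The arithmetic sketch below counts only core nodes, since read replicas do not vote. The function names are hypothetical.

```python
def raft_majority(voting_members: int) -> int:
    """Smallest number of core members that forms a simple majority."""
    return voting_members // 2 + 1


def state_after_az_loss(cores_per_az: dict[str, int], failed_az: str) -> str:
    """Writes need a majority of core members; without one, the cluster is read-only."""
    total = sum(cores_per_az.values())
    alive = total - cores_per_az.get(failed_az, 0)
    return "read-write" if alive >= raft_majority(total) else "read-only"


print(state_after_az_loss({"az-a": 2, "az-b": 1}, "az-a"))  # read-only
print(state_after_az_loss({"az-a": 2, "az-b": 1}, "az-b"))  # read-write
print(state_after_az_loss({"az-a": 2, "az-b": 2}, "az-b"))  # read-only
```

An even split (2 + 2) is no better than an uneven one: losing either AZ leaves 2 of 4 cores, below the majority of 3.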
43. Cost: Performance / Size
• If you want to stay cost-effective and efficient, treat deployment as a journey rather than a one-time decision
• Consider EBS the main option for most workloads
• If your performance requirements are really high and the dataset is relatively small, consider EC2 with ephemeral storage; otherwise, go for EC2 with EBS
44. Sum up
• There is no general solution for all cases
• Context matters, and the solution should follow the changing context
• Apps and code should be adapted to the way NoSQL DBs work
• The initial choice of deployment options can be changed later
• The best way to make the initial deployment choice is a PoC