Learn the current state of the NoSQL landscape and discover the different data models within it, from document stores and key-value databases to graph and wide column databases. Then you’ll learn why wide column databases are the most appropriate for scalable, high-performance use cases, including capabilities for massive scale-out architecture, peer-to-peer clustering to avoid bottlenecking, and built-in multi-datacenter replication.
Survey of High Performance NoSQL Systems
1. High Performance NoSQL Masterclass
Survey of High Performance
NoSQL Systems
Peter Corless
2. High Performance NoSQL Masterclass
Peter Corless
● Director of Technical Advocacy, ScyllaDB
● Editor / contributor to ScyllaDB blog
● Program chair for ScyllaDB Summit and P99 CONF
● Host of ScyllaDB Masterclass series
● @PeterCorless on Twitter
5. NoSQL/Multimodel Databases in the Top 100
Key Value (9)
Redis
Memcached
Hazelcast
Etcd
Ehcache
Aerospike
Riak KV
RocksDB
LevelDB
Wide Column (8)
Apache Cassandra
Amazon DynamoDB
ScyllaDB
Apache HBase
DataStax Enterprise
Azure Table Storage
Google Cloud Bigtable
Accumulo
Document (12)
MongoDB
Couchbase
Firebase Realtime
CouchDB
Google Cloud Firestore
Realm
MarkLogic
Google Cloud Datastore
RavenDB
IBM Cloudant
RethinkDB
PouchDB
Graph (1)
Neo4j
Multimodel (5)
Azure Cosmos DB
ArangoDB
OrientDB
Oracle NoSQL
Yugabyte
Time Series (5)
InfluxDB
kdb+
Graphite
Prometheus
TimescaleDB [SQL]
6. High Performance NoSQL Masterclass
Document Databases
● “Documents” are encoded formats
○ JavaScript Object Notation (JSON) or Binary JSON (BSON)
○ Extensible Markup Language (XML)
○ (We’re not talking about managing PDFs or Word files)
● Allows “tree”-style data models
● “Parent” and “child” nodes
ADVANTAGE
● Easy for developers to get started
DISADVANTAGE
● Primary-replica clustering bottlenecks write-heavy workloads at scale
Discover more differences: MongoDB vs. ScyllaDB Production Experience from a Dev & Ops Standpoint
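A minimal sketch of the “tree”-style model described on this slide, using Python's standard `json` module. The order document and the `walk` helper are hypothetical illustrations, not taken from any particular document database.

```python
import json

# A hypothetical order document: "child" nodes are nested inside the "parent".
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def walk(node, path=""):
    """Yield (path, leaf_value) pairs by walking the document tree depth-first."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{path}/{index}")
    else:
        yield path, node


encoded = json.dumps(order)    # serialize the tree to a JSON string
decoded = json.loads(encoded)  # parse it back into nested dicts and lists
leaves = dict(walk(decoded))   # flatten the tree into path -> value pairs
```

Each leaf is addressed by the path from the root through its parent nodes, for example `leaves["/customer/name"]`.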
7. High Performance NoSQL Masterclass
Key Value Databases
● Keys are simple indexes for a record
● Values can be simple data types (e.g., text or integer values), or more complex (lists, maps, collections)
● Often used for in-memory caching
ADVANTAGE
● Fast, simple
DISADVANTAGE
● Multi-datacenter clustering is an anti-pattern
Why that might be a bad idea: 7 Reasons Not to Put an External Cache in Front of Your Database
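As a rough illustration of the key-value model, here is a small in-memory store in Python. The `KeyValueStore` class and the key names are hypothetical. Real key-value databases add persistence, expiry, and networking on top of this basic idea.

```python
class KeyValueStore:
    """A toy in-memory key-value store: each key indexes exactly one value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key and report whether it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("page:views", 42)                      # simple value: integer
store.put("user:1:name", "Ada")                  # simple value: text
store.put("user:1:tags", ["admin", "beta"])      # complex value: list
store.put("user:1:prefs", {"theme": "dark"})     # complex value: map
```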
8. High Performance NoSQL Masterclass
Graph Databases
● Models domains as vertices (entities/objects) and edges (relationships)
● “Edges” are vital for understanding interrelationships
● Complexity grows as an n² problem
● Query languages (Cypher, Gremlin/TinkerPop) need to understand how to navigate topology (limit query depth, avoid infinite loops, etc.)
ADVANTAGE
● Models object relational complexities well
DISADVANTAGE
● Data set size often limited by complexity / computational power
Did you know… You can use ScyllaDB or Cassandra as Storage Backend for JanusGraph?
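The two traversal concerns named on this slide, limiting query depth and avoiding infinite loops, can be sketched with a plain breadth-first search in Python. The graph data and the `reachable` function are hypothetical. This is not how Cypher or Gremlin are implemented, only an illustration of the problem they have to solve.

```python
from collections import deque

# Hypothetical "follows" graph: vertex -> list of outgoing edges.
# alice and bob point at each other, so a naive traversal would loop forever.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def reachable(graph, start, max_depth):
    """Return {vertex: depth} for every vertex within max_depth hops of start."""
    visited = {start: 0}          # doubles as the loop guard
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        depth = visited[vertex]
        if depth == max_depth:    # depth limit: do not expand any further
            continue
        for neighbor in graph.get(vertex, []):
            if neighbor not in visited:
                visited[neighbor] = depth + 1
                queue.append(neighbor)
    return visited


two_hops = reachable(edges, "alice", max_depth=2)
everything = reachable(edges, "alice", max_depth=10)
```

With `max_depth=2`, `erin` (three hops away) is excluded; the `visited` map keeps the alice/bob cycle from being walked twice.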
9. High Performance NoSQL Masterclass
Wide Column Databases
● Row-based store
● “Key-key-value”
● Can be used as a simple key-value store
● Many (but not all) share the SQL-like Cassandra Query Language (CQL)
● Designed for horizontal scale-out
● ScyllaDB is also architected for vertical scale-up
ADVANTAGE
● Great scale-out, global clustering
DISADVANTAGE
● Intimidating to newcomers
11. High Performance NoSQL Masterclass
Horizontal (and Vertical) Scalability
● Scale out to any number of nodes (Cassandra, ScyllaDB)
● Scale up to any number of cores per node (ScyllaDB)
12. High Performance NoSQL Masterclass
Wide Column = “Key Key Value”
■ Wide column databases are row-based
● Use partitioning & clustering (or sort) keys
● Mostly used for transaction processing
(OLTP)
● Examples: Cassandra, ScyllaDB, DynamoDB
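One way to picture “key key value” is a map of partition keys to rows kept sorted by clustering key. The Python sketch below assumes a hypothetical `WideColumnTable` class and sensor data. It models only the lookup structure, not storage, distribution, or CQL.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy model: partition key -> rows kept in clustering-key order."""

    def __init__(self):
        self._keys = defaultdict(list)   # partition key -> sorted clustering keys
        self._rows = defaultdict(dict)   # partition key -> {clustering key: value}

    def upsert(self, partition_key, clustering_key, value):
        rows = self._rows[partition_key]
        if clustering_key not in rows:
            bisect.insort(self._keys[partition_key], clustering_key)
        rows[clustering_key] = value

    def get(self, partition_key, clustering_key):
        """Point lookup by both keys."""
        return self._rows.get(partition_key, {}).get(clustering_key)

    def slice(self, partition_key, start, end):
        """Range scan within one partition, inclusive on both ends."""
        keys = self._keys.get(partition_key, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        rows = self._rows.get(partition_key, {})
        return [(key, rows[key]) for key in keys[lo:hi]]


# Hypothetical usage: partition by sensor id, cluster (sort) by timestamp.
table = WideColumnTable()
table.upsert("sensor-1", 30, 21.5)
table.upsert("sensor-1", 10, 20.0)
table.upsert("sensor-1", 20, 20.7)
table.upsert("sensor-2", 10, 18.2)
recent = table.slice("sensor-1", 10, 20)
```

The first key picks the partition and the second key orders rows inside it, which is why range scans within a single partition are cheap.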
13. High Performance NoSQL Masterclass
Wide Column ≠ Column Store
■ Don’t confuse a wide column database with a columnar database (aka column store)
■ Column stores store data in columnar format
● Can count “runs” of repeated values in columns to minimize data repetition
● Mostly used for analytics processing (OLAP)
● Examples: Druid, Pinot, ClickHouse, BigQuery
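Counting “runs” of repeated values is run-length encoding. A minimal Python sketch follows, using a hypothetical `country` column. Real column stores combine this with other encodings, so treat it only as an illustration of the idea.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical column of values as they would sit next to each other on disk.
country = ["US", "US", "US", "DE", "DE", "US"]
encoded = run_length_encode(country)
```

Six stored values shrink to three pairs here; the longer the runs (for example in a sorted column), the better the compression.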
14. High Performance NoSQL Masterclass
Automatic Data Sharding & Replication
Autosharding based on Token Ranges
With RF=3, each data record is automatically copied to two other replica nodes
■ Data automatically partitioned and balanced across cluster based on partition key using token ranges
■ Data within partitions is organized by clustering key (or sort key)
■ Each record is automatically replicated across cluster based on replication factor (typically RF=3) to ensure durability
■ Multi-datacenter replication built in
[Diagram: token ranges 0-100, 101-200, and 201-300, each shown on three replica nodes]
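The token-range idea can be sketched as a small ring in Python. Several things here are assumptions for illustration: the `TokenRing` class, the node names, a fourth range (301-400) added so that RF=3 does not simply mean “every node”, and MD5 as the hash (Cassandra and ScyllaDB use a Murmur3-based partitioner over a far larger token space).

```python
import bisect
import hashlib


class TokenRing:
    """Toy ring: each node is primary for one token range; replicas follow clockwise."""

    def __init__(self, range_ends, replication_factor=3):
        # range_ends maps the inclusive upper token of a range to its primary node.
        self._ends = sorted(range_ends)
        self._owners = [range_ends[end] for end in self._ends]
        self._rf = min(replication_factor, len(set(self._owners)))
        self._token_space = self._ends[-1] + 1

    def token_for(self, partition_key):
        """Hash the partition key into the token space (MD5 is a stand-in)."""
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self._token_space

    def replicas_for_token(self, token):
        """Primary owner of the token's range plus the next distinct nodes on the ring."""
        start = bisect.bisect_left(self._ends, token)
        replicas = []
        for step in range(len(self._owners)):
            node = self._owners[(start + step) % len(self._owners)]
            if node not in replicas:
                replicas.append(node)
            if len(replicas) == self._rf:
                break
        return replicas

    def replicas_for(self, partition_key):
        return self.replicas_for_token(self.token_for(partition_key))


ring = TokenRing(
    {100: "node-a", 200: "node-b", 300: "node-c", 400: "node-d"},
    replication_factor=3,
)
```

For example, token 150 falls in range 101-200, so `node-b` is the primary and the next two nodes clockwise hold the other copies.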
15. High Performance NoSQL Masterclass
Leaderless Topology
Peer-to-Peer Active-Active (Multi-Datacenter)
Each node accepts reads+writes
Inherently better load balancing
Deals better w/ write-heavy or mixed read-write workloads
■ No single point of failure
■ No bottleneck at a “leader” node
■ Every node can be read-write
16. High Performance NoSQL Masterclass
Coordinator Node per Operation
■ Client makes a request to any replica node
■ This “coordinator” node forwards the request to the other replicas
■ Replicas acknowledge the operation to the coordinator, which responds to the client
■ Various forms of load balancing
● Simple round-robin
● Datacenter aware round-robin
● Heat-weighted load balancing
Coordinator Node
Using token awareness, for an update, the coordinator
node will be chosen from one of the current replicas
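A rough Python sketch of how a client-side policy might choose the coordinator, combining datacenter-aware round-robin with token awareness. The `DCAwareRoundRobin` class, node names, and datacenter names are hypothetical and do not mirror any real driver API. Heat-weighted balancing is not modeled.

```python
from itertools import count


class DCAwareRoundRobin:
    """Rotate through local-datacenter nodes, preferring known replicas."""

    def __init__(self, nodes, local_dc):
        # nodes: mapping of node name -> datacenter name
        self._local = [name for name, dc in nodes.items() if dc == local_dc]
        if not self._local:
            raise ValueError("no nodes in the local datacenter")
        self._counter = count()

    def pick_coordinator(self, replicas=()):
        # Token awareness: when the replicas for a key are known, pick a local
        # one so the coordinator already holds the data it is asked about.
        preferred = [name for name in self._local if name in replicas]
        candidates = preferred or self._local
        return candidates[next(self._counter) % len(candidates)]


cluster = {"a1": "dc1", "a2": "dc1", "a3": "dc1", "b1": "dc2"}
policy = DCAwareRoundRobin(cluster, local_dc="dc1")
plain_picks = [policy.pick_coordinator() for _ in range(4)]
token_aware_pick = policy.pick_coordinator(replicas={"a2", "a3", "b1"})
```

Without replica information the policy simply cycles through the local nodes; with it, the choice is narrowed to local replicas, which saves a network hop.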
17. High Performance NoSQL Masterclass
Tunable Consistency Levels per Operation
■ “AP”-mode as per CAP theorem
● Emphasizes high availability
over strong consistency
■ Many consistency levels
● ONE
● QUORUM
● LOCAL_QUORUM
● EACH_QUORUM
● ALL
● ALL_LOCAL
Example: Quorum Consistency
In a cluster of 3 nodes, the operation succeeds as long as 2 of the 3 nodes succeed.
The third node will eventually be updated and brought in sync with the rest of the cluster.
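The quorum arithmetic behind this example is floor(RF / 2) + 1. The Python sketch below covers only ONE, QUORUM, and ALL within a single datacenter. The function names and the `acks` list are hypothetical, and the datacenter-scoped levels are left out.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


# Acknowledgements required per consistency level, as a function of RF.
REQUIRED_ACKS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def operation_succeeds(consistency_level, acks, replication_factor=3):
    """acks holds one boolean per replica: True if that replica acknowledged."""
    required = REQUIRED_ACKS[consistency_level](replication_factor)
    return sum(acks) >= required


# The slide's scenario: two replicas answer OK, one does not.
acks = [True, True, False]
```

With these acknowledgements, ONE and QUORUM succeed while ALL fails, which is the availability trade-off that tunable consistency exposes per operation.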
18. High Performance NoSQL Masterclass
Write & Read Paths
■ Writes are acknowledged once they are in both the in-memory memtable and the durable commitlog
■ Periodically, memtables are flushed to immutable on-disk Sorted Strings Tables (SSTables)
■ Reads first check the in-memory row-based cache, or fetch data from SSTables on disk
■ Bloom filters help the system figure out where the data is [or isn’t] stored
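The write and read paths can be sketched as a toy store in Python. Everything here is a simplification under stated assumptions: `MiniStore` and `BloomFilter` are hypothetical classes, the “commitlog” and “SSTables” are in-memory stand-ins for on-disk structures, flushing is triggered by a row count, and compaction is omitted.

```python
import hashlib
from types import MappingProxyType


class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self._bits = [False] * size
        self._size, self._hashes = size, hashes

    def _positions(self, key):
        for seed in range(self._hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, key):
        for position in self._positions(key):
            self._bits[position] = True

    def might_contain(self, key):
        return all(self._bits[position] for position in self._positions(key))


class MiniStore:
    def __init__(self, flush_threshold=2):
        self.commitlog = []   # stand-in for the durable append-only log
        self.memtable = {}    # in-memory writes not yet flushed
        self.sstables = []    # (bloom filter, read-only sorted table), oldest first
        self.row_cache = {}
        self._flush_threshold = flush_threshold

    def write(self, key, value):
        self.commitlog.append((key, value))   # 1. durable log
        self.memtable[key] = value            # 2. in-memory table
        self.row_cache.pop(key, None)         # drop any stale cached row
        if len(self.memtable) >= self._flush_threshold:
            self.flush()
        return "ack"                          # acknowledged after both steps

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        table = MappingProxyType(dict(sorted(self.memtable.items())))
        self.sstables.append((bloom, table))  # immutable once written
        self.memtable = {}
        self.commitlog.clear()                # flushed data no longer needs the log

    def read(self, key):
        if key in self.row_cache:
            return self.row_cache[key]
        if key in self.memtable:
            return self.memtable[key]
        for bloom, table in reversed(self.sstables):   # newest first
            if bloom.might_contain(key) and key in table:
                self.row_cache[key] = table[key]
                return table[key]
        return None


store = MiniStore(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)   # second row triggers a flush to an "SSTable"
store.write("a", 3)   # newer value lives in the memtable
```

The Bloom filter lets a read skip any SSTable that definitely does not hold the key, so only tables that might contain it are consulted.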
19. High Performance NoSQL Masterclass
Discover More in ScyllaDB University
university.scylladb.com
20. High Performance NoSQL Masterclass
Keep in touch!
Peter Corless
Director of Technical Advocacy
ScyllaDB
peter@scylladb.com
@PeterCorless