Survey of High Performance NoSQL Systems

High Performance NoSQL Masterclass
Survey of High Performance
NoSQL Systems
Peter Corless

Peter Corless
● Director of Technical Advocacy,
ScyllaDB
● Editor / contributor to ScyllaDB blog
● Program chair for ScyllaDB Summit
and P99 CONF
● Host of ScyllaDB Masterclass series
● @PeterCorless on Twitter

NoSQL Database
Landscape

DB-Engines.com “Top 100”
As of November 2022

NoSQL/Multimodel Databases in the Top 100
Key Value (9)
Redis
Memcached
Hazelcast
Etcd
Ehcache
Aerospike
Riak KV
RocksDB
LevelDB
Wide Column (8)
Apache Cassandra
Amazon DynamoDB
ScyllaDB
Apache HBase
DataStax Enterprise
Azure Table Storage
Google Cloud Bigtable
Accumulo
Document (12)
MongoDB
Couchbase
Firebase Realtime
CouchDB
Google Cloud Firestore
Realm
MarkLogic
Google Cloud Datastore
RavenDB
IBM Cloudant
RethinkDB
PouchDB
Graph (1)
Neo4j
Multimodel (5)
Azure Cosmos DB
ArangoDB
OrientDB
Oracle NoSQL
Yugabyte
Time Series (5)
InfluxDB
kdb+
Graphite
Prometheus
TimescaleDB [SQL]

Document Databases
● “Documents” are encoded formats
○ JavaScript Object Notation (JSON) or
Binary JSON (BSON)
○ Extensible Markup Language (XML)
○ (We’re not talking about managing
PDFs or Word files)
● Allows “tree”-style data models
● “Parent” and “child” nodes
ADVANTAGE
● Easy for developers to get started
DISADVANTAGE
● Primary-replica clustering bottlenecks
write-heavy workloads at scale
Discover more differences: MongoDB vs. ScyllaDB Production Experience from a Dev & Ops Standpoint
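To make the "tree"-style model concrete, here is a minimal sketch of a document with parent and child nodes, encoded as JSON. The order record and the `walk` helper are hypothetical illustrations, not taken from any particular database.

```python
import json

# Hypothetical order document: the "parent" node embeds its "child" nodes.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def walk(node, path=""):
    """Yield (path, value) for every leaf in the document tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{path}/{index}")
    else:
        yield path, node


encoded = json.dumps(order)    # serialize the tree to JSON text
decoded = json.loads(encoded)  # parse it back into nested objects
leaves = dict(walk(decoded))   # flatten to {path: leaf value}
```

Each leaf is addressed by the path from the root through its parents, for example `/items/1/sku`.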

Key Value Databases
● Keys are simple indexes for a record
● Values can be simple data types
(e.g., text or integer values), or more
complex (lists, maps, collections)
● Often used for in-memory caching
ADVANTAGE
● Fast, simple
DISADVANTAGE
● Multi-datacenter clustering is an anti-pattern
Why that might be a bad idea: 7 Reasons Not to Put an External Cache in Front of Your Database

Graph Databases
● Models domains as vertices
(entities/objects) and edges
(relationships)
● “Edges” are vital for understanding
interrelationships
● Complexity grows as an n² problem
● Query languages need to
understand how to navigate
topology (limit query depth, avoid
infinite loops, etc.) — Cypher,
Gremlin/Tinkerpop
ADVANTAGE
● Models object relational complexities
well
DISADVANTAGE
● Data set size often limited by
complexity / computational power
Did you know… You can use ScyllaDB or Cassandra as Storage Backend for JanusGraph?
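The point about navigating topology (limiting query depth, avoiding infinite loops) can be sketched as a depth-limited breadth-first traversal. This is an illustrative sketch in plain Python, not Cypher or Gremlin; the `edges` graph and the `reachable` function are hypothetical.

```python
from collections import deque

# Hypothetical graph: vertices are names, edges are "follows" relationships.
# The cycle alice -> bob -> carol -> alice would loop forever without `seen`.
edges = {
    "alice": ["bob"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
    "dave": ["erin"],
    "erin": [],
}


def reachable(start, max_depth):
    """Return {vertex: depth} for vertices within max_depth hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        depth = seen[vertex]
        if depth == max_depth:
            continue  # depth limit reached: do not expand this vertex
        for neighbor in edges.get(vertex, []):
            if neighbor not in seen:  # skip visited vertices to avoid loops
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    return seen


within_two_hops = reachable("alice", 2)
```

With a limit of two hops, `erin` (three hops from `alice`) is left out, and the cycle back to `alice` is ignored.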

Wide Column Databases
● Row-based store
● “Key-key-value”
● Can be used as a simple key-value
● Many (but not all) share the SQL-like
Cassandra Query Language (CQL)
● Designed for horizontal scaleout
● ScyllaDB is also architected for vertical
scale-up
ADVANTAGE
● Great scaleout, global clustering
DISADVANTAGE
● Intimidating to newcomers

The Case for Wide
Column NoSQL

Horizontal (and Vertical) Scalability
● Scale out to any number of
nodes (Cassandra, ScyllaDB)
● Scale up to any number of
cores per node (ScyllaDB)

Wide Column = “Key Key Value”
■ Wide column databases are row-based
● Use partitioning & clustering (or sort) keys
● Mostly used for transaction processing
(OLTP)
● Examples: Cassandra, ScyllaDB, DynamoDB
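The "key key value" idea can be sketched as a two-level lookup: the partition key selects a partition, and the clustering (sort) key orders rows within it. The `WideColumnTable` class below is a hypothetical in-memory toy, assuming string keys; it is not how Cassandra, ScyllaDB, or DynamoDB store data on disk.

```python
from bisect import insort


class WideColumnTable:
    """Toy 'key-key-value' store: partition key -> sorted clustering keys -> value."""

    def __init__(self):
        # partition_key -> (sorted list of clustering keys, {clustering_key: value})
        self.partitions = {}

    def put(self, partition_key, clustering_key, value):
        keys, rows = self.partitions.setdefault(partition_key, ([], {}))
        if clustering_key not in rows:
            insort(keys, clustering_key)  # keep clustering keys in sorted order
        rows[clustering_key] = value

    def get(self, partition_key, clustering_key):
        _, rows = self.partitions[partition_key]
        return rows[clustering_key]

    def scan(self, partition_key):
        """Return all rows of one partition, ordered by clustering key."""
        keys, rows = self.partitions.get(partition_key, ([], {}))
        return [(key, rows[key]) for key in keys]


# Hypothetical sensor readings: partition by sensor, cluster by date.
table = WideColumnTable()
table.put("sensor-1", "2022-11-02", 21.5)
table.put("sensor-1", "2022-11-01", 20.0)
table.put("sensor-2", "2022-11-01", 18.2)
```

Rows inserted out of order still come back sorted by clustering key when a partition is scanned, which is what makes range queries within a partition cheap.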

Wide Column ≠ Column Store
■ Don’t confuse a wide column database with
a columnar database (aka column store)
■ Column stores store data in columnar format
● Can count “runs” of repeated values in
columns to minimize data repetition
● Mostly used for analytics processing
(OLAP)
● Examples: Druid, Pinot, ClickHouse,
BigQuery
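Counting "runs" of repeated values is run-length encoding. A minimal sketch, assuming a single hypothetical `country` column held as a Python list; real column stores combine this with other encodings and compression.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


country = ["US", "US", "US", "DE", "DE", "US"]
runs = run_length_encode(country)
```

Six stored values collapse to three runs here; the saving grows when a column is sorted or has few distinct values.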

Automatic Data Sharding & Replication
Autosharding based on Token Ranges
With RF=3, each data record is automatically copied
to two other replica nodes
■ Data automatically partitioned and
balanced across cluster based on
partition key using token ranges
■ Data within partitions is organized
by clustering key (or sort key)
■ Each record is automatically
replicated across cluster based on
replication factor (typically RF=3) to
ensure durability
■ Multi-datacenter replication built in
[Figure: token ranges 0-100, 101-200, and 201-300, each held by three replica nodes]
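Token-range sharding with replication can be sketched as follows. This is a simplified illustration under stated assumptions: a tiny token space of 0..400, four hypothetical nodes that each own one contiguous range, an MD5-based hash, and replicas chosen as the owner plus the next nodes on the ring. Production systems use a much larger token space, a different hash function, and many ranges per node.

```python
import hashlib
from bisect import bisect_left

# Assumed ring: (upper bound of token range, owning node).
# Ranges are 0-100, 101-200, 201-300, 301-400.
RING = [(100, "node-a"), (200, "node-b"), (300, "node-c"), (400, "node-d")]
UPPER_BOUNDS = [upper for upper, _ in RING]
TOKEN_SPACE = 401  # tokens 0..400


def token_for(partition_key):
    """Hash a partition key to a token in the (toy) token space."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_SPACE


def replicas_for(token, rf=3):
    """Return the node owning the token's range plus the next rf-1 nodes on the ring."""
    owner = bisect_left(UPPER_BOUNDS, token)
    return [RING[(owner + i) % len(RING)][1] for i in range(rf)]


token = token_for("user:42")
nodes = replicas_for(token)
```

The partition key alone determines the token, so any node can compute where a record lives without consulting a central directory.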

Leaderless Topology
Peer-to-Peer Active-Active (Multi-Datacenter)
Each node accepts reads+writes
Inherently better load balancing
Deals better w/ write-heavy or mixed read-write workloads
■ No single point of failure
■ No bottleneck at a “leader” node
■ Every node can be read-write

Coordinator Node per Operation
■ Client makes request to any
replica node
■ This “coordinator” node forwards
the request to other replicas.
■ Replicas acknowledge the operation
to the coordinator, which responds to
the client
■ Various forms of load balancing
● Simple round-robin
● Datacenter aware round-robin
● Heat-weighted load balancing
Coordinator Node
Using token awareness, for an update, the coordinator
node will be chosen from one of the current replicas

Tunable Consistency Levels per Operation
■ “AP”-mode as per CAP theorem
● Emphasizes high availability
over strong consistency
■ Many consistency levels
● ONE
● QUORUM
● LOCAL_QUORUM
● EACH_QUORUM
● ALL
● LOCAL_ONE
Example: Quorum Consistency
In a cluster of 3 nodes, so long as 2 of the 3 nodes
succeed, the operation will succeed.
The third node will eventually get updated & be made
consistent, in-sync with the rest of the cluster
[Figure: two replicas respond OK, one responds NO]
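The quorum example works out as simple arithmetic: a quorum is a majority of the replicas, floor(RF / 2) + 1. The sketch below covers only ONE, QUORUM, and ALL within a single datacenter; the function names are hypothetical and no driver API is implied.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Number of replica acknowledgements needed at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level in this sketch: {level}")


def operation_succeeds(level, acks):
    """acks holds one boolean per replica: True if that replica acknowledged."""
    return sum(acks) >= required_acks(level, len(acks))


# Two of three replicas respond, as in the example above.
acks = [True, True, False]
```

With these acknowledgements, ONE and QUORUM succeed while ALL fails, which is the availability trade-off that tunable consistency exposes per operation.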

Write & Read Paths
■ Writes are acknowledged once they
are in both the in-memory memtable
and the durable commitlog.
■ Periodically memtables are
flushed to immutable on-disk
Sorted Strings Tables (SSTables)
■ Reads will first check the
in-memory row-based cache, or
fetch data from SSTable on disk
■ Bloom filters help the system
figure out where the data is [or
isn’t] stored
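The write and read paths above can be sketched with a toy in-memory model. Everything here is a hypothetical simplification: the commitlog is a Python list rather than a durable file, "SSTables" are frozen sorted dicts, the row cache is left out (reads check the memtable first), and the Bloom filter uses SHA-256 purely for illustration.

```python
import hashlib


class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyStore:
    def __init__(self, memtable_limit=2):
        self.commitlog = []  # stand-in for the durable append-only log
        self.memtable = {}
        self.sstables = []   # oldest first; each entry is (bloom_filter, rows)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commitlog.append((key, value))  # log first, then memtable
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}
        self.commitlog.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, rows in reversed(self.sstables):  # newest SSTable first
            if bloom.might_contain(key) and key in rows:
                return rows[key]
        return None


store = ToyStore(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)  # memtable reaches its limit and is flushed
store.write("a", 3)  # newer value lives in the memtable
```

The Bloom filter lets a read skip any SSTable that definitely does not hold the key, so only tables that might contain it are searched.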

Discover More in ScyllaDB University
university.scylladb.com

Keep in touch!
Peter Corless
Director of Technical Advocacy
ScyllaDB
peter@scylladb.com
@PeterCorless
