Survey of High Performance NoSQL Systems

  1. High Performance NoSQL Masterclass: Survey of High Performance NoSQL Systems. Peter Corless
  2. Peter Corless
     ● Director of Technical Advocacy, ScyllaDB
     ● Editor / contributor to ScyllaDB blog
     ● Program chair for ScyllaDB Summit and P99 CONF
     ● Host of ScyllaDB Masterclass series
     ● @PeterCorless on Twitter
  3. NoSQL Database Landscape
  4. “Top 100” (as of November 2022)
  5. NoSQL/Multimodel Databases in the Top 100
     ● Key Value (9): Redis, Memcached, Hazelcast, Etcd, Ehcache, Aerospike, Riak KV, RocksDB, LevelDB
     ● Wide Column (8): Apache Cassandra, Amazon DynamoDB, ScyllaDB, Apache HBase, DataStax Enterprise, Azure Table Storage, Google Cloud Bigtable, Accumulo
     ● Document (12): MongoDB, Couchbase, Firebase Realtime, CouchDB, Google Cloud Firestore, Realm, MarkLogic, Google Cloud Datastore, RavenDB, IBM Cloudant, RethinkDB, PouchDB
     ● Graph (1): Neo4j
     ● Multimodel (5): Azure Cosmos DB, ArangoDB, OrientDB, Oracle NoSQL, Yugabyte
     ● Time Series (5): InfluxDB, kdb+, Graphite, Prometheus, TimescaleDB [SQL]
  6. Document Databases
     ● “Documents” are encoded formats
       ○ JavaScript Object Notation (JSON) or Binary JSON (BSON)
       ○ Extensible Markup Language (XML)
       ○ (We’re not talking about managing PDFs or Word files)
     ● Allows “tree”-style data models
     ● “Parent” and “child” nodes
     ADVANTAGE
     ● Easy for developers to get started
     DISADVANTAGE
     ● Primary-replica clustering bottlenecks write-heavy workloads at scale
     Discover more differences: MongoDB vs. ScyllaDB Production Experience from a Dev & Ops Standpoint
  7. Key Value Databases
     ● Keys are simple indexes for a record
     ● Values can be simple data types (e.g., text or integer values), or more complex (lists, maps, collections)
     ● Often used for in-memory caching
     ADVANTAGE
     ● Fast, simple
     DISADVANTAGE
     ● Multi-datacenter clustering is an anti-pattern
     Why that might be a bad idea: 7 Reasons Not to Put an External Cache in Front of Your Database
  8. Graph Databases
     ● Models domains as vertices (entities/objects) and edges (relationships)
     ● “Edges” are vital for understanding interrelationships
     ● Complexity grows as an n² problem
     ● Query languages such as Cypher and Gremlin/TinkerPop need to understand how to navigate topology (limit query depth, avoid infinite loops, etc.)
     ADVANTAGE
     ● Models complex object relationships well
     DISADVANTAGE
     ● Data set size often limited by complexity / computational power
     Did you know… You can use ScyllaDB or Cassandra as Storage Backend for JanusGraph?
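The traversal concerns on the Graph Databases slide (limiting query depth, avoiding infinite loops) can be illustrated with a minimal Python sketch. It is not how Cypher or Gremlin are implemented; the function name `neighbors_within` and the sample graph are hypothetical, chosen only to show a depth-bounded breadth-first walk that tracks visited vertices so cycles cannot loop forever.

```python
from collections import deque


def neighbors_within(graph, start, max_depth):
    """Breadth-first walk returning {vertex: depth} for every vertex
    reachable from `start` in at most `max_depth` edges."""
    depths = {start: 0}          # doubles as the "visited" set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depths[vertex] == max_depth:
            continue             # depth limit reached: do not expand further
        for nxt in graph.get(vertex, ()):
            if nxt not in depths:            # skip vertices already seen (cycle guard)
                depths[nxt] = depths[vertex] + 1
                queue.append(nxt)
    return depths


# Hypothetical sample graph containing cycles (bob -> alice, dave -> alice).
graph = {
    "alice": ["bob"],
    "bob": ["carol", "alice"],
    "carol": ["dave"],
    "dave": ["alice"],
}
result = neighbors_within(graph, "alice", 2)
print(result)
```

With a depth limit of 2 the walk stops at `carol` and never reaches `dave`, even though the graph contains cycles back to `alice`.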
  9. Wide Column Databases
     ● Row-based store
     ● “Key-key-value”
     ● Can be used as a simple key-value store
     ● Many (but not all) share the SQL-like Cassandra Query Language (CQL)
     ● Designed for horizontal scale-out
     ● ScyllaDB is also architected for vertical scale-up
     ADVANTAGE
     ● Great scale-out, global clustering
     DISADVANTAGE
     ● Intimidating to newcomers
  10. The Case for Wide Column NoSQL
  11. Horizontal (and Vertical) Scalability
      ● Scale out to any number of nodes (Cassandra, ScyllaDB)
      ● Scale up to any number of cores per node (ScyllaDB)
  12. Wide Column = “Key Key Value”
      ■ Wide column databases are row-based
        ● Use partitioning & clustering (or sort) keys
        ● Mostly used for transaction processing (OLTP)
        ● Examples: Cassandra, ScyllaDB, DynamoDB
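As a rough illustration of the “key key value” idea, the Python sketch below models a table as a map from partition key to a map from clustering key to value. The class `WideColumnTable`, its methods, and the sensor data are all hypothetical; a real wide column database stores rows on disk and across nodes, which this in-memory toy ignores.

```python
class WideColumnTable:
    """Toy 'key-key-value' table: partition key -> clustering key -> value."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, partition_key, clustering_key, value):
        # The first key selects the partition, the second key the row inside it.
        self._partitions.setdefault(partition_key, {})[clustering_key] = value

    def slice(self, partition_key, start, end):
        """Return rows of one partition whose clustering key is in [start, end],
        ordered by clustering key."""
        rows = self._partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows) if start <= ck <= end]


# Hypothetical sensor readings: partition by sensor, cluster by timestamp.
readings = WideColumnTable()
readings.upsert("sensor-1", "2022-11-01T00:02", 21.7)
readings.upsert("sensor-1", "2022-11-01T00:01", 21.5)
readings.upsert("sensor-2", "2022-11-01T00:01", 19.0)

window = readings.slice("sensor-1", "2022-11-01T00:00", "2022-11-01T00:01")
print(window)
```

Using only the partition key, with a single fixed clustering key, reduces this to the plain key-value usage mentioned earlier in the deck.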
  13. Wide Column ≠ Column Store
      ■ Don’t confuse a wide column database with a columnar database (aka column store)
      ■ Column stores store data in columnar format
        ● Can count “runs” of repeated values in columns to minimize data repetition
        ● Mostly used for analytics processing (OLAP)
        ● Examples: Druid, Pinot, ClickHouse, BigQuery
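Counting “runs” of repeated values is commonly known as run-length encoding. The sketch below shows the idea on a single in-memory column; the function names and the sample column are illustrative assumptions, and real column stores combine this with other encodings.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical column of country codes, as it might sit in a columnar layout.
country_column = ["BR", "BR", "BR", "US", "US", "BR"]
encoded = run_length_encode(country_column)
print(encoded)
```

The encoding only pays off when equal values sit next to each other, which is why it suits a columnar layout (one column stored contiguously) better than a row-based one.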
  14. Automatic Data Sharding & Replication
      ■ Data automatically partitioned and balanced across cluster based on partition key using token ranges
      ■ Data within partitions is organized by clustering key (or sort key)
      ■ Each record is automatically replicated across cluster based on replication factor (typically RF=3) to ensure durability
      ■ Multi-datacenter replication built-in
      Diagram (ScyllaDB servers): autosharding based on token ranges (0-100, 101-200, 201-300), with each range held on three servers. Using an RF=3, each data record is automatically copied and put on two other replica nodes.
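A minimal Python sketch of token-range sharding with replication follows. It reuses the toy token ranges from the slide (0-100, 101-200, 201-300) and assumes a CRC32-based hash purely for illustration; the function names, node names, and the "walk the ring to the next nodes" placement are simplifying assumptions, not the partitioner or replication strategy that Cassandra or ScyllaDB actually ship.

```python
import zlib
from bisect import bisect_left

TOKEN_SPACE = 301  # toy token space 0..300, mirroring the slide


def token_for(partition_key):
    """Map a partition key to a token (assumed hash; real systems differ)."""
    return zlib.crc32(partition_key.encode("utf-8")) % TOKEN_SPACE


def replicas_for(token, ring, rf=3):
    """ring: list of (inclusive_upper_token, node), sorted by token.
    The owner of the range is the first replica; the remaining replicas
    are the next distinct nodes walking around the ring."""
    uppers = [upper for upper, _ in ring]
    owner = bisect_left(uppers, token) % len(ring)
    copies = min(rf, len(ring))
    return [ring[(owner + i) % len(ring)][1] for i in range(copies)]


# Hypothetical three-node ring matching the slide's ranges.
ring = [(100, "node-a"), (200, "node-b"), (300, "node-c")]

key = "user:42"
print(key, token_for(key), replicas_for(token_for(key), ring))
```

With three nodes and RF=3 every node ends up holding every range, which matches the diagram; lowering `rf` or adding nodes to `ring` shows how only a subset of nodes would hold each record.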
  15. Leaderless Topology
      ■ No single point of failure
      ■ No bottleneck at a “leader” node
      ■ Every node can be read-write
      Diagram (clients, ScyllaDB servers): peer-to-peer, active-active (multi-datacenter). Each node accepts reads and writes, which gives inherently better load balancing and deals better with write-heavy or mixed read-write workloads.
  16. Coordinator Node per Operation
      ■ Client makes request to any replica node
      ■ This “coordinator” node forwards the request to other replicas
      ■ Replicas acknowledge the operation to the coordinator, which responds to the client
      ■ Various forms of load balancing
        ● Simple round-robin
        ● Datacenter-aware round-robin
        ● Heat-weighted load balancing
      Diagram (clients, ScyllaDB servers, coordinator node): using token awareness, for an update, the coordinator node will be chosen from one of the current replicas.
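The datacenter-aware round-robin option can be sketched in a few lines of Python. The class `DcAwareRoundRobin`, its method, and the node and datacenter names are hypothetical and do not correspond to any driver's API; the sketch only assumes the caller already knows which replicas own the data (token awareness).

```python
from itertools import count


class DcAwareRoundRobin:
    """Pick a coordinator among the replicas, preferring the local datacenter
    and rotating through the candidates on successive requests."""

    def __init__(self, local_dc):
        self.local_dc = local_dc
        self._turn = count()

    def pick_coordinator(self, replicas):
        """replicas: list of (node_name, datacenter) tuples that own the data."""
        local = [node for node, dc in replicas if dc == self.local_dc]
        # Fall back to every replica if none lives in the local datacenter.
        candidates = local or [node for node, _ in replicas]
        return candidates[next(self._turn) % len(candidates)]


policy = DcAwareRoundRobin(local_dc="dc1")
replicas = [("node-a", "dc1"), ("node-b", "dc1"), ("node-c", "dc2")]
picks = [policy.pick_coordinator(replicas) for _ in range(4)]
print(picks)
```

Requests alternate between the two local replicas and never leave `dc1` while a local replica exists, which avoids a cross-datacenter hop just to reach the coordinator.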
  17. Tunable Consistency Levels per Operation
      ■ “AP”-mode as per CAP theorem
        ● Emphasizes high availability over strong consistency
      ■ Many consistency levels
        ● ONE
        ● QUORUM
        ● LOCAL_QUORUM
        ● EACH_QUORUM
        ● ALL
        ● LOCAL_ONE
      Example: Quorum Consistency. In a cluster of 3 nodes, so long as 2 of the 3 nodes succeed, the operation will succeed. The third node will eventually get updated and be made consistent, in sync with the rest of the cluster. (Diagram, clients and ScyllaDB servers: two replicas respond OK, one does not.)
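The quorum example reduces to simple arithmetic: a quorum is a majority of the replicas, floor(RF / 2) + 1. The Python sketch below checks whether an operation succeeds for three of the levels; the function names are illustrative assumptions, and the datacenter-scoped levels are left out for brevity.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Number of replica acknowledgements needed for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def operation_succeeds(level, acks):
    """acks: one boolean per replica, True if that replica acknowledged."""
    return sum(acks) >= required_acks(level, len(acks))


# The slide's scenario: RF=3, two replicas answer OK, one does not.
acks = [True, True, False]
for level in ("ONE", "QUORUM", "ALL"):
    print(level, operation_succeeds(level, acks))
```

With two of three replicas responding, ONE and QUORUM succeed while ALL fails, which is the availability versus consistency trade-off the slide describes.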
  18. Write & Read Paths
      ■ Writes are acknowledged once they are in both the in-memory memtable and the durable commitlog
      ■ Periodically, memtables are flushed to immutable on-disk Sorted String Tables (SSTables)
      ■ Reads first check the in-memory row-based cache, or fetch data from SSTables on disk
      ■ Bloom filters help the system figure out where the data is [or isn’t] stored
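The write and read paths can be sketched as a toy in-memory engine in Python. Everything here is a simplifying assumption: the class names, the flush threshold, the SHA-256 based Bloom filter, and the use of plain Python containers in place of disk files. Real engines also merge data from several sources on read and handle deletes, which this sketch omits.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits, self.num_hashes, self.bits = size_bits, num_hashes, 0

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyStorageEngine:
    def __init__(self, memtable_limit=2):
        self.commitlog = []    # stands in for the durable on-disk log
        self.memtable = {}     # in-memory, mutable
        self.sstables = []     # (bloom filter, immutable sorted dict), newest last
        self.row_cache = {}
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commitlog.append((key, value))   # durability first
        self.memtable[key] = value
        self.row_cache.pop(key, None)         # drop any stale cached row
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        return "ack"                          # in commitlog and memtable

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}
        self.commitlog.clear()                # flushed data no longer needs the log

    def read(self, key):
        if key in self.row_cache:
            return self.row_cache[key]
        if key in self.memtable:
            return self.memtable[key]
        for bloom, table in reversed(self.sstables):   # newest SSTable first
            # The Bloom filter lets us skip tables that cannot hold the key.
            if bloom.might_contain(key) and key in table:
                self.row_cache[key] = table[key]
                return table[key]
        return None


engine = ToyStorageEngine(memtable_limit=2)
engine.write("a", 1)
engine.write("b", 2)      # reaches the limit and triggers a flush
print(engine.read("a"), engine.read("missing"))
```

A negative answer from the Bloom filter is definitive, so the engine can skip that SSTable entirely; a positive answer still requires the lookup because false positives are possible.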
  19. Discover More in ScyllaDB University
  20. Keep in touch! Peter Corless, Director of Technical Advocacy, ScyllaDB. @PeterCorless