ScyllaDB CTO Avi Kivity looks at the present state of Scylla's capabilities and offers a glimpse of what's to come. From an incremental compaction strategy that takes advantage of newer, denser nodes to data transformations with User Defined Functions (UDFs) and User Defined Aggregates (UDAs), ScyllaDB continues to expand its capabilities, use cases and APIs.
2. Presenter
Avi Kivity, CTO
Avi Kivity, CTO of ScyllaDB, is best known for starting the Kernel-based Virtual
Machine (KVM) project, the hypervisor underlying many production clouds. He
worked at Qumranet and Red Hat as KVM maintainer until December 2012. ScyllaDB
seeks to bring the same kind of innovation to the public cloud space.
4. Today
■ Acknowledged as the NoSQL Performance Leader
● Best choice for high throughput
● Best choice for low latency
● Only choice for combined high throughput/low latency
5. Today
■ Highest density nodes
● Shipping 60 TB/node
● No compromise on maintenance operations
● No compromise on performance
6. Today
■ Leading in Combined OLTP/OLAP
● Workload prioritization
● Cache bypass to avoid cache pollution
● Eliminate data warehouses to reduce costs AND complexity
7. Multiple ship vehicles
■ Scylla Open Source
● Early adopters, less critical workloads
■ Scylla Enterprise
● Mission critical workloads
● Enterprise specific features
■ Scylla Cloud
● Mission Critical combined with Ease of Use
11. Doubling down on our strengths
■ Increased high-density support
● Incremental Compaction Strategy
■ More Enterprise Features
● More Encryption at Rest and Authentication options
■ Better OLTP/OLAP integration
● Data transformations with UDF/UDA
SELECT id, words, wordcount(words)
FROM my_table
WHERE wordcount(words) > 10
ALLOW FILTERING
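The query above can be modeled in plain Python to show what a scalar UDF does to each row. This is a minimal sketch, not Scylla code: the whitespace-splitting definition of `wordcount`, the sample rows, and the helper `select_with_udf` are all assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def wordcount(words: str) -> int:
    """Hypothetical UDF body: count whitespace-separated words."""
    return len(words.split())


def select_with_udf(rows: List[Row], threshold: int) -> List[Row]:
    """Mimic SELECT id, words, wordcount(words) ... WHERE wordcount(words) > threshold."""
    result = []
    for row in rows:
        count = wordcount(str(row["words"]))
        if count > threshold:  # the function is evaluated on every scanned row
            result.append({"id": row["id"], "words": row["words"], "wordcount": count})
    return result


# Assumed sample data standing in for my_table.
my_table = [
    {"id": 1, "words": "a short sentence"},
    {"id": 2, "words": "one two three four five six seven eight nine ten eleven"},
]

matches = select_with_udf(my_table, threshold=10)
```

Because the predicate is computed per row rather than looked up in an index, a query of this shape scans the data, which is consistent with the slide's query ending in ALLOW FILTERING.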
12. Doubling down on our strengths
■ Better OLTP/OLAP integration
● Data transformations with UDF/UDA
[Figure: UDF and UDA diagrams]
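Where a UDF transforms one row at a time, a UDA folds many rows into one result. In Cassandra-style CQL an aggregate is described by an initial state, a state function applied once per row, and an optional final function. The sketch below models that shape in Python; the `avg_wordcount` aggregate and its helper names are hypothetical and only illustrate the mechanism.

```python
from typing import Iterable, Tuple

State = Tuple[int, int]  # (rows seen, total words seen)


def state_func(state: State, words: str) -> State:
    """Called once per row: fold the row's value into the running state."""
    rows, total = state
    return rows + 1, total + len(words.split())


def final_func(state: State) -> float:
    """Called once at the end: turn the accumulated state into the result."""
    rows, total = state
    return total / rows if rows else 0.0


def avg_wordcount(values: Iterable[str], initcond: State = (0, 0)) -> float:
    """Hypothetical aggregate: average number of words per row."""
    state = initcond
    for words in values:
        state = state_func(state, words)
    return final_func(state)


average = avg_wordcount(["a b c", "d e f g h"])
```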
13. Expanding our Horizons
Multi-protocol, multi-model
■ Cassandra protocol
● Native and Thrift!
■ Alternator, supporting DynamoDB™-compatible API
● Reducing cost for AWS customers
■ Redis Protocol
● From in-memory to disk-backed
14. More consistency options
■ Eventual consistency
● Single and multi-datacenter
■ Lightweight Transactions with Paxos
● For data that requires more guarantees
■ Lightweight Transactions with Raft
● For higher throughput
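A lightweight transaction is a conditional write: the change is applied only if a condition on the current row holds, and the caller learns whether it was applied. The sketch below models only that compare-and-set contract with a single in-process lock. It is not Paxos or Raft, and the `LwtTable` class and its method names are hypothetical.

```python
import threading
from typing import Dict, Optional


class LwtTable:
    """In-memory stand-in for a table that accepts conditional writes."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._lock = threading.Lock()  # stands in for cluster-wide agreement

    def insert_if_not_exists(self, key: str, value: str) -> bool:
        """Model of INSERT ... IF NOT EXISTS; returns whether the write applied."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if(self, key: str, expected: str, new_value: str) -> bool:
        """Model of UPDATE ... IF col = expected; returns whether the write applied."""
        with self._lock:
            if self._rows.get(key) != expected:
                return False
            self._rows[key] = new_value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._rows.get(key)


table = LwtTable()
first = table.insert_if_not_exists("user:1", "alice")
second = table.insert_if_not_exists("user:1", "bob")      # rejected: row exists
stale = table.update_if("user:1", "bob", "carol")          # rejected: condition fails
fresh = table.update_if("user:1", "alice", "carol")        # applied
```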
15. CDC: Increased integration with other systems
Two inserts:
> insert into base_table(pk, ck, val1, val2) values('foo', 'bar', 'val1', 'val2');
> insert into base_table(pk, ck, val1, val2) values('foo', 'baz', 'vaz1', 'vaz2');
We get an initial CDC stream:
stream_id | time    | batch_seq | operation | ttl | _pk   | _ck   | _val1(op, value, ttl) | _val2(...)
----------+---------+-----------+-----------+-----+-------+-------+-----------------------+--------------------
UUID1     | <time1> | 0         | UPDATE    |     | 'foo' | 'bar' | (ADD, 'val1', null)   | (ADD, 'val2', null)
UUID1     | <time2> | 0         | UPDATE    |     | 'foo' | 'baz' | (ADD, 'vaz1', null)   | (ADD, 'vaz2', null)
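The shape of the stream above can be sketched as a small in-memory model in which every write to the base table appends one row to a log. This is an illustration under assumptions, not Scylla's implementation: the `CdcLog` class, the counter used as a timestamp, and deriving the stream id from the partition key (chosen because both example rows share `UUID1` and the same `pk`) are all hypothetical.

```python
import itertools
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ColumnChange = Tuple[str, str, Optional[int]]  # (op, value, ttl)


@dataclass
class CdcRow:
    stream_id: uuid.UUID
    time: int
    batch_seq: int
    operation: str
    pk: str
    ck: str
    columns: Dict[str, ColumnChange]


class CdcLog:
    """Append-only list of change rows, one per write to the base table."""

    def __init__(self) -> None:
        self.rows: List[CdcRow] = []
        self._clock = itertools.count(1)  # stand-in for the write timestamp

    def _stream_for(self, pk: str) -> uuid.UUID:
        # Assumption: a stable stream id derived from the partition key.
        return uuid.uuid5(uuid.NAMESPACE_OID, pk)

    def record_insert(self, pk: str, ck: str, **values: str) -> CdcRow:
        row = CdcRow(
            stream_id=self._stream_for(pk),
            time=next(self._clock),
            batch_seq=0,
            operation="UPDATE",  # the slide's table shows inserts as UPDATE
            pk=pk,
            ck=ck,
            columns={f"_{name}": ("ADD", value, None) for name, value in values.items()},
        )
        self.rows.append(row)
        return row


log = CdcLog()
log.record_insert("foo", "bar", val1="val1", val2="val2")
log.record_insert("foo", "baz", val1="vaz1", val2="vaz2")
```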
16. Increased integration with other systems
Change Data Capture
■ Deltas, pre-image, and post-image
■ Using the standard drivers and protocols
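The three views named above relate to each other in a simple way: the pre-image is the row before a write, the delta is what the write changed, and the post-image is the row afterwards. The following sketch shows that relationship on plain dictionaries; the function `apply_with_images` and the sample row are hypothetical, and real CDC output carries more metadata than this.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, str]


def apply_with_images(current: Optional[Row], delta: Row) -> Tuple[Optional[Row], Row, Row]:
    """Return (pre_image, delta, post_image) for one write, without mutating inputs."""
    pre_image = dict(current) if current is not None else None
    post_image = dict(current or {})
    post_image.update(delta)  # changed columns overwrite the old values
    return pre_image, dict(delta), post_image


existing = {"val1": "val1", "val2": "val2"}
pre, delta, post = apply_with_images(existing, {"val2": "new2"})
```

A consumer that only needs to replay changes can read the deltas, while one that needs the full row before or after each change can read the pre-image or post-image instead of querying the base table again.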
17. Ideally placed for a microservices world
■ “Serverless means running on someone else’s servers”
■ “Stateless means storing your state on someone else’s database”
18. Microservices and Scylla
■ High throughput allows sharing state with more and more microservices
■ Low latency keeps overall request latency under control
● More microservices = more database calls per user interaction
■ Workload prioritization keeps the database and your team focused on the important things
● The database on the important services
● Your team on developing your application
19. Scylla Manager
■ 1.x (current)
● Repair
● Healthcheck
● SD for Monitoring
● SSH node access
■ 2.0 (in a few weeks)
● Backup to S3
● Agent HTTPS node access
■ 2.x (Q1/Q2 2020)
● Backup to Azure, Google, Dropbox,...
● Faster repair
● Maintenance windows
● Clone cluster
■ 3.x (Q3/Q4 2020)
● Integrated Monitoring
● Basic UI
● Out/down scale
● Rolling upgrade
● Rolling config update
● Log collection