Scylla has emerged as the most cost-effective way to serve large amounts of data. In this keynote we will cover what makes Scylla such a great match for big data applications, what we’ve done over the past year, and our plans for the near future: performance, reliability, manageability, and above all, making Scylla easier to consume in the modern cloud environment.
7. Performance has many facets
▪ Direct cost of sustaining the workload
▪ Ability to sustain workload
• While scaling up
• While performing maintenance operations
▪ Handling varied query types
▪ Managing dense nodes
8. Our bread and butter
▪ Continuing to make the most out of your hardware
• Utilize all cores
• Utilize all memory for caching
• Make good use of fast solid state storage
▪ Recent improvements
• B+tree in cache
• Reactor stall elimination
• C++ coroutines
9. Controlling Space Amplification
▪ New Incremental Compaction feature: space_amplification_goal
• When enabled, Scylla works towards minimizing duplication due to overwrites and deletions
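Space amplification can be read as the ratio of bytes on disk to bytes of live data. The sketch below only illustrates that arithmetic; the function names are hypothetical, and the slides do not describe how Scylla evaluates space_amplification_goal internally.

```python
def space_amplification(disk_bytes: int, live_bytes: int) -> float:
    """Ratio of bytes on disk to bytes of live (non-obsolete) data."""
    if live_bytes <= 0:
        raise ValueError("live_bytes must be positive")
    return disk_bytes / live_bytes


def exceeds_goal(disk_bytes: int, live_bytes: int, goal: float) -> bool:
    """True when the table uses more space than the goal allows.

    `goal` stands in for a configured space amplification goal; how the
    real setting is applied is an assumption made for illustration.
    """
    return space_amplification(disk_bytes, live_bytes) > goal


# 150 units on disk holding 100 units of live data: one third of the disk
# is taken by overwritten or deleted data that compaction has not removed.
ratio = space_amplification(150, 100)
needs_compaction = exceeds_goal(150, 100, goal=1.25)
```

A lower goal trades more compaction work for less wasted disk space.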
10. Handling more query types
▪ Point queries inside large partitions
▪ Time Window Compaction Strategy optimizations
14. Reshard and reshape
▪ Redistribute data during startup if needed
• Changes in core count
• Changes in compaction strategy
▪ Reshape data during repair/bootstrap
• Reduces read amplification
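To see why a change in core count calls for resharding, consider a toy mapping from token to shard. The modulo function below is a deliberate simplification, not Scylla's actual token-to-shard algorithm, and all names are hypothetical.

```python
def shard_of(token: int, shard_count: int) -> int:
    """Toy token-to-shard mapping (assumption: plain modulo)."""
    return token % shard_count


def moved_fraction(tokens, old_shards: int, new_shards: int) -> float:
    """Fraction of tokens whose owning shard changes with the core count."""
    tokens = list(tokens)
    moved = sum(
        1 for t in tokens
        if shard_of(t, old_shards) != shard_of(t, new_shards)
    )
    return moved / len(tokens)


# Going from 8 to 16 cores: under this toy mapping, half of the tokens
# now belong to a different shard and their data must be redistributed.
fraction = moved_fraction(range(16), old_shards=8, new_shards=16)
```

Data written under the old shard count sits in files owned by the wrong shard, which is the situation that resharding at startup corrects.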
15. Streaming nodetool refresh
▪ Currently, nodetool refresh only loads local data
• Non-local data is discarded
• Increases restore/migrate time on large clusters
▪ Will stream non-local data
• Full replication from backup
• Speeds up data migration
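The difference between the current and the planned behavior can be sketched with a toy token ring. This is an illustrative model only: the ring layout, the `replicas` and `plan_refresh` helpers, and the replication rule are assumptions, not nodetool's implementation.

```python
from bisect import bisect_left
from collections import defaultdict


def replicas(token, ring, rf):
    """Return up to rf distinct nodes clockwise from token.

    ring is a sorted list of (ring_token, node) pairs; a toy model only.
    """
    if not ring:
        raise ValueError("ring must not be empty")
    ring_tokens = [t for t, _ in ring]
    wanted = min(rf, len({node for _, node in ring}))
    index = bisect_left(ring_tokens, token) % len(ring)
    owners = []
    while len(owners) < wanted:
        node = ring[index % len(ring)][1]
        if node not in owners:
            owners.append(node)
        index += 1
    return owners


def plan_refresh(row_tokens, ring, rf, local_node, stream_non_local):
    """Split loaded rows into kept, discarded, and streamed-per-node."""
    kept, discarded, streamed = [], [], defaultdict(list)
    for token in row_tokens:
        owners = replicas(token, ring, rf)
        if local_node in owners:
            kept.append(token)
        if stream_non_local:
            # Planned behavior: forward the row to every other replica.
            for node in owners:
                if node != local_node:
                    streamed[node].append(token)
        elif local_node not in owners:
            # Current behavior: rows this node does not own are dropped.
            discarded.append(token)
    return kept, discarded, dict(streamed)


ring = [(100, "a"), (200, "b"), (300, "c")]
row_tokens = [50, 150, 250, 350]
current = plan_refresh(row_tokens, ring, 1, "a", stream_non_local=False)
planned = plan_refresh(row_tokens, ring, 1, "a", stream_non_local=True)
```

In the `current` plan, two of the four rows are lost on node "a" and must be loaded again elsewhere; in the `planned` plan they are forwarded to their owners.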
17. Improving Consistency
▪ Raft as a common consistency protocol
▪ Raft for Schema Management
▪ Raft for Topology
▪ Raft for Tables
18. Adopting Raft across the board
▪ Raft emerges as the industry’s favorite consistency protocol
• Easy(ier) to understand
• Formal proofs
• Better performance than Paxos
▪ Single implementation, multiple use cases
• Increases reliability
19. Raft for Schema Management
▪ Eliminate inherent race conditions in DDL statements
▪ Example: dropping a type while another statement concurrently creates a table that uses it
DROP TYPE my_type;
CREATE TABLE tab (
    id int PRIMARY KEY,
    value my_type
);
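A minimal sketch of why a single ordered log removes this race, assuming a toy schema state machine. The `Schema` class, the command tuples, and `replay` are hypothetical and stand in for a Raft-replicated log; they are not Scylla's implementation.

```python
class SchemaError(Exception):
    """Raised when a DDL command conflicts with the current schema."""


class Schema:
    BUILTIN_TYPES = {"int", "text"}

    def __init__(self, types=()):
        self.types = set(types)   # user-defined types
        self.tables = {}          # table name -> {column: type}

    def apply(self, command):
        kind = command[0]
        if kind == "drop_type":
            name = command[1]
            if any(name in cols.values() for cols in self.tables.values()):
                raise SchemaError(f"type {name} is still in use")
            self.types.discard(name)
        elif kind == "create_table":
            _, table, columns = command
            for col_type in columns.values():
                if col_type not in self.BUILTIN_TYPES | self.types:
                    raise SchemaError(f"unknown type {col_type}")
            self.tables[table] = dict(columns)
        else:
            raise ValueError(f"unknown command {kind!r}")


def replay(log, types=("my_type",)):
    """Apply an ordered log; every replica replaying it gets the same result."""
    schema = Schema(types)
    outcomes = []
    for command in log:
        try:
            schema.apply(command)
            outcomes.append("ok")
        except SchemaError:
            outcomes.append("rejected")
    return outcomes, schema


drop = ("drop_type", "my_type")
create = ("create_table", "tab", {"id": "int", "value": "my_type"})

drop_first, schema_a = replay([drop, create])
create_first, schema_b = replay([create, drop])
```

Whichever order the log settles on, the second statement is rejected and no table is left referring to a missing type. Without an agreed order, two nodes could each accept one statement and diverge.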
20. Raft for Topology
▪ Eliminate edge cases where operator intervention is required
▪ Enable scaling out to multiple new nodes in parallel
▪ Enable incremental scaling
▪ Prepare the groundwork for strongly consistent tables
21. Raft for Tables
▪ Strongly consistent mode to complement eventual consistency
▪ Higher consistency guarantees
▪ Better guarantees for indexes, materialized views, change data capture
▪ No performance loss
▪ But: reduced availability during failover
22. Tablets
▪ A different way to split data among nodes
• Incremental scaling
• Requires better coordination
• Raft to the rescue
24. Change Data Capture
▪ Generally available in Scylla 4.3+
▪ Using good old CQL syntax and protocols
• Standard drivers
• Easy to integrate
▪ Also comes with Alternator (as Streams)
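As a small illustration of the "plain CQL" point, the helpers below build the statements used to enable CDC on a table and to read its change log. The helper names and the identifier check are hypothetical; the statements follow Scylla's documented `cdc` table option and `_scylla_cdc_log` table naming, and would be sent through any standard CQL driver.

```python
import re

# Assumption for this sketch: only simple lowercase identifiers are accepted.
_IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")


def _checked(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def enable_cdc_statement(keyspace: str, table: str) -> str:
    """CQL that turns on change data capture for an existing table."""
    return (
        f"ALTER TABLE {_checked(keyspace)}.{_checked(table)} "
        "WITH cdc = {'enabled': true};"
    )


def cdc_log_select_statement(keyspace: str, table: str) -> str:
    """CQL that reads the table's change log, itself a regular table."""
    return f"SELECT * FROM {_checked(keyspace)}.{_checked(table)}_scylla_cdc_log;"


enable_stmt = enable_cdc_statement("ks", "tab")
select_stmt = cdc_log_select_statement("ks", "tab")
```

Because the log is an ordinary table, consumers need no special protocol, only a CQL driver.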
25. Use cases for CDC
▪ Duplicate data from Scylla to another database
▪ Analyze data in semi-real-time
▪ Feed data into a message queue (Kafka)
28. So many environments...
▪ Bare metal / on premises
▪ Virtualized on Public Cloud
▪ Containerized, w/ container engine
▪ Kubernetes
▪ AWS Outposts
▪ Scylla Cloud
▪ Scylla Cloud - Bring Your Own Account
▪ Embedded
29. Bare metal
▪ Challenges
• Many Operating Systems
• Many Hardware configs
• Internet connectivity
• Root access
• Centralized administration and monitoring
• Third party agents
▪ Solutions
• Unified .rpm/.deb/.tar
• iotune, installer
• Prepackaged dependencies
• Tarball install
• Scylla Manager and Monitoring
• Helper slice
31. Containers and Kubernetes
▪ Challenges
• State-intensive system
• Complex administration
• Reduced control over host
▪ Solutions
• Use StatefulSets
• Scylla Operator
• Work with Kubernetes APIs to regain control
32. AWS Outposts
▪ Challenges
• Want, but cannot use, Public Cloud
• DynamoDB not available on Outposts
▪ Solutions
• Use Outposts
• Use ScyllaDB (with or without Alternator)
33. Scylla Cloud
▪ Challenges
• Managing a distributed database is hard
• Repair, backup, scaling
• Minimize latency and cost
• Multiple Public Clouds
▪ Solutions
• Hand that over to the experts
• Automated solution
• VPC Peering
• AWS, GCP supported
34. Bring Your Own Account
▪ Challenges
• Financial or regulatory barriers to using a SaaS database
• Security
• Complex setup
▪ Solutions
• Run Scylla Cloud using your own public cloud account
• Principle of Least Privilege, audit
• Scylla Cloud Automation
35. Embedded Scylla
▪ Challenge
• Non-Cloud product needs a database
• Embedded environments are constrained and “special”
▪ Solution
• Embed Scylla in your product
• Scylla is frugal; work with our team to maximize utilization
37. Monitoring and observability
▪ Increased refinement and simplification of the dashboards
▪ Advisor section: actionable advice
▪ More detailed tracing
▪ Access metrics and perform operations from CQL
42. Experience Scylla for Yourself
Download Scylla Open Source: scylladb.com/download
Talk to an expert: scylladb.com/consultation
Take a test drive: scylladb.com/test-drive