Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2niSilr.
Avi Kivity discusses ScyllaDB, the many design decisions, from the programming language and programming model through low-level details and up to the advanced cache design, and more. He also discusses how to max out SSDs capable of serving hundreds of thousands of I/O operations per second, how to fully utilize machines with dozens of cores, how to bypass kernel bottlenecks and much more. Filmed at qconsf.com.
Avi Kivity, CTO of ScyllaDB, is known mostly for starting the Kernel-based Virtual Machine (KVM) project, the hypervisor underlying many production clouds. He has worked for Qumranet and Red Hat as KVM maintainer until December 2012. Avi is now CTO of ScyllaDB, a company that is bringing the same kind of innovation to the NoSQL space.
Introduction to Multilingual Retrieval Augmented Generation (RAG)
ScyllaDB: Achieving No-compromise Performance
1.
2. InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
scylladb
3. Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
7. More Non-Agenda
● Cache lines, coherency protocols
● NUMA
● Algorithms are the only thing that matters,
everything else is implementation detail
● Docker
11. High-level Goals
● Efficiency:
○ Make the most out of every cycle
● Utilization:
○ Squeeze every cycle from the machine
● Control
○ Spend the cycles on what we want, when we want
12. Characterizing the problem
● Large numbers of small operations
○ Make coordination cheap
● Lots of communications
○ Within the machine
○ With disk
○ With other machines
16. ● Thread-per-core design
○ Never block
● Asynchronous networking
● Asynchronous file I/O
● Asynchronous multicore
General Architecture
17. Scylla has its own task scheduler
Traditional stack Scylla’s stack
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise is a
pointer to
eventually
computed value
Task is a
pointer to a
lambda function
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread is a
function pointer
Stack is a byte
array from 64k
to megabytes
Context switch cost is
high. Large stacks pollutes
the caches No sharing, millions of
parallel events
22. Lower bounds for concurrency
● Disks want minimum iodepth for full
throughput (heads/chips)
● Remote nodes need concurrency to hide
network latency and their own min.
concurrency
● Compute wants work for each core
23. Results of Mathematical Analysis
● Want high concurrency (for throughput)
● Want low concurrency (for latency)
● Resources require concurrency for full
utilization
24. Sources of concurrency
● Users
○ Reduce concurrency / add nodes
● Internal processes
○ Generate as much concurrency as possible
○ Schedule
29. Using the kernel page cache
● 4k granularity
● Thread-safe
● Synchronous APIs
● General-purpose
● Lack of control (1)
● Lack of control (2)
● Exists
● Hundreds of
hacker-years
● Handling lots of edge
cases
35. Seastar memory allocator
● Non-Thread safe!
○ Each core gets a private memory pool
● Allocation back pressure
○ Allocator calls a callback when low on memory
○ Scylla evicts cache in response
37. Remaining problems with
malloc/free
● Memory gets fragmented over time
○ If workload changes sizes of allocated objects
● Allocating a large contiguous block
requires evicting most of cache
39. Log-structured memory allocation
● The cache
○ Large majority of memory allocated
○ Small subset of allocation sites
● Teach allocator how to move allocated
objects around
○ Updating references
42. Userspace TCP/IP stack
● Thread-per-core design
● Use DPDK to drive hardware
● Present as experimental mode
○ Needs more testing and productization
43. Query Compilation to Native Code
● Use LLVM to JIT-compile CQL queries
● Embed database schema and internal
object layouts into the query
44. ● Full control of the software stack can generate big
payoffs
● Careful system design can maximize throughput
● Without sacrificing latency
● Without requiring endless end-user tuning
● While having a lot of fun
Conclusions
45. ● Download: http://www.scylladb.com
● Twitter: @ScyllaDB
● Source: http://github.com/scylladb/scylla
● Mailing lists: scylladb-user @ groups.google.com
● Company site & blog: http://www.scylladb.com
How to interact