Ross Turk, VP, Marketing & Community, Inktank
Ceph is an open source distributed object store, network block device, and file system designed for reliability, performance, and scalability. It runs on standard hardware, has no single point of failure, and is supported by the Linux kernel. It also works great with OpenStack and CloudStack.
If you’ve heard of Ceph but aren’t sure where it fits into your plans, this is the talk for you. Designed for those who are new to Ceph, this talk will cover Ceph’s design principles, overall architecture, and integration with other operational systems.
RADOS
A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS
A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD
A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS
A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW
A bucket-based REST gateway, compatible with S3 and Swift
[Diagram labels: APP, APP, HOST/VM, CLIENT]
Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients
OSDs:
• 10s to 10,000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
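The "small, odd number" advice for monitors follows from majority-based consensus. The sketch below is only arithmetic, not Ceph code, and assumes that decisions need a strict majority of monitors; the function names are hypothetical. It shows that adding a fourth monitor to three buys no extra fault tolerance.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors` (assumes majority-based consensus)."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def failures_tolerated(monitors: int) -> int:
    """How many monitors can be lost while a strict majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Print the quorum size and tolerated failures for small clusters.
    for n in range(1, 8):
        print(f"{n} monitors: quorum={quorum_size(n)}, "
              f"tolerates {failures_tolerated(n)} failure(s)")
```

Under this assumption, three and four monitors both survive one failure, while five survive two, which is why even counts add cost without adding resilience.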
LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
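As a rough illustration of what direct access looks like from an application, here is a minimal sketch using Ceph's Python binding (the `rados` module). The config path `/etc/ceph/ceph.conf`, the pool name `demo-pool`, the object name, and the helper `store_and_fetch` are all assumptions made for the example. The pool must already exist, and `main()` only works on a host that can reach a cluster.

```python
def store_and_fetch(ioctx, name: str, payload: bytes) -> bytes:
    """Write one object through an I/O context, then read it back."""
    ioctx.write_full(name, payload)        # replace the whole object with payload
    return ioctx.read(name, len(payload))  # read back as many bytes as were written


def main() -> None:
    # Imported lazily so the helper above can be exercised without Ceph installed.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # assumed config path
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("demo-pool")  # assumed, pre-existing pool
        try:
            print(store_and_fetch(ioctx, "greeting", b"hello rados"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling `main()` talks to the storage nodes through the library itself, with no REST gateway in between.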
RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications
RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux Kernel (2.6.39+)
  • QEMU/KVM, native Xen coming soon
  • OpenStack, CloudStack, Nebula, Proxmox
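To make "images are striped across the cluster" concrete, the sketch below maps a byte offset in a disk image to the RADOS object that would hold it. It is a simplification, not RBD's actual layout code: it assumes the image is cut into fixed-size objects of 4 MiB and ignores any finer-grained striping options. The names `locate` and `objects_touched` are hypothetical.

```python
from typing import Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size of 4 MiB


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Return (object index, offset inside that object) for an image byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_touched(offset: int, length: int,
                    object_size: int = OBJECT_SIZE) -> range:
    """Indices of the objects covered by an I/O of `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)
```

Because each object is placed independently, a large sequential read or write fans out over many storage nodes instead of landing on one disk.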
Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
  • Directory hierarchy
  • File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem
What Makes Ceph Unique?
Part one: it never, ever remembers where it puts stuff.
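One way to read "never remembers where it puts stuff" is that an object's location is computed from its name rather than looked up in a table. The sketch below illustrates that idea with rendezvous (highest-random-weight) hashing. This is a stand-in technique chosen for brevity, not Ceph's own placement algorithm, and the node names and the function `place` are hypothetical.

```python
import hashlib
from typing import List, Sequence


def _score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}|{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, nodes: Sequence[str], replicas: int = 2) -> List[str]:
    """Pick `replicas` nodes for an object by computation alone, with no lookup table."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda n: _score(object_name, n), reverse=True)
    return ranked[:replicas]
```

Any client holding the same node list computes the same answer, so nothing has to store or serve a per-object location map.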
RADOS is a distributed object store, and it’s the foundation for Ceph. On top of RADOS, the Ceph team has built three applications (RADOSGW, RBD, and Ceph FS) that let you store data through a REST gateway, as block devices, or as a file system. But before we get into all of that, let’s start at the beginning of the story.
But that’s a lot to digest all at once. Let’s start with RADOS.
MDSs store all of their data within RADOS itself, but there’s still a problem…
There are multiple MDSs!
So how do you have one tree and multiple servers?
If there’s just one MDS (which is a terrible idea), it manages metadata for the entire tree.
When the second one comes along, it will intelligently partition the work by taking a subtree.
When the third MDS arrives, it will attempt to split the tree again.
Same with the fourth.
An MDS can even take just a single directory or file if the load is high enough. This all happens dynamically, based on load and the structure of the data, and it’s called “dynamic subtree partitioning”.
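The idea of one tree shared by several servers can be sketched as a small map from delegated subtrees to MDS ranks: the server responsible for a path is the owner of its deepest delegated ancestor. This is only an illustration, not the MDS implementation. The paths, the rank numbers, and the function `authority` are hypothetical, and the load-based decision about when to split a subtree is not modeled.

```python
from pathlib import PurePosixPath
from typing import Dict


def authority(path: str, subtree_owner: Dict[str, int]) -> int:
    """Return the MDS rank that owns `path`: the owner of its deepest delegated ancestor."""
    node = PurePosixPath(path)
    # Walk from the path itself up towards the root, stopping at the first delegation.
    for candidate in [node, *node.parents]:
        owner = subtree_owner.get(str(candidate))
        if owner is not None:
            return owner
    raise KeyError(f"no MDS owns an ancestor of {path!r}")


# Hypothetical state after three MDSs have joined and the tree has been split twice.
subtree_owner = {"/": 0, "/home": 1, "/home/alice": 2}
```

With this map, `/home/alice/notes.txt` belongs to rank 2, `/home/bob/todo.txt` to rank 1, and everything else to rank 0. Handing a hot directory to another server is then just one more entry in the map.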