6. APP -> LIBRADOS
A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
APP -> RADOSGW
A bucket-based REST gateway, compatible with S3 and Swift
HOST/VM -> RBD
A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CLIENT -> CEPH FS
A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOS
A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
7. IN THE BEGINNING
Magic Madzik, Flickr / CC BY 2.0
8. EARLY INFORMATION STORAGE
Chico.Ferreira, Flickr / CC BY 2.0
9. WRITING > CAVE PAINTINGS
kevingessner, Flickr / CC BY-SA 2.0
26. [Diagram: names aa, ab, ac, ba, bb, bc, da, db, dc interleaved with short binary strings]
27. WE OUTGROW THE HARD DRIVE
Mr. T in DC, Flickr / CC BY 2.0
28. [Diagram: three HUMANs, one COMPUTER, seven DISKs]
29. [Diagram: many HUMANs and many DISKs around a single (COMPUTER)]
(actually more like this…)
30. [Diagram: three HUMANs and twelve COMPUTER/DISK pairs]
31. [Diagram: names aa, ab, ac, ba, bb, bc, da, db, dc interleaved with short binary strings]
33. [Diagram: one APP and twelve COMPUTER/DISK pairs]
34. [Diagram: thirteen COMPUTER/DISK pairs]
35. [Diagram: three VMs and twelve COMPUTER/DISK pairs]
36. STORAGE THROUGHOUT HISTORY
Ceph
Cloud computing
Distributed storage
Shared storage
Computers
Writing
Painting
Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
37. [Diagram: three HUMANs and twelve COMPUTER/DISK pairs]
38. [Diagram: twelve COMPUTER/DISK pairs]
39. [Diagram: twelve C D pairs]
40. [Diagram: three HUMANs and twelve C D pairs]
41. STORAGE APPLIANCES
Michael Moll, Wikipedia / CC BY-SA 2.0
42. 6.4 MILLION SQ FT OF FACTORIES
Dude94111, Flickr / CC BY 2.0
43. TECHNOLOGY IS A COMMODITY
RaeAllen, Flickr / CC-BY 2.0
44. COMMODITY PRICES FLUCTUATE
[Chart: time axis from May-07 to May-12]
73. OSD   OSD   OSD   OSD   OSD
FS    FS    FS    FS    FS    (btrfs, xfs, or ext4)
DISK  DISK  DISK  DISK  DISK
M     M     M
75. Monitors:
Maintain the cluster map
Provide consensus for distributed decision-making
Should have an odd number
Do not serve stored objects to clients
OSDs:
One per disk (recommended)
At least three in a cluster
Serve stored objects to clients
Intelligently peer to perform replication tasks
Support object classes
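The odd-number guidance for monitors follows from majority voting: a decision needs a strict majority of monitors, so an even count adds a machine without adding fault tolerance. The sketch below is plain arithmetic, not Ceph code; the function names are made up for illustration.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority among `monitors` voters."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


# Compare cluster sizes side by side.
for n in range(1, 8):
    print(f"{n} monitors: quorum={quorum_size(n)}, "
          f"can lose {tolerated_failures(n)}")
```

With three monitors the cluster survives one failure; a fourth monitor raises the quorum to three and still survives only one, which is why odd counts are the usual choice.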
78. LIBRADOS
Provides direct access to RADOS for applications
C, C++, Python, PHP, Java
No HTTP overhead
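As a rough illustration of direct access, here is a minimal sketch using the Python binding for librados (the `rados` module). The config path, the pool name `data`, and the object name are assumptions chosen for the example, and `connect_and_roundtrip()` only works on a host that can reach a running cluster.

```python
def roundtrip(ioctx, name, payload):
    """Write `payload` as a single object, then read it back."""
    ioctx.write_full(name, payload)          # replace the whole object
    return ioctx.read(name, len(payload))    # read back the same number of bytes


def connect_and_roundtrip():
    """Run the roundtrip against a real cluster (not called automatically)."""
    import rados  # Python binding for librados; must be installed separately

    # Assumed config location; adjust for your deployment.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("data")   # assumed pool name
        try:
            return roundtrip(ioctx, "hello-object", b"hello RADOS")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

The application talks to the object store through the library itself; no REST gateway sits in the path.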
80. [Diagram: two APPs each speak REST to a RADOSGW instance; each RADOSGW uses LIBRADOS to speak the native protocol to the cluster (M M M)]
81. RADOS Gateway:
REST-based interface to RADOS
Supports buckets, accounting
Compatible with S3 and Swift applications
86. RADOS Block Device:
Storage of virtual disks in RADOS
Allows decoupling of VMs and containers
Live migration!
Images are striped across the cluster
Boot support in QEMU, KVM, and OpenStack Nova
Mount support in the Linux kernel
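To make "striped across the cluster" concrete, the sketch below maps byte offsets in a virtual disk image to fixed-size objects. It assumes the simplest layout (one object after another, no fancy striping) and an object size of 4 MiB; both are assumptions for illustration, and the function names are hypothetical rather than part of any RBD API.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB


def locate(offset, object_size=OBJECT_SIZE):
    """Return (object index, offset inside that object) for an image offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_touched(offset, length, object_size=OBJECT_SIZE):
    """Return the range of object indexes covered by an I/O request."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length <= 0:
        return range(0)
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)


# A 2 MiB write starting 3 MiB into the image spans two objects.
print(list(objects_touched(3 * 1024 * 1024, 2 * 1024 * 1024)))
```

Because each object is placed independently in RADOS, a large image ends up spread over many storage nodes rather than living on one disk.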
89. Metadata Server
Manages metadata for a POSIX-compliant shared filesystem
Directory hierarchy
File metadata (owner, timestamps, mode, etc.)
Stores metadata in RADOS
Does not serve file data to clients
Only required for shared filesystem
123. APP -> LIBRADOS (AWESOME)
A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
APP -> RADOSGW (AWESOME)
A bucket-based REST gateway, compatible with S3 and Swift
HOST/VM -> RBD (AWESOME)
A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CLIENT -> CEPH FS (NEARLY AWESOME)
A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOS (AWESOME)
A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
125. CEPH AND CLOUDSTACK
tableatny, Flickr / CC BY 2.0
126. RBD SUPPORT IN CLOUDSTACK
Allows storage of virtual disks inside RADOS
Works with KVM only right now
No snapshots yet
Upcoming in CloudStack 4
More information can be found on the mailing list:
ceph-devel / incubator-cloudstack-dev:
http://article.gmane.org/gmane.comp.file-systems.ceph.devel/7505
127. QUESTIONS?
Ross Turk
VP Community, Inktank
ross@inktank.com
@rossturk
inktank.com | ceph.com
Editor's Notes
People have been trying to capture knowledge for a very long time. I guess the first form of captured knowledge is the cave painting.
TODO: change this slide. Man + magnet + tape = magnetic tape. 1000 books on one tape
People learned how to store data on magnetic tape. Many, many, many books could be stored on a single tape.