- 1. Ceph: scaling storage for the cloud and beyond
Sage Weil
Inktank
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 2. outline
● why you should care
● what it is, what it does
● distributed object storage
● ceph fs
● who we are, why we do this
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 3. why should you care about another
storage system?
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 4. requirements
● diverse storage needs
– object storage
– block devices (for VMs) with snapshots, cloning
– shared file system with POSIX, coherent caches
– structured data... files, block devices, or objects?
● scale
– terabytes, petabytes, exabytes
– heterogeneous hardware
– reliability and fault tolerance
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 5. time
● ease of administration
● no manual data migration, load balancing
● painless scaling
– expansion and contraction
– seamless migration
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 6. cost
● linear function of size or performance
● incremental expansion
– no fork-lift upgrades
● no vendor lock-in
– choice of hardware
– choice of software
● open
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 8. unified storage system
● objects
– native
– RESTful
● block
– thin provisioning, snapshots, cloning
● file
– strong consistency, snapshots
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 9. [Architecture diagram]
APP / APP / HOST/VM / CLIENT
RADOSGW / RBD / CEPH FS
LIBRADOS
RADOS
LIBRADOS: a library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RADOSGW: a bucket-based REST gateway, compatible with S3 and Swift
RBD: a reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS: a POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOS: a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 10. open source
● LGPLv2
– copyleft
– ok to link to proprietary code
● no copyright assignment
– no dual licensing
– no “enterprise-only” feature set
● active community
● commercial support
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 11. distributed storage system
● data center scale
– 10s to 10,000s of machines
– terabytes to exabytes
● fault tolerant
– no single point of failure
– commodity hardware
● self-managing, self-healing
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 12. ceph object model
● pools
– 1s to 100s
– independent namespaces or object collections
– replication level, placement policy
● objects
– bazillions
– blob of data (bytes to gigabytes)
– attributes (e.g., “version=12”; bytes to kilobytes)
– key/value bundle (bytes to gigabytes)
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
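
The object model above can be pictured with a small sketch. The following plain Python is an illustration of the data model only (it is not the librados API): a pool is an independent namespace of named objects, and each object carries byte data, attributes, and a key/value bundle.

    # Sketch of the RADOS object model: pools of named objects,
    # each with byte data, small attributes, and a key/value bundle.
    class Object:
        def __init__(self):
            self.data = b""      # blob of data (bytes to gigabytes)
            self.attrs = {}      # attributes, e.g. {"version": b"12"}
            self.omap = {}       # key/value bundle (bytes to gigabytes)

    class Pool:
        def __init__(self, replication=3):
            self.replication = replication   # replication level (placement policy elided)
            self.objects = {}                # independent namespace of named objects

        def get(self, name):
            return self.objects.setdefault(name, Object())

    cluster = {"data": Pool(), "metadata": Pool(), "rbd": Pool()}
    obj = cluster["data"].get("foo")
    obj.data = b"hello"
    obj.attrs["version"] = b"12"
    obj.omap["key1"] = b"value1"
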
- 13. why start with objects?
● more useful than (disk) blocks
– names in a single flat namespace
– variable size
– simple API with rich semantics
● more scalable than files
– no hard-to-distribute hierarchy
– update semantics do not span objects
– workload is trivially parallel
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 14. [Diagram: one HUMAN using one COMPUTER attached to many DISKs]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 15. [Diagram: several HUMANs sharing one COMPUTER attached to many DISKs]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 16. [Diagram: many HUMANs and many DISKs crowded around a single (COMPUTER): "(actually more like this…)"]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 17. [Diagram: HUMANs spread across many COMPUTERs, each with its own DISK]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 18. [Diagram: five OSDs, each on a local file system (btrfs, xfs, or ext4) on its own DISK, plus three monitors (M)]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 19. Monitors (M):
• Maintain cluster membership and state
• Provide consensus for distributed decision-making via Paxos
• Small, odd number
• These do not serve stored objects to clients
Object Storage Daemons (OSDs):
• At least three in a cluster
• One per disk or RAID group
• Serve stored objects to clients
• Intelligently peer to perform replication tasks
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 20. [Diagram: a HUMAN interacting with the monitor cluster (M, M, M)]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 21. data distribution
● all objects are replicated N times
● objects are automatically placed, balanced,
migrated in a dynamic cluster
● must consider physical infrastructure
– ceph-osds on hosts in racks in rows in data centers
● three approaches
– pick a spot; remember where you put it
– pick a spot; write down where you put it
– calculate where to put it, where to find it
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 22. CRUSH
• Pseudo-random placement
algorithm
• Fast calculation, no lookup
• Repeatable, deterministic
• Ensures even distribution
• Stable mapping
• Limited data migration
• Rule-based configuration
• specifiable replication
• infrastructure topology aware
• allows weighting
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 23. [Diagram: object names are hashed into placement groups, hash(object name) % num pg, and CRUSH(pg, cluster state, policy) maps each placement group onto OSDs]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
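
The two-step placement shown above can be sketched in a few lines of Python. This is only a toy stand-in for CRUSH: the real algorithm is topology- and weight-aware and carefully limits data movement, but the sketch captures the "calculate where to put it, where to find it" idea: hash the object name into a placement group, then deterministically rank OSDs for that placement group.

    import hashlib

    NUM_PGS = 64       # placement groups in the pool (assumed for illustration)
    REPLICAS = 3       # objects are replicated N times

    def pg_for(object_name):
        # step 1: hash(object name) % num pg
        h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
        return h % NUM_PGS

    def osds_for(pg, osd_ids):
        # step 2: stand-in for CRUSH(pg, cluster state, policy); deterministic
        # and repeatable, so any client can compute it with no lookup table.
        ranked = sorted(osd_ids,
                        key=lambda osd: hashlib.md5(("%s-%s" % (pg, osd)).encode()).hexdigest())
        return ranked[:REPLICAS]

    cluster_osds = list(range(10))       # osd.0 .. osd.9
    pg = pg_for("foo")
    print("object 'foo' -> pg", pg, "-> osds", osds_for(pg, cluster_osds))
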
- 24. [Diagram: the same placement groups shown mapped across the cluster's OSDs]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 25. RADOS
● monitors publish osd map that describes cluster state
– ceph-osd node status (up/down, weight, IP)
– CRUSH function specifying desired data distribution
● object storage daemons (OSDs)
– safely replicate and store objects
– migrate data as the cluster changes over time
– coordinate based on shared view of reality – gossip!
● decentralized, distributed approach allows
– massive scales (10,000s of servers or more)
– the illusion of a single copy with consistent behavior
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 26. [Diagram: a CLIENT and a "??": how does it know where to read and write?]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 29. [Diagram: a CLIENT again facing the "??" of locating its data]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 30. [The same architecture diagram as slide 9, here highlighting LIBRADOS, the library that lets apps access RADOS directly]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 31. [Diagram: an APP linked against LIBRADOS speaking the native protocol directly to the RADOS cluster (monitors M)]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 32. LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java
• No HTTP overhead
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
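
For example, with the librados Python binding (a sketch; it assumes a reachable cluster, a readable /etc/ceph/ceph.conf, and a pool named 'data'):

    import rados

    # connect using the cluster's config file (path and pool name are assumptions)
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('data')                  # pool-scoped I/O context

    ioctx.write_full('hello-object', b'hello rados')    # write object data
    print(ioctx.read('hello-object'))                   # read it back
    ioctx.set_xattr('hello-object', 'version', b'12')   # attach an attribute

    ioctx.close()
    cluster.shutdown()
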
- 33. atomic transactions
● client operations are sent to the OSD cluster
– operate on a single object
– can contain a sequence of operations, e.g.
● truncate object
● write new object data
● set attribute
● atomicity
– all operations commit or do not commit atomically
● conditional
– 'guard' operations can control whether operation is performed
● verify xattr has specific value
● assert object is a specific version
– allows atomic compare-and-swap etc.
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
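
The compare-and-swap this enables can be sketched as follows. This is a conceptual illustration of the semantics only, not the librados API: in librados the guard and the updates travel to the OSD in one compound operation and commit, or fail, together.

    # Conceptual sketch of a guarded compound operation (compare-and-swap).
    class GuardFailed(Exception):
        pass

    def atomic_update(obj, expect_version, new_data):
        # 'obj' stands in for a single RADOS object: {'data': bytes, 'attrs': dict}
        if obj['attrs'].get('version') != expect_version:     # guard: verify xattr value
            raise GuardFailed("version mismatch, nothing applied")
        # the operations below apply together or not at all
        obj['data'] = new_data                                 # write new object data
        obj['attrs']['version'] = str(int(expect_version) + 1) # set attribute

    obj = {'data': b'old', 'attrs': {'version': '12'}}
    atomic_update(obj, '12', b'new contents')
    print(obj)
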
- 34. key/value storage
● store key/value pairs in an object
– independent from object attrs or byte data payload
● based on google's leveldb
– efficient random and range insert/query/removal
– based on BigTable SSTable design
● exposed via key/value API
– insert, update, remove
– individual keys or ranges of keys
● avoid read/modify/write cycle for updating complex
objects
– e.g., file system directory objects
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
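
Through the Python binding this looks roughly as follows (a sketch assuming a pool named 'data'; helper names follow the python-rados binding and may differ slightly between releases):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # assumed config path
    cluster.connect()
    ioctx = cluster.open_ioctx('data')                       # assumed pool name

    # store key/value pairs in an object, independent of its data and xattrs
    with rados.WriteOpCtx() as op:
        ioctx.set_omap(op, ('jane', 'joe'), (b'uid=1001', b'uid=1002'))
        ioctx.operate_write_op(op, 'directory-object')

    # range query: iterate keys without reading the whole object
    with rados.ReadOpCtx() as op:
        it, ret = ioctx.get_omap_vals(op, "", "", 100)   # start_after, prefix, max
        ioctx.operate_read_op(op, 'directory-object')
        for key, val in it:
            print(key, val)

    ioctx.close()
    cluster.shutdown()
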
- 35. watch/notify
● establish stateful 'watch' on an object
– client interest persistently registered with object
– client keeps session to OSD open
● send 'notify' messages to all watchers
– notify message (and payload) is distributed to all watchers
– variable timeout
– notification on completion
● all watchers got and acknowledged the notify
● use any object as a communication/synchronization
channel
– locking, distributed coordination (ala ZooKeeper), etc.
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
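
The pattern can be sketched as a toy in-process channel. This illustrates only the watch/notify flow (register watchers, fan out a notify, collect acks, report completion); it is not the librados watch/notify API.

    # Any object can act as a communication channel: watchers register
    # interest, a notify fans out to all of them, and the notifier learns
    # when every watcher has acknowledged.
    class NotifyChannel:
        def __init__(self):
            self.watchers = {}                    # client id -> callback

        def watch(self, client_id, callback):
            self.watchers[client_id] = callback   # stateful 'watch' on the object

        def notify(self, payload):
            acks = []
            for client_id, callback in self.watchers.items():
                callback(payload)                 # notify delivered to watcher
                acks.append(client_id)            # watcher acknowledges
            return acks                           # 'complete' once all have acked

    chan = NotifyChannel()
    chan.watch("gateway-1", lambda msg: print("gateway-1 invalidates", msg))
    chan.watch("gateway-2", lambda msg: print("gateway-2 invalidates", msg))
    print("complete, acked by", chan.notify(b"bucket: photos"))
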
- 36. [Sequence diagram: CLIENTs #1, #2, and #3 each send watch to the OSD and receive ack/commit; a notify sent to the object is fanned out to every watcher; each watcher replies with ack, and once all acks arrive the notifier receives complete]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 37. watch/notify example
● radosgw cache consistency
– radosgw instances watch a single object
(.rgw/notify)
– locally cache bucket metadata
– on bucket metadata changes (removal, ACL
changes)
● write change to relevant bucket object
● send notify with bucket name to other radosgw
instances
– on receipt of notify
● invalidate relevant portion of cache
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 38. rados classes
● dynamically loaded .so
– /var/lib/rados-classes/*
– implement new object “methods” using existing methods
– part of I/O pipeline
– simple internal API
● reads
– can call existing native or class methods
– do whatever processing is appropriate
– return data
● writes
– can call existing native or class methods
– do whatever processing is appropriate
– generates a resulting transaction to be applied atomically
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 39. class examples
● grep
– read an object, filter out individual records, and
return those
● sha1
– read object, generate fingerprint, return that
● images
– rotate, resize, crop image stored in object
– remove red-eye
● crypto
– encrypt/decrypt object data with provided key
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 40. [The architecture diagram again, here highlighting RADOSGW, the bucket-based REST gateway compatible with S3 and Swift]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 41. [The architecture diagram again, here highlighting RBD, the reliable and fully-distributed block device]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 42. [Diagram: a cluster of COMPUTERs, each with its own DISK]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 43. [Diagram: the same cluster of COMPUTERs and DISKs, now with VMs running on some of them]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 44. RADOS Block Device:
• Storage of virtual disks in RADOS
• Decouples VMs and containers
• Live migration!
• Images are striped across the cluster
• Snapshots!
• Support in
• Qemu/KVM
• OpenStack, CloudStack
• Mainline Linux kernel
• Image cloning
• Copy-on-write “snapshot” of existing
image
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
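
For example, with the rbd Python binding (a sketch; the pool name 'rbd' and the image and snapshot names are made up):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # assumed config path
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')                        # assumed pool name

    rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024**3)       # 10 GiB thin-provisioned image
    image = rbd.Image(ioctx, 'vm-disk-1')
    image.write(b'boot sector...', 0)                        # data is striped across RADOS objects
    image.create_snap('before-upgrade')                      # cheap snapshot
    image.close()

    ioctx.close()
    cluster.shutdown()
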
- 45. [Diagram: a VM whose VIRTUALIZATION CONTAINER links LIBRBD and LIBRADOS and talks to the RADOS cluster (monitors M)]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 46. [Diagram: a VM and two VIRTUALIZATION CONTAINERs, each linking LIBRBD and LIBRADOS against the same cluster (monitors M), e.g. for live migration]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 47. [Diagram: a HOST using KRBD (KERNEL MODULE) over LIBRADOS to reach the RADOS cluster (monitors M)]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 48. [The architecture diagram again, here highlighting CEPH FS, the POSIX-compliant distributed file system]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 49. [Diagram: a CLIENT sends metadata operations to the metadata servers and exchanges file data directly with the OSDs (monitors M)]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 50. [Diagram: the RADOS cluster with its monitors (M)]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 51. Metadata Server
• Manages metadata for a
POSIX-compliant shared
filesystem
• Directory hierarchy
• File metadata (owner,
timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to
clients
• Only required for shared
filesystem
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 52. legacy metadata storage
● a scaling disaster
– name → inode → block list → data
– no inode table locality
– fragmentation
● inode table
● directory
● many seeks
● difficult to partition
[Diagram: a conventional directory tree (etc: hosts, mtab, passwd; home; usr: bin, include, lib; var; vmlinuz; …) beside its inode table]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 53. ceph fs metadata storage
● block lists unnecessary
● inode table mostly useless
– APIs are path-based, not inode-based
– no random table access, sloppy caching
● embed inodes inside directories
– good locality, prefetching
– leverage key/value object
[Diagram: the same tree with inodes (1, 100, 102, …) embedded in their parent directories]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 54. one tree
three metadata servers
??
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 60. dynamic subtree partitioning
● scalable
– arbitrarily partition metadata
● adaptive
– move work from busy to idle servers
– replicate hot metadata
● efficient
– hierarchical partition preserves locality
● dynamic
– daemons can join/leave
– take over for failed nodes
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 61. controlling metadata io
● view ceph-mds as cache
– reduce reads
● dir+inode prefetching
– reduce writes
● consolidate multiple writes
● large journal or log
– stripe over objects
– two tiers
● journal for short term
● per-directory for long term
– fast failure recovery
[Diagram: the MDS journal and per-directory objects stored in RADOS]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 62. what is journaled
● lots of state
– journaling is expensive up-front, cheap to recover
– non-journaled state is cheap, but complex (and somewhat
expensive) to recover
● yes
– client sessions
– actual fs metadata modifications
● no
– cache provenance
– open files
● lazy flush
– client modifications may not be durable until fsync() or visible to another client
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 63. client protocol
● highly stateful
– consistent, fine-grained caching
● seamless hand-off between ceph-mds
daemons
– when client traverses hierarchy
– when metadata is migrated between servers
● direct access to OSDs for file I/O
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 64. an example
● mount -t ceph 1.2.3.4:/ /mnt
– 3 ceph-mon RT
– 2 ceph-mds RT (1 ceph-mds to -osd RT)
● cd /mnt/foo/bar
– 2 ceph-mds RT (2 ceph-mds to -osd RT)
● ls -al
– open
– readdir
● 1 ceph-mds RT (1 ceph-mds to -osd RT)
– stat each file
– close
● cp * /tmp
– N ceph-osd RT
[Diagram: the client exchanging round trips (RT) with ceph-mon, ceph-mds, and ceph-osd]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 65. recursive accounting
● ceph-mds tracks recursive directory stats
– file sizes
– file and directory counts
– modification time
● virtual xattrs present full stats
● efficient
$ ls -alSh | head
total 0
drwxr-xr-x 1 root root 9.7T 2011-02-04 15:51 .
drwxr-xr-x 1 root root 9.7T 2010-12-16 15:06 ..
drwxr-xr-x 1 pomceph pg4194980 9.6T 2011-02-24 08:25 pomceph
drwxr-xr-x 1 mcg_test1 pg2419992 23G 2011-02-02 08:57 mcg_test1
drwx--x--- 1 luko adm 19G 2011-01-21 12:17 luko
drwx--x--- 1 eest adm 14G 2011-02-04 16:29 eest
drwxr-xr-x 1 mcg_test2 pg2419992 3.0G 2011-02-02 09:34 mcg_test2
drwx--x--- 1 fuzyceph adm 1.5G 2011-01-18 10:46 fuzyceph
drwxr-xr-x 1 dallasceph pg275 596M 2011-01-14 10:06 dallasceph
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
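
Those virtual xattrs can be read like any other extended attribute, e.g. from Python on a CephFS mount (a sketch; the mount point /mnt/foo is an assumption, and the ceph.dir.* names are the virtual attributes described above):

    import os

    # recursive statistics exposed as virtual xattrs on any CephFS directory
    path = '/mnt/foo'                                    # assumed CephFS path
    for attr in ('ceph.dir.rbytes', 'ceph.dir.rfiles',
                 'ceph.dir.rsubdirs', 'ceph.dir.rctime'):
        print(attr, os.getxattr(path, attr).decode())    # Linux, Python 3.3+
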
- 66. snapshots
● volume or subvolume snapshots unusable at petabyte scale
– snapshot arbitrary subdirectories
● simple interface
– hidden '.snap' directory
– no special tools
$ mkdir foo/.snap/one # create snapshot
$ ls foo/.snap
one
$ ls foo/bar/.snap
_one_1099511627776 # parent's snap name is mangled
$ rm foo/myfile
$ ls -F foo
bar/
$ ls -F foo/.snap/one
myfile bar/
$ rmdir foo/.snap/one # remove snapshot
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 67. multiple client implementations
● Linux kernel client
– mount -t ceph 1.2.3.4:/ /mnt
– export (NFS), Samba (CIFS)
● ceph-fuse
● libcephfs.so
– your app
– Samba (CIFS)
– Ganesha (NFS)
– Hadoop (map/reduce)
[Diagram: NFS (Ganesha) and SMB/CIFS (Samba) gateways on libcephfs; Hadoop and your app on libcephfs; ceph-fuse via FUSE; the kernel client]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 68. [The architecture diagram once more, now with verdicts: RADOS, LIBRADOS, RADOSGW, and RBD are labeled AWESOME; CEPH FS is labeled NEARLY AWESOME]
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 69. why we do this
● limited options for scalable open source
storage
● proprietary solutions
– expensive
– don't scale (well or out)
– marry hardware and software
● industry ready for change
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 70. who we are
● Ceph created at UC Santa Cruz (2004-2007)
● developed by DreamHost (2008-2011)
● supported by Inktank (2012)
– Los Angeles, Sunnyvale, San Francisco, remote
● growing user and developer community
– Linux distros, users, cloud stacks, SIs, OEMs
2012 Storage Developer Conference. © Inktank. All Rights Reserved.
- 71. thanks
sage weil
sage@inktank.com
@liewegas http://github.com/ceph
http://ceph.com/
2012 Storage Developer Conference. © Inktank. All Rights Reserved.