SCALING STORAGE WITH CEPH

Ross Turk, Inktank
LIBRADOS (accessed by: APP)
  A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RADOSGW (accessed by: APP)
  A bucket-based REST gateway, compatible with S3 and Swift

RBD (accessed by: HOST/VM)
  A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS (accessed by: CLIENT)
  A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOS (underneath all of the above)
  A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
IN THE BEGINNING
Magic Madzik, Flickr / CC BY 2.0
EARLY INFORMATION STORAGE
Chico.Ferreira, Flickr / CC BY 2.0
WRITING > CAVE PAINTINGS
kevingessner, Flickr / CC BY-SA 2.0
==
x1000        x1
PEOPLE BEGIN WRITING A LOT
Moyan_Brenn, Flickr / CC BY-ND 2.0
WRITING IS TIME-CONSUMING
trekkyandy, Flickr / CC BY 2.0
THE INDUSTRIALIZATION OF WRITING
FateDenied, Flickr / CC BY 2.0
magnet       +   tape   =    magnetic tape




                   ==
         x1000              x1
STORAGE BECOMES MECHANICAL
Erik Pitti, Wikipedia / CC BY-ND 2.0
HUMAN   ROCK
HUMAN   INK   PAPER
HUMAN   COMPUTER   TAPE
COMPUTERS NEED PEOPLE TO WORK
USDAgov, Flickr / CC BY 2.0
HUMAN   COMPUTER   TAPE
11101011 10110110
     10110101 10101001
     00100100 01001001
     10100100 10100101
==   01011010 01101010
     10101010 10101010
     01010110 01010011
THROUGHPUT BECOMES IMPORTANT
Zane Luke, Flickr / CC BY-ND 2.0
LAZ0R B3AMS CHANGE EVERYTHING!!
Jeff Kubina, Flickr / CC-BY-SA 2.0
HARD DRIVES ARE TOTALLY BETTER




                      amazing spinny hard drives            sucky stupid tape
                                                             slow
EVERYTHING GETS MESSY
Rob!, Flickr / CC BY 2.0
[Diagram: blocks of binary data scattered across a hierarchy of directories named aa, ab, ac, ba, bb, bc, da, db, dc]
file
  owner: rturk
  created: aug12
  last viewed: aug17
  size: 42025
  perms: 644

11101011 10110110 10110101
10101001 00100100 01001001
10100100 10100101 01011010
01101010 10101010 10101010
[Diagram: the same directory hierarchy (aa, ab, ac, ba, bb, bc, da, db, dc), with a file's data blocks stored inside it]
WE OUTGROW THE HARD DRIVE
Mr. T in DC, Flickr / CC BY 2.0
[Diagram: one HUMAN, one COMPUTER, seven DISKs]
PEOPLE NEED SIMULTANEOUS ACCESS
wFourier, Flickr / CC BY 2.0
[Diagram: three HUMANs, one COMPUTER, seven DISKs]
(actually more like this…)
[Diagram: a crowd of HUMANs, one (COMPUTER), many DISKs]
[Diagram: three HUMANs, twelve COMPUTER + DISK pairs]
[Diagram: the directory hierarchy from earlier, crossed out with an X]
object
  pace: quick
  driver: frog
  license: expired
  expression: agog

11101011 10110110 10110101
10101001 00100100 01001001
10100100 10100101 01011010
01101010 10101010 10101010
[Diagram: an APP uses twelve COMPUTER + DISK pairs]
[Diagram: a COMPUTER with its own DISK uses twelve COMPUTER + DISK pairs]
[Diagram: three VMs use twelve COMPUTER + DISK pairs]
STORAGE THROUGHOUT HISTORY
[Timeline, earliest to latest: Painting, Writing, Computers, Shared storage, Distributed storage, Cloud computing, Ceph]
Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
[Diagram: three HUMANs and twelve COMPUTER + DISK pairs; each COMPUTER + DISK pair is then abbreviated to "C D"]
STORAGE APPLIANCES
Michael Moll, Wikipedia / CC BY-SA 2.0
6.4 MILLION SQFT OF FACTORIES
Dude94111, Flickr / CC BY 2.0
STORAGE VENDORS HAVE BIG BILLS
CarbonNYC, Flickr / CC BY 2.0
STORAGE APPLIANCES ARE EXPENSIVE
401K 2012, Flickr / CC BY-SA 2.0
TECHNOLOGY IS A COMMODITY
RaeAllen, Flickr / CC-BY 2.0
COMMODITY PRICES FLUCTUATE
[Chart: x-axis from May-07 to May-12]
GROWING WITH HARDWARE APPLIANCES

§  First PB
    §  Proprietary storage hardware
    §  Well-known storage vendor
§  $14 b’zillion

§  Second PB
    §  Proprietary storage hardware
    §  Same storage vendor
§  Another $14 b’zillion
APPLIANCES ARE OLD TECHNOLOGY
Paul Keller, Flickr / CC BY 2.0
Source: http://www.cpubenchmark.net/high_end_cpus.html
FLAGSHIP
HARDWARE
APPLIANCE
Hardware Appliances are Mysterious Black Boxes
Abode of Chaos, Flickr / CC BY 2.0
[Diagram: clients written in C and C++ talk to a cluster of C D nodes]
[Diagram: the same picture, crossed out with an X]
[Diagram: a HUMAN [DEVELOPER] looks at the cluster of C D nodes: "!!"]
THE WORLD NEEDS A STORAGE TECHNOLOGY THAT SCALES INFINITELY
THE WORLD NEEDS A STORAGE TECHNOLOGY THAT DOESN’T REQUIRE AN INDUSTRIAL MANUFACTURING PROCESS
SAGE WEIL

§  Co-founder of DreamHost
§  Inventor of Ceph
§  CEO of Inktank
philosophy   design


OPEN SOURCE
OPEN SOURCE SPREADS IDEAS
orchidgalore, Flickr / CC BY 2.0
philosophy   design


      OPEN SOURCE

COMMUNITY-FOCUSED
WE ARE SMARTER TOGETHER
rturk, Linkedin Inmap
CEPH BELONGS TO ALL OF US
wackybadger, Flickr / CC BY 2.0
philosophy   design


      OPEN SOURCE     SCALABLE

COMMUNITY-FOCUSED
CEPH IS BUILT TO SCALE
[Chart, in ascending order: Too much for a cave, Too much for a book, Too much for a drive, Too much for a computer, Too much for a room, Ceph]
philosophy   design


      OPEN SOURCE     SCALABLE

COMMUNITY-FOCUSED     NO SINGLE POINT OF FAILURE
ARILOMAX CALIFORNICUS
aroid, Flickr / CC BY 2.0
[Diagram labels: single point of failure; highly-available; replicated]
THE OCTOPUS (A METAPHOR)
I love speaking in metaphors.
THE BEEHIVE (ANOTHER METAPHOR)
blumenbiene, Flickr / CC BY 2.0
philosophy   design


      OPEN SOURCE     SCALABLE

COMMUNITY-FOCUSED     NO SINGLE POINT OF FAILURE

                      SOFTWARE BASED
[Diagram: clients written in C and C++ talk to a cluster of C D nodes]
[Diagram: the same picture, this time marked with a ✔]
philosophy   design


      OPEN SOURCE     SCALABLE

COMMUNITY-FOCUSED     NO SINGLE POINT OF FAILURE

                      SOFTWARE BASED

                      SELF-MANAGING
DISKS = JUST TINY RECORD PLAYERS
jon_a_ross, Flickr / CC BY 2.0
[Diagram: disks (D) x 1 MILLION = 55 times / day]
IT ALL STARTED WITH A DREAM
+
NEW MONTHLY CODE COMMITS
[Chart: y-axis from 0 to 700 commits; x-axis from 2004-06 to 2011-07]
CEPH STARTS POPPING UP!
(sorry about all the logo tampering)
OSD    OSD    OSD    OSD    OSD
FS     FS     FS     FS     FS      (btrfs, xfs, or ext4)
DISK   DISK   DISK   DISK   DISK




[Diagram: three monitors (M); a HUMAN talks to the monitors]
    Monitors:
    §  Maintain cluster map
    §  Provide consensus for
        distributed decision-
        making
    §  Should have an odd number (decisions need a majority)
    §  These do not serve stored
        objects to clients
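
The odd-number guidance for monitors follows from majority voting. The sketch below is only the arithmetic, assuming that a decision needs a strict majority of the monitors; the function names are hypothetical and are not part of any Ceph API.

```python
def quorum_size(monitors: int) -> int:
    """Smallest group that is a strict majority of the monitors."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """Monitors that can be lost while a majority can still be formed."""
    return monitors - quorum_size(monitors)


# Assumption: consensus requires a strict majority of all monitors.
for n in range(1, 8):
    print(f"{n} monitors: quorum of {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Under that assumption, going from three monitors to four buys no extra failure tolerance, while going from three to five does, which is why odd counts are the sensible choice.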


    OSDs:
    §  One per disk
        (recommended)
    §  At least three in a cluster
    §  Serve stored objects to
        clients
    §  Intelligently peer to perform
        replication tasks
    §  Support object classes
[Diagram: an APP links LIBRADOS and talks to the cluster (monitors M included) over the native protocol]
    LIBRADOS
    §  Provides direct access to
        RADOS for applications
    §  C, C++, Python, PHP,
        Java
    §  No HTTP overhead
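
As a rough illustration of "direct access to RADOS", here is a minimal sketch using the `rados` Python binding that ships with Ceph. The pool name, object name, and config path are assumptions chosen for the example, and the function needs a reachable cluster to do anything useful.

```python
def roundtrip(pool: str = "data",
              conffile: str = "/etc/ceph/ceph.conf") -> bytes:
    """Write one object to a pool and read it back (needs a live cluster)."""
    # Imported lazily so this file can be loaded on machines without Ceph.
    import rados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        # An I/O context is bound to a single pool (assumed to exist).
        ioctx = cluster.open_ioctx(pool)
        try:
            ioctx.write_full("greeting", b"hello rados")  # hypothetical object name
            return ioctx.read("greeting")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

There is no gateway or HTTP layer in this path: the library talks to the monitors and OSDs itself.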
[Diagram: APPs speak REST to RADOSGW instances; each RADOSGW links LIBRADOS and talks to the cluster (monitors M included) over the native protocol]
RADOS Gateway:
§  REST-based interface to
    RADOS
§  Supports buckets,
    accounting
§  Compatible with S3 and
    Swift applications
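[Diagram: a VM runs in a VIRTUALIZATION CONTAINER that links LIBRBD and LIBRADOS to reach the cluster]
[Diagram: a VM sits between two CONTAINERs, each with LIBRBD and LIBRADOS, attached to the same cluster]
[Diagram: a HOST reaches the cluster through KRBD (KERNEL MODULE) and LIBRADOS]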
RADOS Block Device:
§  Storage of virtual disks in
    RADOS
§  Allows decoupling of VMs
    and containers
     §  Live migration!
§  Images are striped across
    the cluster
§  Boot support in QEMU,
    KVM, and OpenStack Nova
§  Mount support in the Linux
    kernel
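
To make "images are striped across the cluster" concrete, the sketch below maps a byte offset in a virtual disk to the object that would hold it. It assumes fixed-size objects of 4 MiB and simple one-after-another striping; both are assumptions for illustration, and the function names are hypothetical.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB


def locate(offset: int, object_size: int = OBJECT_SIZE) -> tuple[int, int]:
    """Return (object index, offset inside that object) for an image offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def objects_touched(offset: int, length: int,
                    object_size: int = OBJECT_SIZE) -> range:
    """Indexes of every object that a read or write of `length` bytes spans."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)


print(locate(10 * 1024 * 1024))                      # object index and inner offset
print(list(objects_touched(3 * 1024 * 1024, 2 * 1024 * 1024)))
```

Because each object is placed independently, a large image ends up spread over many disks rather than sitting on one.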
[Diagram: a CLIENT exchanges metadata and data (01 10) with the cluster over separate paths]
Metadata Server
§  Manages metadata for a
    POSIX-compliant shared
    filesystem
     §  Directory hierarchy
     §  File metadata (owner,
         timestamps, mode, etc.)
§  Stores metadata in RADOS
§  Does not serve file data to
    clients
§  Only required for shared
    filesystem
WHAT MAKES CEPH UNIQUE?
HOW DO YOU FIND YOUR KEYS?
azmeen, Flickr / CC BY 2.0
[Diagram: an APP wonders ("??") which of twelve C D nodes holds its data]
[Diagram: the nodes are divided into name ranges A-G, H-N, O-T, U-Z, and the APP looks up F*]
I ALWAYS PUT MY KEYS ON THE HOOK
vitamindave, Flickr / CC BY 2.0
[Diagram: an APP and twelve C D nodes]
DEAR DIARY: KEYS = IN THE KITCHEN
Barnaby, Flickr / CC BY 2.0
HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?
THE ANSWER: CRUSH!!
pasukaru76, Flickr / CC SA 2.0
[Diagram: objects (10 10 01 01 10 10 01 11 01 10) are placed in two steps]
1. object to placement group: hash(object name) % num pg
2. placement group to OSDs: CRUSH(pg, cluster state, rule set)
CRUSH
§  Pseudo-random placement
    algorithm
§  Ensures even distribution
§  Repeatable, deterministic
§  Rule-based configuration
     §  Replica count
     §  Infrastructure topology
     §  Weighting
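
The two-step placement can be sketched in a few lines. This is not the CRUSH algorithm: the second step swaps in rendezvous (highest-random-weight) hashing as a stand-in, because it shares the properties listed above (pseudo-random, repeatable, computed rather than looked up). Topology and weighting are not modelled, and all names are hypothetical.

```python
import hashlib


def stable_hash(*parts: object) -> int:
    """Hash that is identical on every machine and every run."""
    text = "/".join(str(p) for p in parts)
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(name: str, num_pg: int) -> int:
    """Step 1: hash(object name) % num pg."""
    return stable_hash(name) % num_pg


def pg_to_osds(pg: int, osds: list[str], replicas: int) -> list[str]:
    """Step 2 (stand-in for CRUSH): rank OSDs by a per-(pg, osd) hash."""
    ranked = sorted(osds, key=lambda osd: stable_hash(pg, osd), reverse=True)
    return ranked[:replicas]


cluster = [f"osd.{i}" for i in range(6)]     # hypothetical cluster state
pg = object_to_pg("my-object", num_pg=128)
print(pg, pg_to_osds(pg, cluster, replicas=3))
```

Any client holding the same cluster description computes the same answer, so nobody has to ask a central directory where an object lives.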
[Diagram: a CLIENT wonders ("??") where its data lives]
[Diagram: a VM runs in a VIRTUALIZATION CONTAINER that links LIBRBD and LIBRADOS to reach the cluster]
HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?
instant copy:  144 + 0 + 0 + 0 + 0 = 144
write (the CLIENT writes to a copy):  144 + 4 = 148
read (the CLIENT reads through the copy):  144 + 4 = 148
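
The object counts on these slides (144 for the original, 0 for each fresh copy, 148 after four writes) are what copy-on-write layering produces. The sketch below is a toy model of that idea, not the RBD implementation; the class and variable names are hypothetical, and the parent is simply treated as read-only once it has been cloned.

```python
class Image:
    """Toy copy-on-write image: stores only the objects written to it."""

    def __init__(self, parent=None):
        self.parent = parent
        self.objects = {}          # object index -> bytes

    def clone(self):
        """Instant copy: nothing is duplicated, only the parent is recorded."""
        return Image(parent=self)

    def write(self, index, data):
        self.objects[index] = data

    def read(self, index):
        """Read locally if present, otherwise fall through to the parent."""
        image = self
        while image is not None:
            if index in image.objects:
                return image.objects[index]
            image = image.parent
        return None


golden = Image()
for i in range(144):
    golden.write(i, b"base")

clones = [golden.clone() for _ in range(4)]


def stored():
    """Total number of objects actually kept for the parent and all clones."""
    return len(golden.objects) + sum(len(c.objects) for c in clones)


print(stored())                    # 144: four copies cost nothing yet
for i in range(4):
    clones[0].write(i, b"new")
print(stored())                    # 148: only the four changed objects are added
print(clones[0].read(0), clones[0].read(10), clones[1].read(0))
```

Reads of untouched objects are served from the shared parent, which is why a new VM disk can be created without copying the whole image first.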
HOW DO YOU MANAGE DIRECTORY HIERARCHY WITHOUT A SINGLE POINT OF FAILURE?
FILESYSTEMS REQUIRE METADATA
Barnaby, Flickr / CC BY 2.0
[Diagram: a CLIENT exchanges metadata and data (01 10) with the cluster]
[Diagram: one tree, three metadata servers: ??]
DYNAMIC SUBTREE PARTITIONING
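
One way to picture splitting one tree across three metadata servers is to hand out subtrees according to how busy each one is, and to redo the split when the load shifts. The sketch below uses a simple greedy rule for that; it is an illustration under that assumption, not the balancing logic Ceph actually uses, and the paths, load figures, and server names are made up.

```python
def partition(subtree_load: dict[str, int], servers: list[str]) -> dict[str, str]:
    """Give each subtree, busiest first, to the least-loaded server so far."""
    totals = {server: 0 for server in servers}
    owner = {}
    busiest_first = sorted(subtree_load.items(), key=lambda kv: (-kv[1], kv[0]))
    for path, load in busiest_first:
        target = min(servers, key=lambda s: totals[s])
        owner[path] = target
        totals[target] += load
    return owner


servers = ["mds.a", "mds.b", "mds.c"]                  # hypothetical names
load = {"/home": 50, "/var": 30, "/usr": 20, "/etc": 10}
print(partition(load, servers))

# The partitioning is "dynamic": when a subtree heats up, the split is redone.
load["/etc"] = 90
print(partition(load, servers))
```

The point of the exercise is that no single server owns the whole hierarchy, and ownership follows the workload instead of being fixed up front.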
AND NOW BACKPEDALING
ALMOST EVERYTHING WORKS
LIBRADOS: AWESOME
RADOSGW: AWESOME
RBD: AWESOME
CEPH FS: NEARLY AWESOME
RADOS: AWESOME
LAN SCALE!! *
* OR REALLY REALLY SCARY FAST WAN
CEPH AND CLOUDSTACK
tableatny, Flickr / CC BY 2.0
RBD SUPPORT IN CLOUDSTACK

§  Just announced two weeks ago!
§  Allows storage of virtual disks inside RADOS
    §  Works with KVM only right now
    §  No volume snapshots yet
§  Requires the latest version of, um, everything
§  More information can be found on the mailing list:
    §  ceph-devel / incubator-cloudstack-dev:
       http://article.gmane.org/gmane.comp.file-systems.ceph.devel/7505
QUESTIONS?


Ross Turk
VP Community, Inktank

§  ross@inktank.com
§  @rossturk

inktank.com | ceph.com

 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

BACD LA 2013 - Scaling Storage with Ceph

  • 1. SCALING STORAGE WITH CEPH Ross Turk, Inktank
  • 2.
  • 3.
  • 4. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
  • 5. IN THE BEGINNING Magic Madzik, Flickr / CC BY 2.0
  • 6. EARLY INFORMATION STORAGE Chico.Ferreira, Flickr / CC BY 2.0
  • 7. WRITING > CAVE PAINTINGS kevingessner, Flickr / CC BY-SA 2.0
  • 8. == x1000 x1
  • 9. PEOPLE BEGIN WRITING A LOT Moyan_Brenn, Flickr / CC BY-ND 2.0
  • 10. WRITING IS TIME-CONSUMING trekkyandy, Flickr / CC BY 2.0
  • 11. THE INDUSTRIALIZATION OF WRITING FateDenied, Flickr / CC BY 2.0
  • 12. magnet + tape = magnetic tape == x1000 x1
  • 13. STORAGE BECOMES MECHANICAL Erik Pitti, Wikipedia / CC BY-ND 2.0
  • 14. HUMAN ROCK INK HUMAN PAPER HUMAN COMPUTER TAPE
  • 15. COMPUTERS NEED PEOPLE TO WORK USDAgov, Flickr / CC BY 2.0
  • 16. HUMAN COMPUTER TAPE
  • 17. 11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 == 01011010 01101010 10101010 10101010 01010110 01010011
  • 18. THROUGHPUT BECOMES IMPORTANT Zane Luke, Flickr / CC BY-ND 2.0
  • 19. LAZ0R B3AMS CHANGE EVERYTHING!! Jeff Kubina, Flickr / CC-BY-SA 2.0
  • 20. HARD DRIVES ARE TOTALLY BETTER amazing spinny hard drives sucky stupid tape slow
  • 21. EVERYTHING GETS MESSY Rob!, Flickr / CC BY 2.0
  • 22. aa ab 111010 ac 101 ba bb bc 111 010 da 110 db 011 010 000 dc 000 110 001
  • 23. file owner: rturk created: aug12 last viewed: aug17 size: 42025 11101011 10110110 10110101 perms: 644 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
  • 24. aa ab 111010 ac 101 ba bb bc 111 010 da 110 db 01 010 000 dc 10 000 110 001
  • 25. WE OUTGROW THE HARD DRIVE Mr. T in DC, Flickr / CC BY 2.0
  • 26. DISK DISK DISK HUMAN COMPUTER DISK DISK DISK DISK
  • 27. PEOPLE NEED SIMULTANEOUS ACCESS wFourier, Flickr / CC BY 2.0
  • 28. DISK DISK HUMAN DISK HUMAN COMPUTER DISK DISK HUMAN DISK DISK
  • 29. HUMAN HUMAN HUMAN HUMAN DISK HUMAN HUMAN DISK HUMAN HUMAN DISK DISK HUMAN DISK HUMAN HUMAN DISK (COMPUTER) HUMAN DISK HUMAN HUMAN DISK HUMAN HUMAN DISK HUMAN DISK HUMAN DISK HUMAN HUMAN DISK HUMAN HUMAN (actually more like this…)
  • 30. COMPUTER DISK COMPUTER DISK COMPUTER DISK HUMAN COMPUTER DISK COMPUTER DISK COMPUTER DISK HUMAN COMPUTER DISK COMPUTER DISK COMPUTER DISK HUMAN COMPUTER DISK COMPUTER DISK COMPUTER DISK
  • 31. X aa ab 111010 ac 101 ba bb bc 111 010 da 110 db 011 010 000 dc 000 110 001
  • 32. object pace: quick driver: frog license: expired expression: agog 11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010
  • 33. COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK APP COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK
  • 34. COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER COMPUTER DISK DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK
  • 35. COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK VM COMPUTER DISK COMPUTER DISK VM COMPUTER DISK COMPUTER DISK VM COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK
  • 36. STORAGE THROUGHOUT HISTORY Ceph, Cloud computing, Distributed storage, Shared storage, Computers, Writing, Painting. Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.
  • 37. COMPUTER DISK COMPUTER DISK COMPUTER DISK HUMAN COMPUTER DISK COMPUTER DISK COMPUTER DISK HUMAN COMPUTER DISK COMPUTER DISK COMPUTER DISK HUMAN COMPUTER DISK COMPUTER DISK COMPUTER DISK
  • 38. COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK COMPUTER DISK
  • 39. C D C D C D C D C D C D C D C D C D C D C D C D
  • 40. C D C D C D HUMAN C D C D C D HUMAN C D C D C D HUMAN C D C D C D
  • 41. STORAGE APPLIANCES Michael Moll, Wikipedia / CC BY-SA 2.0
  • 42. 6.4 MILLION SQFT OF FACTORIES Dude94111, Flickr / CC BY 2.0
  • 43. STORAGE VENDORS HAVE BIG BILLS CarbonNYC, Flickr / CC BY 2.0
  • 44. STORAGE APPLIANCES ARE EXPENSIVE 401K 2012, Flickr / CC BY-SA 2.0
  • 45. TECHNOLOGY IS A COMMODITY RaeAllen, Flickr / CC-BY 2.0
  • 46. COMMODITY PRICES FLUCTUATE (chart: x-axis May-07 to May-12)
  • 47. GROWING WITH HARDWARE APPLIANCES § First PB § Proprietary storage hardware § Well-known storage vendor § $14 b’zillion § Second PB § Proprietary storage hardware § Same storage vendor § Another $14 b’zillion
  • 48. APPLIANCES ARE OLD TECHNOLOGY Paul Keller, Flickr / CC BY 2.0
  • 51. Hardware Appliances are Mysterious Black Boxes Abode of Chaos, Flickr / CC BY 2.0
  • 52. C D C D C C D C D D C D C D C++ C D C D C D C D C D
  • 53. X C D C D C C D C D D C D C D C++ C D C D C D C D C D
  • 54. C D C D C D C D C D HUMAN !! C D [DEVELOPER] C D C D C D C D C D C D
  • 55. THE WORLD NEEDS A STORAGE TECHNOLOGY THAT SCALES INFINITELY
  • 56. THE WORLD NEEDS A STORAGE TECHNOLOGY THAT DOESN’T REQUIRE AN INDUSTRIAL MANUFACTURING PROCESS
  • 57. SAGE WEIL § Co-founder of DreamHost § Inventor of Ceph § CEO of Inktank
  • 58. philosophy design OPEN SOURCE
  • 59. OPEN SOURCE SPREADS IDEAS orchidgalore, Flickr / CC BY 2.0
  • 60. philosophy design OPEN SOURCE COMMUNITY-FOCUSED
  • 61. WE ARE SMARTER TOGETHER rturk, Linkedin Inmap
  • 62. CEPH BELONGS TO ALL OF US wackybadger, Flickr / CC BY 2.0
  • 63. philosophy design OPEN SOURCE SCALABLE COMMUNITY-FOCUSED
  • 64. CEPH IS BUILT TO SCALE Ceph, Too much for a room, Too much for a computer, Too much for a drive, Too much for a book, Too much for a cave
  • 65. philosophy design OPEN SOURCE SCALABLE COMMUNITY-FOCUSED NO SINGLE POINT OF FAILURE
  • 66. ARIOLIMAX CALIFORNICUS aroid, Flickr / CC BY 2.0
  • 67. THE OCTOPUS (A METAPHOR) single point of failure, highly-available, replicated. I love speaking in metaphors.
  • 68. THE BEEHIVE (ANOTHER METAPHOR) blumenbiene, Flickr / CC BY 2.0
  • 69. philosophy design OPEN SOURCE SCALABLE COMMUNITY-FOCUSED NO SINGLE POINT OF FAILURE SOFTWARE BASED
  • 70. C D C D C C D C D D C D C D C++ C D C D C D C D C D
  • 71. C D C D ✔ C C D C D D C D C D C++ C D C D C D C D C D
  • 72. philosophy design OPEN SOURCE SCALABLE COMMUNITY-FOCUSED NO SINGLE POINT OF FAILURE SOFTWARE BASED SELF-MANAGING
  • 73. DISKS = JUST TINY RECORD PLAYERS jon_a_ross, Flickr / CC BY 2.0
  • 74. D D D D D D = D D x 1 MILLION 55 times / day
  • 75.
  • 76. IT ALL STARTED WITH A DREAM
  • 77. +
  • 78. NEW MONTHLY CODE COMMITS (chart: y-axis 0 to 700, x-axis 2004-06 to 2011-07)
  • 79. CEPH STARTS POPPING UP! (sorry about all the logo tampering)
  • 80. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
  • 81. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
  • 82. OSD OSD OSD OSD OSD btrfs FS FS FS FS FS xfs ext4 DISK DISK DISK DISK DISK M M M
  • 83. HUMAN M M M
  • 84. Monitors: § Maintain cluster map § Provide consensus for distributed decision-making § Must have an odd number § These do not serve stored objects to clients OSDs: § One per disk (recommended) § At least three in a cluster § Serve stored objects to clients § Intelligently peer to perform replication tasks § Support object classes
  • 85. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
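The "odd number" rule for monitors can be illustrated with a little arithmetic. This is a minimal sketch, assuming the monitors reach consensus by strict majority; the function names `quorum_size` and `tolerated_failures` are made up for illustration and are not part of any Ceph API.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors` (assumed consensus rule)."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be lost while a majority can still form."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Compare cluster sizes side by side.
    for n in range(1, 8):
        print(f"{n} monitors: quorum of {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, going from three monitors to four raises the quorum from two to three but still tolerates only one failure, so the even-numbered monitor adds cost without adding resilience.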
  • 86. APP LIBRADOS native M M M
  • 87. LIBRADOS § Provides direct access to RADOS for applications § C, C++, Python, PHP, Java § No HTTP overhead
  • 88. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
  • 89. APP APP REST RADOSGW RADOSGW LIBRADOS LIBRADOS native M M M
  • 90. RADOS Gateway: §  REST-based interface to RADOS §  Supports buckets, accounting §  Compatible with S3 and Swift applications
  • 91. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
  • 92. VM VIRTUALIZATION CONTAINER LIBRBD LIBRADOS M M M
  • 93. CONTAINER VM CONTAINER LIBRBD LIBRBD LIBRADOS LIBRADOS M M M
  • 94. HOST KRBD (KERNEL MODULE) LIBRADOS M M M
  • 95. RADOS Block Device: §  Storage of virtual disks in RADOS §  Allows decoupling of VMs and containers §  Live migration! §  Images are striped across the cluster §  Boot support in QEMU, KVM, and OpenStack Nova §  Mount support in the Linux kernel
  • 96. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes.
  • 97. CLIENT metadata 01 data 10 M M M
  • 98. Metadata Server §  Manages metadata for a POSIX-compliant shared filesystem §  Directory hierarchy §  File metadata (owner, timestamps, mode, etc.) §  Stores metadata in RADOS §  Does not serve file data to clients §  Only required for shared filesystem
  • 99. WHAT MAKES CEPH UNIQUE?
  • 100. HOW DO YOU FIND YOUR KEYS? azmeen, Flickr / CC BY 2.0
  • 101. C D C D C D C D C D ?? APP C D C D C D C D C D C D C D
  • 102. C D C D A-G C D C D C D H-N APP F* C D C D C D O-T C D C D C D U-Z C D
  • 103. I ALWAYS PUT MY KEYS ON THE HOOK vitamindave, Flickr / CC BY 2.0
  • 104. C D C D C D C D C D APP C D C D C D C D C D C D C D
  • 105. DEAR DIARY: KEYS = IN THE KITCHEN Barnaby, Flickr / CC BY 2.0
  • 106. HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?
  • 107. THE ANSWER: CRUSH!! pasukaru76, Flickr / CC SA 2.0
  • 108. 10 10 01 01 10 10 01 11 01 10 hash(object name) % num pg 10 10 01 01 10 10 01 11 01 10 CRUSH(pg, cluster state, rule set)
  • 109. 10 10 01 01 10 10 01 11 01 10 10 10 01 01 10 10 01 11 01 10
  • 110. CRUSH §  Pseudo-random placement algorithm §  Ensures even distribution §  Repeatable, deterministic §  Rule-based configuration §  Replica count §  Infrastructure topology §  Weighting
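The two-step lookup on slide 108 can be sketched in a few lines. The first step, hash(object name) % num pg, is taken from the slide. The second step is only a stand-in: it uses rendezvous (highest-random-weight) hashing with a one-replica-per-host rule, which is NOT the real CRUSH algorithm but shares the properties listed on slide 110 (pseudo-random, repeatable, rule-based, topology-aware). Weighting is left out, and all names here (`stable_hash`, `object_to_pg`, `pg_to_osds`, the OSD and host labels) are hypothetical.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name: str, num_pg: int) -> int:
    """Step 1, as on the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def pg_to_osds(pg: int, osd_hosts: dict, replicas: int) -> list:
    """Step 2 stand-in (not real CRUSH): rank OSDs by a per-(pg, osd) hash
    and keep the best-ranked OSDs that sit on distinct hosts."""
    if replicas < 1:
        raise ValueError("replica count must be at least 1")
    ranked = sorted(osd_hosts,
                    key=lambda osd: stable_hash(f"{pg}:{osd}"),
                    reverse=True)
    chosen, used_hosts = [], set()
    for osd in ranked:
        host = osd_hosts[osd]
        if host in used_hosts:
            continue  # rule: never place two replicas on the same host
        chosen.append(osd)
        used_hosts.add(host)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct hosts for the requested replica count")


if __name__ == "__main__":
    # Hypothetical cluster state: which host each OSD lives on.
    cluster = {"osd.0": "host-a", "osd.1": "host-a",
               "osd.2": "host-b", "osd.3": "host-c"}
    pg = object_to_pg("my-object", num_pg=64)
    print(pg, pg_to_osds(pg, cluster, replicas=3))
```

Because every input is either the object name or shared cluster state, any client can compute the same answer on its own, with no lookup table to consult.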
  • 111. CLIENT ??
  • 112.
  • 113.
  • 114. CLIENT ??
  • 115. VM VIRTUALIZATION CONTAINER LIBRBD LIBRADOS M M M
  • 116. HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?
  • 117. instant copy 144 0 0 0 0 = 144
  • 118. write CLIENT write write write 144 4 = 148
  • 119. read read CLIENT read 144 4 = 148
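Slides 117 to 119 can be read as what is commonly called copy-on-write cloning: a new image starts with no objects of its own, writes land in the clone, and reads fall through to the parent for anything the clone has not written. The sketch below is a toy in-memory model under that reading, not the RBD implementation; the `Image` class and its methods are hypothetical, and snapshots and striping are ignored. The object counts (144 and 4) are the ones on the slides.

```python
class Image:
    """Toy image: a sparse map of object index -> data, plus an optional parent."""

    def __init__(self, objects=None, parent=None):
        self.objects = dict(objects or {})
        self.parent = parent

    def clone(self):
        """Instant copy: the clone stores nothing until it is written to."""
        return Image(parent=self)

    def write(self, index, data):
        """Writes always land in this image, never in the parent."""
        self.objects[index] = data

    def read(self, index):
        """Read from this image if it has the object, else walk up the parents."""
        image = self
        while image is not None:
            if index in image.objects:
                return image.objects[index]
            image = image.parent
        return None  # object was never written anywhere

    def stored_objects(self):
        return len(self.objects)


if __name__ == "__main__":
    golden = Image({i: b"base" for i in range(144)})
    vm_disk = golden.clone()
    print(golden.stored_objects() + vm_disk.stored_objects())  # 144 + 0 = 144
    for i in range(4):
        vm_disk.write(i, b"changed")
    print(golden.stored_objects() + vm_disk.stored_objects())  # 144 + 4 = 148
```

In this model, creating a clone costs the same no matter how large the parent is, which is the property the "thousands of VMs" question on slide 116 depends on.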
  • 120. HOW DO YOU MANAGE DIRECTORY HIERARCHY WITHOUT A SINGLE POINT OF FAILURE?
  • 121. FILESYSTEMS REQUIRE METADATA Barnaby, Flickr / CC BY 2.0
  • 122. CLIENT 01 10 M M M
  • 123. M M M
  • 124. one tree three metadata servers ??
  • 125.
  • 126.
  • 127.
  • 128.
  • 132. LIBRADOS (APP): A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP. RADOSGW (APP): A bucket-based REST gateway, compatible with S3 and Swift. RBD (HOST/VM): A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver. CEPH FS (CLIENT): A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE. RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes. Status labels on the slide: AWESOME, AWESOME, NEARLY AWESOME, AWESOME; RADOS: AWESOME.
  • 133. * LAN SCALE!! * OR REALLY REALLY SCARY FAST WAN
  • 134. CEPH AND CLOUDSTACK tableatny, Flickr / CC BY 2.0
  • 135. RBD SUPPORT IN CLOUDSTACK § Just announced two weeks ago! § Allows storage of virtual disks inside RADOS § Works with KVM only right now § No volume snapshots yet § Requires the latest version of, um, everything § More information can be found on the mailing list: § ceph-devel / incubator-cloudstack-dev: http://article.gmane.org/gmane.comp.file-systems.ceph.devel/7505
  • 136. QUESTIONS? Ross Turk VP Community, Inktank §  ross@inktank.com §  @rossturk inktank.com | ceph.com