Introduction to NoSQL and Couchbase

Dipti Borkar
Director, Product Management
  
WHY TRANSITION TO NOSQL?
  
Two big drivers for NoSQL adoption

• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%

Source: Couchbase Survey, December 2011, n = 1351.
  
NoSQL catalog

                          Key-Value    Data Structure   Document    Column      Graph
Cache (memory only)       memcached    redis
Database (memory/disk)    membase                       couchbase   cassandra   Neo4j
                                                        mongoDB
  
DISTRIBUTED DOCUMENT DATABASES
  
Document Databases

• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All databases require a unique key
• Documents are stored using JSON or XML or their derivatives
• Content can be indexed and queried
• Offer auto-sharding for scaling and replication for high-availability

Example:

    {
      "UUID": "21f7f8de-8051-5b89-86…
      "Time": "2011-04-01T13:01:02.42…
      "Server": "A2223E",
      "Calling Server": "A2213W",
      "Type": "E100",
      "Initiating User": "dsallings@spy.net",
      "Details":
        {
          "IP": "10.1.1.22",
          "API": "InsertDVDQueueItem",
          "Trace": "cleansed",
          "Tags":
            [
              "SERVER",
              "US-West",
              "API"
            ]
        }
    }
  
COMPARING DATA MODELS
  
http://www.geneontology.org/images/diag-godb-er.jpg
  
Relational vs Document data model

Relational data model
Highly-structured table organization with rigidly-defined data formats and record structure.

Document data model
Collection of complex documents with arbitrary, nested data formats and varying "record" format.
  
Example: User Profile

User Info

    KEY   First   Last     ZIP_id
    1     Dipti   Borkar   2
    2     Joe     Smith    2
    3     Ali     Dodson   2
    4     John    Doe      3

Address Info

    ZIP_id   CITY   STATE   ZIP
    1        DEN    CO      30303
    2        MV     CA      94040
    3        CHI    IL      60609
    4        NY     NY      10010

To get information about a specific user, you perform a join across two tables.
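The join described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names user_info and address_info and the column name user_key are assumptions chosen for the example, and the rows are the ones shown in the two tables.

```python
import sqlite3

# Table and column names below are illustrative assumptions, not from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_info (
        user_key INTEGER PRIMARY KEY, first TEXT, last TEXT, zip_id INTEGER
    );
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?, ?)",
    [(1, "Dipti", "Borkar", 2), (2, "Joe", "Smith", 2),
     (3, "Ali", "Dodson", 2), (4, "John", "Doe", 3)],
)
conn.executemany(
    "INSERT INTO address_info VALUES (?, ?, ?, ?)",
    [(1, "DEN", "CO", "30303"), (2, "MV", "CA", "94040"),
     (3, "CHI", "IL", "60609"), (4, "NY", "NY", "10010")],
)


def user_profile(conn, user_key):
    """Join the two tables to assemble one user's full profile."""
    return conn.execute(
        """
        SELECT u.user_key, u.first, u.last, a.zip, a.city, a.state
        FROM user_info AS u
        JOIN address_info AS a ON a.zip_id = u.zip_id
        WHERE u.user_key = ?
        """,
        (user_key,),
    ).fetchone()


print(user_profile(conn, 1))  # (1, 'Dipti', 'Borkar', '94040', 'MV', 'CA')
```

Every profile read pays for the join, which is the cost the document model avoids by keeping the address fields with the user.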
  
Document Example: User Profile

    {
      "ID": 1,
      "FIRST": "Dipti",
      "LAST": "Borkar",
      "ZIP": "94040",
      "CITY": "MV",
      "STATE": "CA"
    }

All data in a single document
  
Making a Change Using RDBMS

User Table

    User ID   First   Last     Zip     Country ID
    1         Dipti   Borkar   94040   001
    2         Joe     Smith    94040   001
    3         Ali     Dodson   94040   001
    4         Sarah   Gorin    NW1     002
    5         Bob     Young    30303   001
    6         Nancy   Baker    10010   001
    7         Ray     Jones    31311   001
    8         Lee     Chen     V5V3M   008
    ...
    50000     Doug    Moore    04252   001
    50001     Mary    White    SW195   002
    50002     Lisa    Clark    12425   001

Photo Table

    User ID   Photo ID   Comment   Country ID
    2         d043       NYC       001
    2         b054       Bday      007
    5         c036       Miami     001
    7         d072       Sunset    133
    5002      e086       Spain     133

Status Table

    User ID   Status ID   Text      Country ID
    1         a42         At conf   134
    4         b26         excited   007
    5         c32         hockey    008
    12        d83         Go A's    001
    5000      e34         sailing   005

Affiliations Table

    User ID   Affl ID   Affl Name   Country ID
    2         a42       Cal         001
    4         b96       USC         001
    7         c14       UW          001
    8         e22       Oxford      002

Country Table

    Country ID   Country name
    001          USA
    002          UK
    003          Argentina
    004          Australia
    005          Aruba
    006          Austria
    007          Brazil
    008          Canada
    009          Chile
    ...
    130          Portugal
    131          Romania
    132          Russia
    133          Spain
    134          Sweden
  
Making the Same Change with a Document Database

    {
      "ID": 1,
      "FIRST": "Dipti",
      "LAST": "Borkar",
      "ZIP": "94040",
      "CITY": "MV",
      "STATE": "CA",
      "STATUS":
        {
          "TEXT": "At Conf",
          "GEO_LOC": "134"
        },
      "COUNTRY": "USA"
    }

Just add information to a document
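As a rough sketch of what "just add information" means in application code, the snippet below adds the status and country fields to the user profile document from the earlier example, using a plain Python dict as a stand-in for the stored JSON. The helper name add_status is hypothetical, and no database client is involved.

```python
import json

# The user profile document from the earlier slide, held as a plain dict.
profile = {
    "ID": 1,
    "FIRST": "Dipti",
    "LAST": "Borkar",
    "ZIP": "94040",
    "CITY": "MV",
    "STATE": "CA",
}


def add_status(doc, text, geo_loc, country):
    """Return a copy of the document with the new fields added.

    No schema migration is needed: other documents are unaffected and
    may simply lack these fields.
    """
    updated = dict(doc)
    updated["STATUS"] = {"TEXT": text, "GEO_LOC": geo_loc}
    updated["COUNTRY"] = country
    return updated


updated = add_status(profile, "At Conf", "134", "USA")
print(json.dumps(updated, indent=2))
```

Readers of such documents then need to tolerate missing fields, for example with `doc.get("STATUS")`, since older documents will not have them.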
  
Document modeling

Q
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?

When considering how to model data for a given application
• Think of a logical container for the data
• Think of how data groups together
  
Document Design Options

• One document that contains all related data
    – Data is de-normalized
    – Better performance and scale
    – Eliminate client-side joins

• Separate documents for different object types with cross references
    – Data duplication is reduced
    – Objects may not be co-located
    – Transactions supported only on a document boundary
    – Most document databases do not support joins
  
Document ID / Key selection

• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID based document lookup is extremely fast
• Usually an ID can only appear once in a bucket

Q
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?

Options
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human readable)
• Matching prefixes (for multiple related objects)
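The key options above could look something like the following in Python. All helper names and the "::" separator are assumptions made for this sketch. The shard_for function is a deliberately simplified hash-and-modulo illustration of ID-based sharding, not Couchbase's actual key-to-node mapping.

```python
import uuid
import zlib
from datetime import date


def uuid_key():
    """Opaque, globally unique ID."""
    return str(uuid.uuid4())


def date_key(prefix, day, seq):
    """Date-based ID, e.g. 'post::2011-04-01::7'."""
    return f"{prefix}::{day.isoformat()}::{seq}"


def readable_key(title):
    """Hand-crafted, human-readable ID built from a title."""
    return "_".join(title.split())


def related_key(parent_key, kind, n):
    """ID that shares the parent's key as a prefix, so related objects match."""
    return f"{parent_key}::{kind}::{n}"


def shard_for(key, num_shards=4):
    """Simplified illustration only: hash the ID and pick a shard."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


post_id = readable_key("Couchbase Hello World")
comment_id = related_key(post_id, "comment", 1)
print(post_id, comment_id, shard_for(post_id), shard_for(comment_id))
```

Because the shard is derived from the whole ID, a post and its comments may land on different shards even when their keys share a prefix.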
  
Example: Entities for a Blog

• User profile
    The main pointer into the user data
    • Blog entries
    • Badge settings, like a twitter badge

• Blog posts
    Contains the blogs themselves

• Blog comments
    • Comments from other users
  
Blog Document – Option 1 – Single document

    {
      "_id": "Couchbase_Hello_World",
      "author": "dborkar",
      "type": "post",
      "title": "Hello World",
      "format": "markdown",
      "body": "Hello from [Couchbase](http://couchbase.com).",
      "html": "<p>Hello from <a href="http: …
      "comments": [
        {"format": "markdown", "body": "Awesome post!"},
        {"format": "markdown", "body": "Like it."}
      ]
    }
  
Blog Document – Option 2 – Split into multiple docs

BLOG DOC

    {
      "_id": "Couchbase_Hello_World",
      "author": "dborkar",
      "type": "post",
      "title": "Hello World",
      "format": "markdown",
      "body": "Hello from [Couchbase](http://couchbase.com).",
      "html": "<p>Hello from <a href="http: …
      "comments": [
        "comment1_Couchbase_Hello_World"
      ]
    }

COMMENT

    {
      "_id": "comment1_Couchbase_Hello_World",
      "format": "markdown",
      "body": "Awesome post!"
    }
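With the split layout, the application resolves comment IDs itself. A minimal sketch of that client-side join follows, using an in-memory dict as a stand-in for a key-value store. The store variable and the load_post_with_comments helper are hypothetical and do not represent a Couchbase client API.

```python
# In-memory stand-in for a key-value document store (an assumption for this sketch).
store = {
    "Couchbase_Hello_World": {
        "_id": "Couchbase_Hello_World",
        "author": "dborkar",
        "type": "post",
        "title": "Hello World",
        "format": "markdown",
        "body": "Hello from [Couchbase](http://couchbase.com).",
        "comments": ["comment1_Couchbase_Hello_World"],
    },
    "comment1_Couchbase_Hello_World": {
        "_id": "comment1_Couchbase_Hello_World",
        "format": "markdown",
        "body": "Awesome post!",
    },
}


def load_post_with_comments(store, post_id):
    """Fetch the post by ID, then fetch each referenced comment by ID."""
    post = dict(store[post_id])  # copy so the stored document is not modified
    post["comments"] = [store[cid] for cid in post.get("comments", [])]
    return post


post = load_post_with_comments(store, "Couchbase_Hello_World")
print(post["title"], [c["body"] for c in post["comments"]])
```

Each comment costs one extra ID lookup, but a page that only needs the post can skip those lookups entirely.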
  
Threaded Comments

• You can imagine how to take this to a threaded list

[Diagram labels: Blog, List, First comment, Reply to comment, More Comments]

Advantages
• Only fetch the data when you need it
    • For example, rendering part of a web page
• Spread the data and load across the entire cluster
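One way to picture the threaded list is to let every comment document carry its own list of reply IDs and walk the references recursively. The sketch below assumes that layout. The document shapes, the short IDs, and the walk_thread helper are all hypothetical, with a dict again standing in for the document store.

```python
# Hypothetical documents: each one may list the IDs of its replies under "comments".
store = {
    "post": {"title": "Hello World", "comments": ["c1", "c2"]},
    "c1": {"body": "Awesome post!", "comments": ["c1_r1"]},
    "c1_r1": {"body": "Thanks!", "comments": []},
    "c2": {"body": "Like it."},
}


def walk_thread(store, comment_ids, depth=0):
    """Yield (depth, body) pairs depth-first, fetching each comment only when reached."""
    for cid in comment_ids:
        doc = store[cid]
        yield depth, doc["body"]
        yield from walk_thread(store, doc.get("comments", []), depth + 1)


thread = list(walk_thread(store, store["post"]["comments"]))
for depth, body in thread:
    print("  " * depth + body)
```

A page that shows only top-level comments can stop at depth 0 and fetch replies later, which matches the "only fetch the data when you need it" advantage.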
  
COMPARING SCALING MODEL
  
Relational Technology Scales Up

Web/App Server Tier: Application Scales Out
Just add more commodity web servers
[Chart: System Cost and Application Performance vs. Users]

Relational Database: RDBMS Scales Up
Get a bigger, more complex server
[Chart: System Cost and Application Performance vs. Users, annotated "Won't scale beyond this point"]

Expensive and disruptive sharding, doesn't perform at web scale
  
Couchbase Server Scales Out Like App Tier

Web/App Server Tier: Application Scales Out
Just add more commodity web servers
[Chart: System Cost and Application Performance vs. Users]

Couchbase Distributed Data Store: NoSQL Database Scales Out
Cost and performance mirror the app tier
[Chart: System Cost and Application Performance vs. Users]

Scaling out flattens the cost and performance curves
  
Couchbase Server Admin Console
  
WHERE IS NOSQL A GOOD FIT?
  
Performance driven use cases

• Low latency
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with very high mutation rate per document (temporal locality)
• Working set with heavy writes
  
Data driven use cases

• Support for unlimited data growth
• Data with non-homogenous structure
• Need to quickly and often change data structure
• 3rd party or user defined structure
• Variable length documents
• Sparse data records
• Hierarchical data
  
Use Case Examples

Web app or Use-case                      Couchbase Solution                                  Example Customer
Content and Metadata Management System   Couchbase document store + Elastic Search           McGraw-Hill…
Social Game or Mobile App                Couchbase stores game and player data               Zynga…
Ad Targeting                             Couchbase stores user information for fast access   AOL…
User Profile Store                       Couchbase Server as a key-value store               TuneWiki…
Session Store                            Couchbase Server as a key-value store               Concur…
High Availability Caching Tier           Couchbase Server as a memcached tier replacement    Orbitz…
Chat/Messaging Platform                  Couchbase Server                                    DOCOMO…
  
Use Case: Social Gaming

Social and Mobile Gaming

Types of Data
• User account information
• User game profile info
• User's social graph
• State of the game
• Player badges and stats

Application Requirements
• Ability to support rapid growth
• Fast response times for awesome user experience
• Game uptime: 24x7x365
• Easy to update apps with new features

Why NoSQL and Couchbase
• Scalability ensures that games are ready to handle the millions of users that come with viral growth.
• High performance guarantees players are never left waiting to make their next move.
• Always-on operations means zero interruption to game play (and revenue)
• Flexible data model means games can be developed rapidly and updated easily with new features
  
Use Case: Ad Targeting

Types of Data
•  User profile: preferences and psychographic data
•  Ad serving history by user
•  Ad buying history by advertiser
•  Ad serving history by advertiser

Application Requirements
•  High performance to meet limited ad serving budget; time allowance is typically <40 msec
•  Scalability to handle hundreds of millions of user profiles and rapidly growing amount of data
•  24x7x365 availability to avoid ad revenue loss

Why NoSQL and Couchbase
•  Sub-millisecond reads/writes mean less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served
•  Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
•  Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.
  
Use Case: Content and metadata store

Building a self-adapting, interactive learning portal with Couchbase
  
The Problem

As learning moves online in great numbers

Growing need to build interactive learning environments that scale!!

•  Scale to millions of learners
•  Serve MHE as well as third-party content
•  Including open content
•  Support learning apps
•  Self-adapt via usage data
  
The Challenge

Hmmm...this looks kinda like:
+ Content Caching (Scale)
+ Social Gaming (Stats)
+ Ad Targeting (Smarts)

Backend is an Interactive Content Delivery Cloud that must:
•  Allow for elastic scaling under spike periods
•  Catalog and deliver content from many sources
•  Provide consistent low latency for metadata and stats access
•  Support full-text search for content discovery
•  Offer tunable content ranking & recommendation functions

Experimented with a combination of:
•  XML Databases
•  In-memory Data Grids
•  SQL/MR Engines
•  Enterprise Search Servers
  
The Technologies	
  




  
The Learning Portal	
  


                             •    Designed and built as a
                                  collaboration between MHE Labs
                                  and Couchbase

                             •    Serves as proof-of-concept and
                                  testing harness for Couchbase +
                                  ElasticSearch integration

                             •    Available for download and further
                                  development as open source
                                  code




https://github.com/couchbaselabs/learningportal
  
BRIEF OVERVIEW
COUCHBASE SERVER
  
Couchbase Server

NoSQL Distributed Document Database
for interactive web applications

2.0
  
Couchbase Server

Easy Scalability
    Grow cluster without application changes, without downtime, with a single click

Consistent, High Performance
    Consistent sub-millisecond read and write response times with consistent high throughput

Always On 24x7x365
    No downtime for software upgrades, hardware maintenance, etc.
  
Flexible Data Model

    {
        "ID": 1,
        "FIRST": "Dipti",
        "LAST": "Borkar",
        "ZIP": "94040",
        "CITY": "MV",
        "STATE": "CA"
    }

•  No need to worry about the database when changing your application
•  Records can have different structures; there is no fixed schema
•  Allows painless data model changes for rapid application development
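The "no fixed schema" point can be illustrated with a minimal sketch. It uses plain Python dictionaries and the standard `json` module, not a Couchbase client; the second record, its `EMAIL` and `INTERESTS` fields, the `user::` key prefix, and the `city_of` helper are all made up for illustration.

```python
import json

# Two records stored side by side; the second adds fields the first lacks
# and omits others. No schema migration is needed because each JSON
# document describes its own structure.
doc_v1 = {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
          "ZIP": "94040", "CITY": "MV", "STATE": "CA"}
doc_v2 = {"ID": 2, "FIRST": "Ann", "LAST": "Example",   # hypothetical record
          "EMAIL": "ann@example.com", "INTERESTS": ["games", "music"]}

# A dict stands in for the document store: key -> serialized JSON.
store = {f"user::{d['ID']}": json.dumps(d) for d in (doc_v1, doc_v2)}

def city_of(key):
    """Read a document and tolerate fields that some records lack."""
    doc = json.loads(store[key])
    return doc.get("CITY", "unknown")

print(city_of("user::1"))
print(city_of("user::2"))
```

The application code, not the database, decides how to treat a missing field, which is what lets the data model change without a coordinated schema update.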
  
         	
  
COUCHBASE SERVER ARCHITECTURE
  
Couchbase Server 2.0 Architecture

[Architecture diagram. Recoverable labels:]

Data Manager
•  Query API (port 8092), Query Engine
•  Memcapable 1.0 (port 11211), Moxi
•  Memcapable 2.0 (port 11210)
•  Memcached
•  Couchbase EP Engine
•  Storage interface
•  New Persistence Layer

Cluster Manager (Erlang/OTP)
•  REST management API/Web UI (HTTP, port 8091)
•  http
•  Heartbeat
•  Process monitor
•  Configuration manager
•  Global singleton supervisor
•  Rebalance orchestrator
•  Node health monitor
•  vBucket state and replication manager
•  Erlang port mapper (port 4369)
•  Distributed Erlang (ports 21100 - 21199)

(The diagram marks Cluster Manager components as either "on each node" or "one per cluster".)
  
  
Couchbase deployment

[Deployment diagram. Labels: Web Application, Couchbase Client Library, Data Flow, Cluster Management]
  
Single node - Couchbase Write Operation

[Diagram, Couchbase Server Node: the App Server sends Doc 1 to the Managed Cache (step 2); Doc 1 is then placed on the Replication Queue (to other node) and the Disk Queue (step 3), and written to Disk]
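The write path in the diagram can be sketched as a toy model. This is not Couchbase code: the `Node` class and its method names are hypothetical, and the sketch assumes the write is acknowledged once the document is in the managed cache, with replication and persistence happening afterwards from the two queues.

```python
from collections import deque

class Node:
    """Toy model of one server node: managed cache plus two queues."""

    def __init__(self):
        self.cache = {}                   # managed cache (memory)
        self.replication_queue = deque()  # pending sends to other node
        self.disk_queue = deque()         # pending writes to disk
        self.disk = {}                    # persisted documents

    def write(self, key, doc):
        # The document is stored in memory first; the caller does not
        # wait for replication or disk (assumption, see lead-in).
        self.cache[key] = doc
        self.replication_queue.append((key, doc))
        self.disk_queue.append((key, doc))
        return "stored"

    def flush_disk_queue(self):
        # Drain the disk queue in arrival order; a later write of the
        # same key overwrites the earlier persisted version.
        while self.disk_queue:
            key, doc = self.disk_queue.popleft()
            self.disk[key] = doc

node = Node()
node.write("doc1", {"v": 1})
node.write("doc1", {"v": 2})      # second write to the same key
cached_before_flush = node.cache["doc1"]
on_disk_before_flush = node.disk.get("doc1")
node.flush_disk_queue()
```

A second write to the same key behaves like an update: the cache holds the new version immediately, while disk catches up only when the queue is drained.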
  
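Single node - Couchbase Update Operation

[Diagram, Couchbase Server Node: the App Server sends Doc 1' (an update of Doc 1); in the Managed Cache (step 2) Doc 1 is replaced by Doc 1', which is placed on the Replication Queue (to other node) and the Disk Queue (step 3); Disk holds Doc 1]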
  
Single node - Couchbase Read Operation

[Diagram, Couchbase Server Node: the App Server issues a GET for Doc 1; Doc 1 is present in the Managed Cache and on Disk. Also shown: Replication Queue (to other node), Disk Queue]
  
Single node - Couchbase Cache Eviction

[Diagram, Couchbase Server Node: the App Server writes Doc 2 through Doc 6; the Managed Cache shows Doc 1; Disk holds Doc 1 through Doc 6. Also shown: Replication Queue (to other node), Disk Queue]
  
Single node - Couchbase Cache Miss

[Diagram, Couchbase Server Node: the App Server issues a GET for Doc 1; the Managed Cache shows Doc 2 through Doc 5 and Doc 1; Disk holds Doc 1 through Doc 6. Also shown: Replication Queue (to other node), Disk Queue]
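Cache eviction and a cache miss can be sketched together in a few lines. The slides do not state the eviction policy, so this sketch assumes least-recently-used eviction; it also persists every write immediately instead of going through a disk queue. The `ManagedCache` class and its names are hypothetical.

```python
from collections import OrderedDict

class ManagedCache:
    """Toy cache in front of a disk dict; evicts least recently used docs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # key -> doc, least recently used first
        self.disk = {}              # persisted copy of every doc
        self.misses = 0

    def set(self, key, doc):
        self.cache[key] = doc
        self.cache.move_to_end(key)
        self.disk[key] = doc        # simplification: persist immediately
        self._evict()

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: fetch from disk and put the doc back in memory.
        self.misses += 1
        doc = self.disk[key]
        self.cache[key] = doc
        self._evict()
        return doc

    def _evict(self):
        # Every doc in this model is already on disk, so dropping it
        # from memory loses no data.
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

mc = ManagedCache(capacity=4)
for i in range(1, 7):               # write Doc 1 .. Doc 6
    mc.set(f"doc{i}", {"n": i})
evicted = "doc1" not in mc.cache    # Doc 1 was pushed out of memory
doc1 = mc.get("doc1")               # served from disk, then re-cached
```

With a capacity of four, writing six documents pushes Doc 1 out of memory; the later GET for Doc 1 is a miss that is served from disk and brings the document back into the cache.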
  
Cluster wide - Basic Operation

[Diagram: APP SERVER 1 and APP SERVER 2, each with a COUCHBASE Client Library holding a CLUSTER MAP, send READ/WRITE/UPDATE requests to a COUCHBASE SERVER CLUSTER of three servers]

             SERVER 1               SERVER 2               SERVER 3
ACTIVE       Doc 5, Doc 2, Doc 9    Doc 4, Doc 7, Doc 8    Doc 1, Doc 2, Doc 6
REPLICA      Doc 4, Doc 1, Doc 8    Doc 6, Doc 3, Doc 2    Doc 7, Doc 9, Doc 5

•  Docs distributed evenly across servers
•  Each server stores both active and replica docs
   Only one server active at a time
•  Client library provides app with simple interface to database
•  Cluster map provides map to which server doc is on
   App never needs to know
•  App reads, writes, updates docs
•  Multiple app servers can access same document at same time

User Configured Replica Count = 1
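The role of the cluster map can be sketched as a two-step lookup: hash the document key to a partition (a vBucket, the term used on the architecture slide), then look up which server holds that partition. This is an illustration only. The partition count of 1024, the CRC32-modulo hash, the round-robin placement, and the names `cluster_map`, `vbucket_for`, and `locate` are assumptions, not the client library's actual algorithm.

```python
import zlib

NUM_VBUCKETS = 1024                       # assumed partition count
SERVERS = ["server1", "server2", "server3"]

# Cluster map: for each vBucket, the active server and one replica server
# (replica count = 1). The replica is placed on the next server so that
# active and replica copies never share a machine.
cluster_map = [
    (SERVERS[vb % 3], SERVERS[(vb + 1) % 3])
    for vb in range(NUM_VBUCKETS)
]

def vbucket_for(key):
    """Hash the document key to a vBucket id (illustrative hash choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

def locate(key):
    """Return (active_server, replica_server) for a document key."""
    return cluster_map[vbucket_for(key)]

active, replica = locate("user::1")
```

Because every app server holds the same map and applies the same hash, two app servers asking for the same key arrive at the same active server without consulting each other.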
  
Cluster	
  wide	
  -­‐	
  Add	
  Nodes	
  to	
  Cluster	
  

                                   APP	
  SERVER	
  1	
                                                        APP	
  SERVER	
  2	
  
                            COUCHBASE	
  Client	
  Library	
                                             COUCHBASE	
  Client	
  Library	
  
                                      	
                                                                           	
  
                                CLUSTER	
  MAP	
  
                                      	
                                                                     CLUSTER	
  MAP	
  
                                                                                                                   	
  


                                                READ/WRITE/UPDATE	
                                                                 READ/WRITE/UPDATE	
  


          SERVER	
  1	
  
             	
                                       SERVER	
  2	
  
                                                         	
                            SERVER	
  3	
  
                                                                                          	
                              SERVER	
  4	
  
                                                                                                                             	
                  SERVER	
  5	
  
                                                                                                                                                    	
             •  Two	
  servers	
  added	
  
             	
  
           ACTIVE	
  
                                                         	
  
                                                       ACTIVE	
  
                                                                                          	
  
                                                                                        ACTIVE	
  
                                                                                                                             	
  
                                                                                                                           ACTIVE	
  
                                                                                                                                                    	
  
                                                                                                                                                  ACTIVE	
  
                                                                                                                                                                      One-­‐click	
  opera)on	
  

     Doc	
  5	
       Doc	
                   Doc	
  4	
         Doc	
            Doc	
  1	
      Doc	
                                                            •  Docs	
  automa)cally	
  
                                                                                                                                                                      rebalanced	
  across	
  
     Doc	
  2	
       Doc	
                   Doc	
  7	
         Doc	
   cluster
      Even distribution of docs
      Minimum doc movement
•  Cluster map updated
•  App database calls now distributed over larger number of servers

[Diagram: COUCHBASE SERVER CLUSTER, every server holding ACTIVE and REPLICA docs]

User Configured Replica Count = 1
51
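
The rebalance behaviour on this slide (even distribution, minimum doc movement, updated cluster map) can be illustrated with a small sketch. It assumes documents are hashed into a fixed number of partitions and that the cluster map records which server owns each partition. The partition count, the CRC32 hash, and the names `partition_for`, `build_cluster_map` and `rebalance` are illustrative assumptions, not Couchbase's actual implementation.

```python
import zlib

NUM_PARTITIONS = 64  # assumption: small fixed partition count, chosen for illustration


def partition_for(key: str) -> int:
    """Hash a document ID to a partition; this never changes when servers are added."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_cluster_map(servers: list[str]) -> list[str]:
    """Initial cluster map: partitions assigned to servers round-robin."""
    return [servers[p % len(servers)] for p in range(NUM_PARTITIONS)]


def rebalance(cluster_map: list[str], new_server: str) -> list[str]:
    """Return a new map in which the new server takes only its fair share of partitions."""
    servers = sorted(set(cluster_map)) + [new_server]
    target = NUM_PARTITIONS // len(servers)
    counts = {s: cluster_map.count(s) for s in servers}
    new_map = list(cluster_map)
    for p, owner in enumerate(cluster_map):
        if counts[new_server] >= target:
            break  # the new server has its share; nothing else moves
        if counts[owner] > target:
            new_map[p] = new_server
            counts[owner] -= 1
            counts[new_server] += 1
    return new_map


old_map = build_cluster_map(["server1", "server2"])
new_map = rebalance(old_map, "server3")
moved = sum(1 for before, after in zip(old_map, new_map) if before != after)
print(f"partitions moved: {moved} of {NUM_PARTITIONS}")
print("doc5 is served by", new_map[partition_for("doc5")])
```

A client that holds the updated map looks up `new_map[partition_for(key)]` and talks to that server directly, which is how calls end up spread over the larger number of servers.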
  
Cluster wide - Fail Over Node

[Diagram: APP SERVER 1 and APP SERVER 2, each with a COUCHBASE Client Library and CLUSTER MAP, in front of a COUCHBASE SERVER CLUSTER of SERVER 1 through SERVER 5; every server holds ACTIVE and REPLICA docs]

•  App servers accessing docs
•  Requests to Server 3 fail
•  Cluster detects server failed
      Promotes replicas of docs to active
      Updates cluster map
•  Requests for docs now go to appropriate server
•  Typically rebalance would follow

User Configured Replica Count = 1
52
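
The failover steps above (promote replicas to active, update the cluster map) can be sketched as follows. This is a toy model that assumes a replica count of 1 and a cluster map that tracks one active and one replica server per partition. `ClusterMap` and `fail_over` are hypothetical names for illustration, not part of any Couchbase API.

```python
from dataclasses import dataclass


@dataclass
class ClusterMap:
    active: dict[int, str]   # partition -> server holding the active copy
    replica: dict[int, str]  # partition -> server holding the replica copy


def fail_over(cmap: ClusterMap, failed: str) -> ClusterMap:
    """Return an updated cluster map with the failed server removed."""
    active, replica = dict(cmap.active), dict(cmap.replica)
    for partition, owner in cmap.active.items():
        if owner == failed:
            # Promote the replica copy to active; the partition has no
            # replica again until a later rebalance recreates one.
            active[partition] = cmap.replica[partition]
            replica.pop(partition, None)
    for partition, owner in cmap.replica.items():
        if owner == failed:
            replica.pop(partition, None)  # this replica copy is lost with the server
    return ClusterMap(active, replica)


# Five servers; each partition's replica lives on the next server in the ring.
before = ClusterMap(
    active={p: f"server{p + 1}" for p in range(5)},
    replica={p: f"server{(p + 1) % 5 + 1}" for p in range(5)},
)
after = fail_over(before, "server3")
print(after.active)   # partition 2 is now served by server4
print(after.replica)  # partitions 1 and 2 have no replica until rebalance
```

The missing replicas in the returned map are the reason a rebalance would typically follow a failover.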
  
Indexing and Querying

[Diagram: APP SERVER 1 and APP SERVER 2, each with a COUCHBASE Client Library and CLUSTER MAP, sending a Query to a COUCHBASE SERVER CLUSTER of SERVER 1 through SERVER 3; every server holds ACTIVE and REPLICA docs]

•  Indexing work is distributed amongst nodes
•  Large data set possible
•  Parallelize the effort
•  Each node has index for data stored on it
•  Queries combine the results from required nodes

User Configured Replica Count = 1
53
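
The pattern described above, where each node indexes only its own data and a query combines results from the nodes, is commonly called scatter-gather. A minimal sketch follows. The `Node` class, the `age` field and the range query are assumptions made for illustration and do not reflect Couchbase's view engine.

```python
import heapq


class Node:
    """One server: stores some docs and an index over those docs only."""

    def __init__(self, docs: dict[str, dict]):
        self.docs = docs
        self.index: list[tuple] = []

    def build_index(self, field: str) -> None:
        # Local index: sorted (field value, doc id) pairs for docs on this node.
        self.index = sorted(
            (doc[field], doc_id) for doc_id, doc in self.docs.items() if field in doc
        )

    def query(self, low, high) -> list[tuple]:
        # Answer a range query from the local index (already sorted).
        return [(value, doc_id) for value, doc_id in self.index if low <= value <= high]


def scatter_gather(nodes: list[Node], low, high) -> list[tuple]:
    """Send the query to every node, then merge the sorted partial results."""
    partials = [node.query(low, high) for node in nodes]
    return list(heapq.merge(*partials))


nodes = [
    Node({"doc5": {"age": 31}, "doc2": {"age": 25}}),
    Node({"doc4": {"age": 42}, "doc1": {"age": 28}}),
    Node({"doc8": {"age": 19}, "doc9": {"age": 35}}),
]
for node in nodes:
    node.build_index("age")  # indexing work happens on each node independently

print(scatter_gather(nodes, 25, 35))
```

Because each node builds and searches only its own index, indexing cost grows with the data per node rather than with the total data set.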
  
Cross Data Center Replication (XDCR)

[Diagram: two COUCHBASE SERVER CLUSTERs, one in the NY DATA CENTER and one in the SF DATA CENTER, each with SERVER 1 through SERVER 3; every server holds ACTIVE docs in RAM and on DISK]

54
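
The slide shows only the topology, so the following is a loose sketch of the general idea: writes accepted in one cluster are shipped to the other. It assumes each document carries a revision counter and that the higher revision wins when both sides hold the document. The `Cluster` class, `replicate` function and this conflict rule are illustrative assumptions, not a description of the real XDCR protocol.

```python
class Cluster:
    """A whole data center's cluster, reduced to one dict for illustration."""

    def __init__(self, name: str):
        self.name = name
        self.docs: dict[str, tuple] = {}  # doc id -> (revision, body)
        self.outbound: list[tuple] = []   # local mutations not yet shipped

    def write(self, doc_id: str, body: dict) -> None:
        # Local application write: bump the revision and queue it for replication.
        revision = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (revision, body)
        self.outbound.append((doc_id, revision, body))

    def apply_remote(self, doc_id: str, revision: int, body: dict) -> None:
        # Assumed conflict rule: accept the remote copy only if its revision is higher.
        local_revision = self.docs.get(doc_id, (0, None))[0]
        if revision > local_revision:
            self.docs[doc_id] = (revision, body)


def replicate(source: Cluster, target: Cluster) -> int:
    """Ship queued mutations from source to target; return how many were sent."""
    shipped = len(source.outbound)
    for doc_id, revision, body in source.outbound:
        target.apply_remote(doc_id, revision, body)
    source.outbound.clear()
    return shipped


ny, sf = Cluster("NY"), Cluster("SF")
ny.write("doc2", {"city": "NY"})
replicate(ny, sf)                 # SF now holds doc2 at revision 1
sf.write("doc2", {"city": "SF"})
replicate(sf, ny)                 # NY picks up revision 2 from SF
print(ny.docs["doc2"], sf.docs["doc2"])
```

Remote writes are applied without being re-queued, so a mutation is not echoed back to the cluster it came from.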
  
THANK YOU

DIPTI@COUCHBASE.COM
@DBORKAR
55
  
Introduction to NoSQL and Couchbase

  • 1. Introduc)on  to  NoSQL     and   Couchbase   Dip&  Borkar   Director,  Product  Management   1  
  • 2. WHY  TRANSITION  TO  NOSQL?     2  
  • 3. Two  big  drivers  for  NoSQL  adop&on   49%   35%   29%   16%   12%   11%   Lack  of  flexibility/   Inability  to   Performance   Cost   All  of  these   Other   rigid  schemas   scale  out  data   challenges   Source:  Couchbase  Survey,  December  2011,  n  =  1351.   3  
  • 4. NoSQL  catalog   Key-­‐Value   Data  Structure   Document   Column   Graph   (memory  only)   Cache   memcached   redis   (memory/disk)   membase   couchbase   cassandra   Neo4j   Database   mongoDB   4  
  • 5. DISTRIBUTED  DOCUMENT   DATABASES   5  
  • 6. Document  Databases   •  Each  record  in  the  database  is  a  self-­‐ describing  document     {   •  Each  document  has  an  independent   “UUID”:  “ 21f7f8de-­‐8051-­‐5b89-­‐86 “Time”:   “2011-­‐04-­‐01T13:01:02.42 “Server”:   “A2223E”, structure   “Calling   Server”:   “A2213W”, “Type”:   “E100”, “Initiating   User”:   “dsallings@spy.net”, •  Documents  can  be  complex     “Details”:   { “IP”:  “ 10.1.1.22”, •  All  databases  require  a  unique  key   “API”:   “InsertDVDQueueItem”, “Trace”:   “cleansed”, •  Documents  are  stored  using  JSON  or   “Tags”:   [ “SERVER”,   XML  or  their  deriva&ves   “US-­‐West”,   “API” ] •  Content  can  be  indexed  and  queried     } } •  Offer  auto-­‐sharding  for  scaling  and   replica&on  for  high-­‐availability   6  
  • 9. Rela&onal  vs  Document  data  model   C1   C2   C3   C4   {   JSON       JSON     }   JSON   Rela)onal  data  model   Document  data  model   Highly-­‐structured  table  organiza&on   Collec&on  of  complex  documents  with   with  rigidly-­‐defined  data  formats  and   arbitrary,  nested  data  formats  and   record  structure.   varying  “record”  format.   9  
  • 10. Example:  User  Profile   User  Info   Address  Info   KEY   First   Last   ZIP_id   ZIP_id   CITY   STATE   ZIP   1   Dip)   Borkar   2   1   DEN   CO   30303   2   Joe Smith   2   2   MV   CA   94040     3   Ali   Dodson   2   3   CHI   IL   60609   4   John   Doe   3   4   NY   NY   10010   To  get  informa)on  about  specific  user,  you  perform  a  join  across  two  tables     10  
  • 11. Document  Example:  User  Profile    {          “ID”:  1,   =   +          “FIRST”:  “Dip)”,          “LAST”:  “Borkar”,          “ZIP”:  “94040”,          “CITY”:  “MV”,          “STATE”:  “CA”      }   JSON   All  data  in  a  single  document   11  
  • 12. Making  a  Change  Using  RDBMS   User  Table   Photo  Table   Country  Table   Country   TEL Country   User  ID   First   Last   Zip   ID   User  ID   3   Photo  ID   Comment   ID   Country  ID   Country  name   2   d043   NYC      001   001   USA   1   Dip)   Borkar   94040    001   2   b054   Bday      007   002   UK   2   Joe   Smith   94040   001   5   c036   Miami      001   003   Argen)na   3   Ali   Dodson   94040   001   7   d072   Sunset      133   004   Australia   5002   e086   Spain      133   4   Sarah   Gorin   NW1   002   005   Aruba   Status  Table   006   Austria   5   Bob   Young   30303   001   Country   User  ID   Status  ID   Text   ID   007   Brazil   6   Nancy   Baker   10010   001   1   a42   At  conf      134   008   Canada   4   b26   excited   007   7   Ray   Jones   31311   001   5   c32   hockey      008   009   Chile   8   Lee   Chen   V5V3M   008   12   d83   Go  A’s      001   •  •          •      5000   e34   sailing      005   •      .   •      .   130   Portugal   •      .   Affilia)ons  Table   Country   User  ID   Affl  ID   Affl  Name   ID   131   Romania   50000   Doug   Moore   04252   001   2   a42   Cal      001   132   Russia   4   b96   USC      001   50001   Mary   White   SW195   002   133   Spain   7   c14   UW      001   50002   Lisa   Clark   12425   001   8   e22   Oxford      002   134   Sweden   12  
  • 13. Making  the  Same  Change  with  a  Document  Database      {          “ID”:  1,          “FIRST”:  “Dip)”,          “LAST”:  “Borkar”,          “ZIP”:  “94040”,          “CITY”:  “MV”,          “STATE”:  “CA”,          “STATUS”:     }   ,              {    “TEXT”:  “At  Conf”      }              “GEO_LOC”:  “134”  },     “COUNTRY”:  ”USA”   }         JSON   Just  add  informa)on  to  a  document   13  
  • 14. Document  modeling       •  Are  these  separate  object  in  the  model  layer?         Q   •  •  Are  these  objects  accessed  together?     Do  you  need  updates  to  these  objects  to  be  atomic?   •  Are  mul&ple    people  edi&ng  these  objects  concurrently?        When  considering  how  to  model  data  for  a  given    applica&on   •  Think  of  a  logical  container  for  the  data   •  Think  of  how  data  groups  together         14  
  • 15. Document  Design  Op&ons             •  One  document  that  contains  all  related  data       –  Data  is  de-­‐normalized   –  Be]er  performance  and  scale   –  Eliminate  client-­‐side  joins       •  Separate  documents  for  different  object  types  with   cross  references     –  Data  duplica&on  is  reduced   –  Objects  may  not  be  co-­‐located     –  Transac&ons  supported  only  on  a  document  boundary   –  Most  document  databases  do  not  support  joins   15  
  • 16. Document  ID  /  Key  selec&on   •  Similar  to  primary  keys  in  rela&onal  databases   •  Documents  are  sharded  based  on  the  document  ID   •  ID  based  document  lookup  is  extremely  fast     •  Usually  an  ID  can  only  appear  once  in  a  bucket         Q     •         Do  you  have  a  unique  way  of  referencing  objects?   •         Are  related  objects  stored  in  separate  documents?   Op)ons   • UUIDs,  date-­‐based  IDs,  numeric  IDs       • Hand-­‐crajed  (human  readable)     • Matching  prefixes  (for  mul&ple  related  objects)   16  
  • 17. Example:  En&&es  for  a  Blog   BLOG   •  User  profile   The  main  pointer  into  the  user  data   •  Blog  entries   •  Badge  sekngs,  like  a  twi]er  badge       •  Blog  posts   Contains  the  blogs  themselves       •  Blog  comments   •  Comments  from  other  users   17  
  • 18. Blog  Document  –  Op&on  1  –  Single  document     {   “UUID ”:  “2 1 f7 f8 de-­‐8 0 5 1 -­‐5 b89 -­‐8 6 “Time”:   “2 0 1 1 -­‐0 4-­‐0 1 T1 3 :0 1 :0 2.4 2 { “Server”:   “A2 2 2 3 E”, ! “Calling   Server”:   “A2 2 1 3 W”, “_id”: “Couchbase_Hello_World”,! “Type”:   “E1 0 0 ”, “author”: “dborkar”, ! “Initiating   Us er”:   “ds allings @s py.net”, “type”: “post”! “D etails ”:   “title”: “Hello World”,! { “format”: “IP”:  “1 0 .1 ! .2 2 ”, “markdown”, .1 “API”:   “Ins ertD VD QueueItem”, “body”: “Hello from [Couchbase](http://couchbase.com).”, ! “Trace”:   “cleans ed”, “html”: “<p>Hello from <a href=“http: …! “Tags ”:   “comments”:[ ! [ [“format”: “markdown”, “body”:”Awesome post!”],! “SERVER”,   “US-­‐Wes t”,   [“format”: “markdown”, “body”:”Like it.” ]! ]! “API” ] }   } } 18  
  • 19. Blog  Document  –  Op&on  2  -­‐  Split  into  mul&ple  docs     {   { ! “UUID ”:  “21f7f8de-­‐8051 -­‐5b89 -­‐86 “_id”: “Coucbase_Hello_World”,! “Time”:   “2011 -­‐04-­‐01T13:01:02.42 “author”: “A2223E”, ! “Server”:   “dborkar”, “Calling   Server”:   “A2213W”, “type”: “E100 ”, “Type”:   “post”! “title”: “Hello World”,! @s py.net”, “Initiating   Us er”:   “ds allings “D etails ”:   “format”: “markdown”, ! { “body”:“IP”:  “10.1.1.22”, “Hello from [Couchbase]( “API”:   “Ins ertDVD QueueItem”, http://couchbase.com).”, ! “Trace”:   “cleans ed”, “html”:“Tags ”:   “<p>Hello from <a href=“http: …! [ “comments”:[! “SERVER”,   ! “comment1_Couchbase_Hello_world”! “US-­‐Wes t”,   ! “API” ]! ] {   COMMENT   }! } “UUID ”:  “ 2 1 f7 f8 d e-­‐ 8 0 5 1 -­‐5 b 8 9 -­‐ 8 6 “Time”:   “ 2 0 1 1 -­‐ 0 4 -­‐0 1 T1 3 :0 1 :0 2 .4 2 “Server”:   “A2 2 2 3 E”, } “Callin g   Server”:   “A2 2 1 3 W ”, {! BLOG  DOC   “Typ e”:   “E1 0 0 ”, “In itiatin g   Us er”:   “d s allin gs @s p y.n et”, “_id”: “comment1_Couchbase_Hello_World”,! “D etails ”:   { “IP ”:  “ 1 0 .1 .1 .2 2 ”, “format”: “markdown”, ! “AP I”:   “ In s ertD VD Qu eu eItem”, “Trace”:   “clean s ed ”, “Tags ”:   “body”:”Awesome post!” ! [ “SERVER”,   “US-­‐Wes t”,   }   “AP I” ] } } 19  
  • 20. Threaded  Comments   •  You  can  imagine  how  to  take  this  to  a  threaded  list   List   First   Reply  to   comment   Blog   List   comment   More   Comments   Advantages   •  Only  fetch  the  data  when  you  need  it   •  For  example,  rendering  part  of  a  web  page   •  Spread  the  data  and  load  across  the  en&re  cluster     20  
  • 21. COMPARING     SCALING  MODEL   21  
  • 22. Rela&onal  Technology  Scales  Up   Applica)on  Scales  Out   Just  add  more  commodity  web  servers   System  Cost   Applica&on  Performance     Web/App  Server  Tier   Users   RDBMS  Scales  Up   Get  a  bigger,  more  complex  server   System  Cost   Applica&on  Performance     Won’t   scale   beyond   this  point   Rela)onal  Database   Users   Expensive  and  disrup)ve  sharding,  doesn’t  perform  at  web  scale   22  
  • 23. Couchbase  Server  Scales  Out  Like  App  Tier   Applica)on  Scales  Out   Just  add  more  commodity  web  servers   System  Cost   Applica&on  Performance     Web/App  Server  Tier   Users   NoSQL  Database  Scales  Out   Cost  and  performance  mirrors  app  )er   System  Cost   Applica&on  Performance     Couchbase  Distributed  Data  Store   Users   Scaling  out  flatens  the  cost  and  performance  curves   23  
  • 24. Couchbase  Server  Admin  Console   24  
  • 25. 25  
  • 26. WHERE  IS  NOSQL  A  GOOD  FIT?   26  
  • 27. Performance  driven  use  cases   •  Low  latency   •  High  throughput  ma]ers   •  Large  number  of  users     •  Unknown  demand  with  sudden  growth  of   users/data     •  Predominantly  direct  document  access   •  Workloads  with  very  high  muta&on  rate  per   document  (temporal  locality)  Working  set  with   heavy  writes     27  
  • 28. Data  driven  use  cases     •  Support  for  unlimited  data  growth       •  Data  with  non-­‐homogenous  structure     •  Need  to  quickly  and  ojen  change  data  structure   •  3rd  party  or  user  defined  structure   •  Variable  length  documents   •  Sparse  data  records   •  Hierarchical  data     28  
  • 29. Use  Case  Examples   Web  app  or  Use-­‐case   Couchbase  Solu)on   Example  Customer   Content  and  Metadata   Couchbase  document  store  +  Elas&c  Search   McGraw-­‐Hill…   Management  System   Social  Game  or  Mobile   Couchbase  stores  game  and  player  data   Zynga…   App     Ad  Targe)ng   Couchbase  stores  user  informa&on  for  fast   AOL…   access   User  Profile  Store   Couchbase  Server  as  a  key-­‐value  store   TuneWiki…     Session  Store   Couchbase  Server  as  a  key-­‐value  store   Concur….     High  Availability     Couchbase  Server  as  a  memcached  &er   Orbitz…     Caching  Tier   replacement     Chat/Messaging   Couchbase  Server   DOCOMO…   Plauorm   29  
  • 30. Use  Case:  Social  Gaming   Social  and  Mobile  Gaming   Types  of  Data   Applica)on  Requirements   •  User  account  informa&on   •  Ability  to  support  rapid  growth   •  User  game  profile  info   •  Fast  response  &mes  for   •  User’s  social  graph   awesome  user  experience   •  State  of  the  game   •  Game  up&me  –24x7x365   •  Player  badges  and  stats   •  Easy  to  update  apps  with  new   features     Why  NoSQL  and  Couchbase     •  Scalability  ensures  that  games  are  ready  to  handle  the  millions   of  users  that  come  with  viral  growth.     •  High  performance  guarantees  players  are  never  lej  wai&ng  to   make  their  next  move.     •  Always-­‐on  opera&ons  means  zero  interrup&on  to  game  play   (and  revenue)     •  Flexible  data  model  means  games  can  be  developed  rapidly  and   updated  easily  with  new  features   30  
  • 31. Use Case: Ad Targeting
      Types of data:
      • User profile: preferences and psychographic data
      • Ad serving history by user
      • Ad buying history by advertiser
      • Ad serving history by advertiser
      Application requirements:
      • High performance to meet limited ad serving budget; time allowance is typically <40 msec
      • Scalability to handle hundreds of millions of user profiles and a rapidly growing amount of data
      • 24x7x365 availability to avoid ad revenue loss
      Why NoSQL and Couchbase:
      • Sub-millisecond reads/writes mean less time is needed for data access, more time is available for ad logic processing, and more highly optimized ads will be served.
      • Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows.
      • Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.
  • 32. Use Case: Content and metadata store
      Building a self-adapting, interactive learning portal with Couchbase
  • 33. The Problem
      As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
      • Scale to millions of learners
      • Serve MHE as well as third-party content
      • Include open content
      • Support learning apps
      • Self-adapt via usage data
  • 34. The Challenge
      The backend is an Interactive Content Delivery Cloud that must:
      • Allow for elastic scaling under spike periods
      • Catalog and deliver content from many sources
      • Provide consistent low latency for metadata and stats access
      • Support full-text search for content discovery
      • Offer tunable content ranking and recommendation functions
      Hmmm... this looks kinda like:
      + Content Caching (Scale)
      + Social Gaming (Stats)
      + Ad Targeting (Smarts)
      Experimented with a combination of:
      • XML Databases
      • In-memory Data Grids
      • SQL/MR Engines
      • Enterprise Search Servers
  • 36. The Learning Portal
      • Designed and built as a collaboration between MHE Labs and Couchbase
      • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
      • Available for download and further development as open source code: https://github.com/couchbaselabs/learningportal
  • 37. BRIEF OVERVIEW: COUCHBASE SERVER
  • 38. Couchbase Server 2.0
      NoSQL distributed document database for interactive web applications
  • 39. Couchbase Server
      • Easy scalability: grow the cluster without application changes and without downtime, with a single click
      • Consistent, high performance: consistent sub-millisecond read and write response times with consistent high throughput
      • Always on, 24x7x365: no downtime for software upgrades, hardware maintenance, etc.
  • 40. Flexible Data Model
      Example JSON document:
      {
        "ID": 1,
        "FIRST": "Dipti",
        "LAST": "Borkar",
        "ZIP": "94040",
        "CITY": "MV",
        "STATE": "CA"
      }
      • No need to worry about the database when changing your application
      • Records can have different structures; there is no fixed schema
      • Allows painless data model changes for rapid application development
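      To make the "no fixed schema" point concrete, here is a minimal Python sketch. It does not use the Couchbase client library: the `store` dict, the `save`/`load` helpers, the `user::N` key format, and the second profile are all hypothetical stand-ins chosen for illustration.

```python
import json

# Hypothetical stand-in for a document bucket: key -> JSON text.
store = {}

def save(key, doc):
    """Serialize a document to JSON and store it under its key."""
    store[key] = json.dumps(doc)

def load(key):
    """Fetch a document by key and parse it back into a dict."""
    return json.loads(store[key])

# The record from the slide.
profile_v1 = {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
              "ZIP": "94040", "CITY": "MV", "STATE": "CA"}

# A later record with a different shape (invented for this example).
# No schema change is needed before storing it next to the first one.
profile_v2 = {"ID": 2, "FIRST": "Ada", "LAST": "Example",
              "EMAILS": ["ada@example.com"], "PREFS": {"newsletter": True}}

save("user::1", profile_v1)
save("user::2", profile_v2)

def city_of(doc):
    # The application, not the database, decides how to treat a missing field.
    return doc.get("CITY", "unknown")
```

      Both documents live side by side, and `city_of` handles the record that has no `CITY` field. This is the sense in which the data model can change without a database migration.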
  • 41. COUCHBASE SERVER ARCHITECTURE
  • 42. Couchbase Server 2.0 Architecture
      [Architecture diagram; recoverable labels follow]
      Data Manager:
      • Query API (port 8092) and Query Engine
      • Memcapable 1.0 (port 11211) and Memcapable 2.0 (port 11210)
      • Moxi
      • Memcached
      • Couchbase EP Engine
      • Storage interface and New Persistence Layer
      Cluster Manager (Erlang/OTP; some components run on each node, some once per cluster):
      • REST management API/Web UI over HTTP (port 8091)
      • vBucket state and replication manager
      • Global singleton supervisor
      • Rebalance orchestrator
      • Configuration manager
      • Node health monitor
      • Process monitor
      • Heartbeat
      Other ports: Erlang port mapper (4369), Distributed Erlang (21100-21199)
  • 43. Couchbase Server 2.0 Architecture
      [Same architecture diagram as slide 42, shown without the Data Manager and Cluster Manager labels]
  • 44. Couchbase deployment
      [Diagram labels: Web Application; Couchbase Client Library; Data Flow; Cluster Management]
  • 45. Single node - Couchbase Write Operation
      [Diagram: the App Server writes Doc 1 to a Couchbase Server node. Components shown: Managed Cache, Replication Queue (to other node), Disk Queue, Disk. Steps are numbered 2 and 3.]
  • 46. Single node - Couchbase Update Operation
      [Diagram: the App Server sends Doc 1' to a Couchbase Server node. Components shown: Managed Cache, Replication Queue (to other node), Disk Queue, Disk. Doc 1 and Doc 1' appear in memory; the disk holds Doc 1.]
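      The write and update slides show a document passing through a managed cache, a replication queue, and a disk queue on its way to disk. A toy Python model of that path might look like the sketch below. The `Node` class and its method names are hypothetical. The sketch also assumes the write is acknowledged once the document is in memory and queued, which the slides imply but do not state outright.

```python
from collections import deque

class Node:
    """Toy model of one server node (names are illustrative, not a real API)."""

    def __init__(self):
        self.cache = {}                   # managed cache: key -> latest doc
        self.replication_queue = deque()  # changes waiting to go to another node
        self.disk_queue = deque()         # changes waiting to be persisted
        self.disk = {}                    # persisted copy: key -> doc

    def write(self, key, doc):
        # The document lands in the managed cache first...
        self.cache[key] = doc
        # ...then is queued for replication and for persistence.
        self.replication_queue.append((key, doc))
        self.disk_queue.append((key, doc))
        # Assumption: acknowledged before the disk write completes.
        return "ok"

    def flush_disk_queue(self):
        # Drain queued changes to disk in arrival order.
        while self.disk_queue:
            key, doc = self.disk_queue.popleft()
            self.disk[key] = doc

    def read(self, key):
        # Reads are served from the managed cache in this sketch.
        return self.cache.get(key)
```

      After `write("doc1", "v1")`, a flush, and then `write("doc1", "v1'")`, the cache holds the new version while the disk still holds the old one until the next flush. That matches the Doc 1 / Doc 1' state drawn on the update slide.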
  • 47. Single node - Couchbase Read Operation
      [Diagram: the App Server issues GET for Doc 1. Components shown: Managed Cache, Replication Queue (to other node), Disk Queue, Disk. Doc 1 is in memory and on disk.]
  • 48. Single node - Couchbase Cache Eviction
      [Diagram: the App Server writes Docs 2 through 6 to a Couchbase Server node. Components shown: Managed Cache, Replication Queue (to other node), Disk Queue, Disk. The disk holds Doc 1 through Doc 6.]
  • 49. Single node - Couchbase Cache Miss
      [Diagram: the App Server issues GET for Doc 1. Components shown: Managed Cache, Replication Queue (to other node), Disk Queue, Disk. The disk holds Doc 1 through Doc 6.]
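      The eviction and cache-miss slides can be sketched as a small cache that sits in front of a disk store. The eviction policy here is least-recently-used, which is an assumption: the slides do not name the policy. The `ManagedCache` class is hypothetical, and writes are persisted immediately to keep the example short.

```python
from collections import OrderedDict

class ManagedCache:
    """Toy RAM cache in front of a disk store (LRU eviction is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.ram = OrderedDict()  # least recently used entry first
        self.disk = {}            # every document is also kept on disk
        self.misses = 0

    def _evict_if_needed(self):
        # Drop the least recently used documents from RAM; they stay on disk.
        while len(self.ram) > self.capacity:
            self.ram.popitem(last=False)

    def put(self, key, doc):
        self.disk[key] = doc      # simplification: persisted immediately
        self.ram[key] = doc
        self.ram.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key in self.ram:       # cache hit
            self.ram.move_to_end(key)
            return self.ram[key]
        self.misses += 1          # cache miss: fetch from disk, re-cache
        doc = self.disk[key]      # raises KeyError if the doc never existed
        self.ram[key] = doc
        self._evict_if_needed()
        return doc
```

      With a capacity of five, writing Doc 1 through Doc 6 pushes Doc 1 out of RAM. A later `get` for Doc 1 is a miss that reloads it from disk and evicts the next-oldest document.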
  • 50. Cluster wide - Basic Operation
      • Docs distributed evenly across servers
      • Each server stores both active and replica docs
          • Only one server is active for a given doc at a time
      • Client library provides app with simple interface to database
      • Cluster map provides map to which server a doc is on
          • App never needs to know
      • App reads, writes, updates docs
      • Multiple app servers can access same document at same time
      [Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, send READ/WRITE/UPDATE requests to a Couchbase Server cluster of Servers 1 to 3. Each server holds ACTIVE and REPLICA docs. User Configured Replica Count = 1.]
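      One way to picture the cluster map is as a lookup from a hashed document key to the server holding the active copy. The Python sketch below is an illustration only. The partition count, the use of CRC32, the round-robin placement, and every function name are assumptions, not details taken from the slides.

```python
import zlib

NUM_PARTITIONS = 1024  # assumed partition count, for illustration only

def partition_for(key):
    """Hash a document key to a partition number (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def build_cluster_map(servers, num_partitions=NUM_PARTITIONS):
    """Assign each partition an active server and one replica server.

    Round-robin placement; assumes at least two servers so that the
    replica never lands on the same server as the active copy.
    """
    n = len(servers)
    return {
        p: {"active": servers[p % n], "replica": servers[(p + 1) % n]}
        for p in range(num_partitions)
    }

def server_for(key, cluster_map):
    """What a client library would do: look up the active server for a key."""
    return cluster_map[partition_for(key)]["active"]

servers = ["server1", "server2", "server3"]
cluster_map = build_cluster_map(servers)
target = server_for("doc5", cluster_map)  # the app never computes this itself
```

      Because every client holds the same map and uses the same hash, two app servers asking for the same document are routed to the same server without coordinating with each other.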
  • 51. Cluster wide - Add Nodes to Cluster
      • Two servers added
          • One-click operation
      • Docs automatically rebalanced across cluster
          • Even distribution of docs
          • Minimum doc movement
      • Cluster map updated
      • App database calls now distributed over larger number of servers
      [Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, send READ/WRITE/UPDATE requests to a Couchbase Server cluster of Servers 1 to 5. Each server holds ACTIVE and REPLICA docs. User Configured Replica Count = 1.]
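      The two rebalance goals on this slide, even distribution and minimum movement, can be illustrated with a short Python sketch. The `rebalance` function and its greedy quota approach are hypothetical and show one way to meet both goals. They are not a description of how Couchbase implements rebalancing.

```python
from collections import defaultdict

def rebalance(assignment, servers):
    """Spread partitions evenly over `servers`, moving as few as possible.

    assignment: dict of partition -> current server.
    Returns a new dict of partition -> server.
    """
    held = defaultdict(list)
    for partition, server in sorted(assignment.items()):
        held[server].append(partition)

    # Each server gets `base` partitions; `extra` servers get one more.
    base, extra = divmod(len(assignment), len(servers))
    # Give the extra slots to the servers that already hold the most.
    by_load = sorted(servers, key=lambda s: -len(held[s]))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(by_load)}

    new_assignment, pool = {}, []
    for server, partitions in held.items():
        keep = quota.get(server, 0)        # removed servers keep nothing
        for p in partitions[:keep]:
            new_assignment[p] = server     # stays where it is
        pool.extend(partitions[keep:])     # surplus must move

    for server in servers:
        have = sum(1 for s in new_assignment.values() if s == server)
        while have < quota[server]:
            new_assignment[pool.pop()] = server
            have += 1
    return new_assignment
```

      Starting from 12 partitions on three servers and adding two more, only the four partitions needed to fill the new servers move. Everything else stays in place.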
  • 52. Cluster wide - Fail Over Node
      • App servers accessing docs
      • Requests to Server 3 fail
      • Cluster detects server failed
          • Promotes replicas of docs to active
          • Updates cluster map
      • Requests for docs now go to appropriate server
      • Typically rebalance would follow
      [Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, and a Couchbase Server cluster of Servers 1 to 5. Each server holds ACTIVE and REPLICA docs. User Configured Replica Count = 1.]
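      The failover steps above (detect the failure, promote replicas, update the cluster map) can be sketched in Python as a pure function over a cluster map. The map layout and the `fail_over` name are hypothetical, and failure detection itself is left out.

```python
def fail_over(cluster_map, failed):
    """Return a new cluster map with `failed` removed from service.

    cluster_map: dict of partition -> {"active": server, "replica": server}.
    Partitions whose active copy was on the failed server have their
    replica promoted; they have no replica again until a rebalance.
    """
    new_map = {}
    for partition, copies in cluster_map.items():
        active, replica = copies["active"], copies["replica"]
        if active == failed:
            # Promote the replica to active.
            active, replica = replica, None
        elif replica == failed:
            # Active copy is unaffected; the replica copy is lost.
            replica = None
        new_map[partition] = {"active": active, "replica": replica}
    return new_map
```

      After the call, no partition routes to the failed server. Partitions left with `replica` set to `None` are the reason a rebalance would typically follow.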
  • 53. Indexing and Querying
      • Indexing work is distributed amongst nodes
          • Large data set possible
          • Parallelize the effort
      • Each node has index for data stored on it
      • Queries combine the results from required nodes
      [Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, send a Query to a Couchbase Server cluster of Servers 1 to 3. Each server holds ACTIVE and REPLICA docs. User Configured Replica Count = 1.]
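      "Each node has index for data stored on it" and "queries combine the results from required nodes" describe a scatter-gather pattern. The Python sketch below illustrates it with one sorted index per node and a merge of the per-node results. The function names, the range-query shape, and the sample documents are hypothetical.

```python
import heapq

def build_index(docs, field):
    """Build one node's local index: sorted (value, key) pairs.

    Documents that lack the field are simply left out of the index.
    """
    return sorted((doc[field], key) for key, doc in docs.items() if field in doc)

def query(node_indexes, low, high):
    """Scatter a range query to every node index, then merge the results."""
    partials = [
        [entry for entry in index if low <= entry[0] <= high]
        for index in node_indexes
    ]
    # Each partial result is already sorted, so a k-way merge is enough.
    return list(heapq.merge(*partials))

# Three nodes, each indexing only the documents it stores.
node1 = {"doc5": {"age": 30}, "doc2": {"age": 25}}
node2 = {"doc4": {"age": 41}, "doc7": {"name": "no age field"}}
node3 = {"doc1": {"age": 28}}
indexes = [build_index(node, "age") for node in (node1, node2, node3)]

results = query(indexes, 26, 45)
```

      Each node does only the indexing work for its own data, and the merge step is what gives the caller a single ordered result.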
  • 54. Cross Data Center Replication (XDCR)
      [Diagram: two Couchbase Server clusters, one in the NY data center and one in the SF data center. Each cluster has Servers 1 to 3, and each server holds ACTIVE docs in RAM and on DISK.]
  • 55. THANK YOU
      DIPTI@COUCHBASE.COM
      @DBORKAR