SlideShare uma empresa Scribd logo
1 de 49
Baixar para ler offline
Ne#lix	
  Cloud	
  Architecture	
  

        Qcon	
  Beijing	
  April	
  9,	
  2011	
  
           Adrian	
  Cockcro>	
  
  @adrianco	
  #ne#lixcloud	
  hAp://slideshare.net/adrianco	
  
                acockcro>@ne#lix.com	
  
(ConHnuing	
  from	
  Keynote	
  Talk)	
  

    Who,	
  Why,	
  What	
  
           Ne#lix	
  in	
  the	
  Cloud	
  
   Cloud	
  Challenges	
  and	
  Learnings	
  
Systems	
  and	
  OperaHons	
  Architecture	
  
                        	
  
Amazon Cloud Terminology
                                       See http://aws.amazon.com/ for details
                               This is not a full list of Amazon Web Service features

•    AWS	
  –	
  Amazon	
  Web	
  Services	
  (common	
  name	
  for	
  Amazon	
  cloud)	
  
•    AMI	
  –	
  Amazon	
  Machine	
  Image	
  (archived	
  boot	
  disk,	
  Linux,	
  Windows	
  etc.	
  plus	
  applicaHon	
  code)	
  
•    EC2	
  –	
  ElasHc	
  Compute	
  Cloud	
  
       –    Range	
  of	
  virtual	
  machine	
  types	
  m1,	
  m2,	
  c1,	
  cc,	
  cg.	
  Varying	
  memory,	
  CPU	
  and	
  disk	
  configuraHons.	
  
       –    Instance	
  –	
  a	
  running	
  computer	
  system.	
  Ephemeral,	
  when	
  it	
  is	
  de-­‐allocated	
  nothing	
  is	
  kept.	
  
       –    Reserved	
  Instances	
  –	
  pre-­‐paid	
  to	
  reduce	
  cost	
  for	
  long	
  term	
  usage	
  
       –    Availability	
  Zone	
  –	
  datacenter	
  with	
  own	
  power	
  and	
  cooling	
  hosHng	
  cloud	
  instances	
  
       –    Region	
  –	
  group	
  of	
  Availability	
  Zones	
  –	
  US-­‐East,	
  US-­‐West,	
  EU-­‐Eire,	
  Asia-­‐Singapore,	
  Asia-­‐Japan	
  
•    ASG	
  –	
  Auto	
  Scaling	
  Group	
  (instances	
  booHng	
  from	
  the	
  same	
  AMI)	
  
•    S3	
  –	
  Simple	
  Storage	
  Service	
  (hAp	
  access)	
  
•    EBS	
  –	
  ElasHc	
  Block	
  Storage	
  (network	
  disk	
  filesystem	
  can	
  be	
  mounted	
  on	
  an	
  instance)	
  
•    RDB	
  –	
  RelaHonal	
  Data	
  Base	
  (managed	
  MySQL	
  master	
  and	
  slaves)	
  
•    SDB	
  –	
  Simple	
  Data	
  Base	
  (hosted	
  hAp	
  based	
  NoSQL	
  data	
  store)	
  
•    SQS	
  –	
  Simple	
  Queue	
  Service	
  (hAp	
  based	
  message	
  queue)	
  
•    SNS	
  –	
  Simple	
  NoHficaHon	
  Service	
  (hAp	
  and	
  email	
  based	
  topics	
  and	
  messages)	
  
•    EMR	
  –	
  ElasHc	
  Map	
  Reduce	
  (automaHcally	
  managed	
  Hadoop	
  cluster)	
  
•    ELB	
  –	
  ElasHc	
  Load	
  Balancer	
  
•    EIP	
  –	
  ElasHc	
  IP	
  (stable	
  IP	
  address	
  mapping	
  assigned	
  to	
  instance	
  or	
  ELB)	
  
•    VPC	
  –	
  Virtual	
  Private	
  Cloud	
  (extension	
  of	
  enterprise	
  datacenter	
  network	
  into	
  cloud)	
  
•    IAM	
  –	
  IdenHty	
  and	
  Access	
  Management	
  (fine	
  grain	
  role	
  based	
  security	
  keys)	
  
Ne#lix	
  Deployed	
  on	
  AWS	
  

Content	
              Logs	
              Play	
              WWW	
             API	
  
      Video	
  
                              S3	
             DRM	
             Search	
         Metadata	
  
     Masters	
  


                                                                 Movie	
  
        EC2	
          EMR	
  Hadoop	
     CDN	
  rouHng	
                      Device	
  Config	
  
                                                                Choosing	
  


                                                                                  TV	
  Movie	
  
         S3	
               Hive	
         Bookmarks	
           RaHngs	
  
                                                                                  Choosing	
  


    Content	
             Business	
                                                Mobile	
  
    Delivery	
                                Logging	
          Similars	
  
                        Intelligence	
                                              iPhone	
  
  Network	
  CDN	
  
Cloud	
  Architecture	
  
Product	
  Trade-­‐off	
  
User	
  Experience	
     ImplementaHon	
  




  Consistent	
           Development	
  
  Experience	
            complexity	
  


                          OperaHonal	
  
 Low	
  Latency	
  
                          complexity	
  
Synopsis	
  
•  The	
  Goals	
  
    –  Faster,	
  Scalable,	
  Available	
  and	
  ProducHve	
  

•  AnH-­‐paAerns	
  and	
  Cloud	
  Architecture	
  
    –  The	
  things	
  we	
  wanted	
  to	
  change	
  and	
  why	
  

•  Capacity	
  Planning	
  and	
  Monitoring	
  

•  Next	
  Steps	
  
Ne#lix	
  Cloud	
  Goals	
  
•  Faster	
  
     –  Lower	
  latency	
  than	
  the	
  equivalent	
  datacenter	
  web	
  pages	
  and	
  API	
  calls	
  
     –  Measured	
  as	
  mean	
  and	
  99th	
  percenHle	
  
     –  For	
  both	
  first	
  hit	
  (e.g.	
  home	
  page)	
  and	
  in-­‐session	
  hits	
  for	
  the	
  same	
  user	
  
•  Scalable	
  
     –  Avoid	
  needing	
  any	
  more	
  datacenter	
  capacity	
  as	
  subscriber	
  count	
  increases	
  
     –  No	
  central	
  verHcally	
  scaled	
  databases	
  
     –  Leverage	
  AWS	
  elasHc	
  capacity	
  effecHvely	
  
•  Available	
  
     –  SubstanHally	
  higher	
  robustness	
  and	
  availability	
  than	
  datacenter	
  services	
  
     –  Leverage	
  mulHple	
  AWS	
  availability	
  zones	
  
     –  No	
  scheduled	
  down	
  Hme,	
  no	
  central	
  database	
  schema	
  to	
  change	
  
•  ProducHve	
  
     –  OpHmize	
  agility	
  of	
  a	
  large	
  development	
  team	
  with	
  automaHon	
  and	
  tools	
  
     –  Leave	
  behind	
  complex	
  tangled	
  datacenter	
  code	
  base	
  (~8	
  year	
  old	
  architecture)	
  
     –  Enforce	
  clean	
  layered	
  interfaces	
  and	
  re-­‐usable	
  components	
  
Old	
  Datacenter	
  vs.	
  New	
  Cloud	
  Arch	
  
    Central	
  SQL	
  Database	
          Distributed	
  Key/Value	
  NoSQL	
  

 SHcky	
  In-­‐Memory	
  Session	
         Shared	
  Memcached	
  Session	
  

       ChaAy	
  Protocols	
                 Latency	
  Tolerant	
  Protocols	
  

 Tangled	
  Service	
  Interfaces	
         Layered	
  Service	
  Interfaces	
  

     Instrumented	
  Code	
              Instrumented	
  Service	
  PaAerns	
  

    Fat	
  Complex	
  Objects	
          Lightweight	
  Serializable	
  Objects	
  

  Components	
  as	
  Jar	
  Files	
         Components	
  as	
  Services	
  
The	
  Central	
  SQL	
  Database	
  
•  Datacenter	
  has	
  a	
  central	
  database	
  
   –  Everything	
  in	
  one	
  place	
  is	
  convenient	
  unHl	
  it	
  fails	
  
   –  Customers,	
  movies,	
  history,	
  configuraHon	
  
   	
  
•  Schema	
  changes	
  require	
  downHme	
  
                              	
  
 This	
  An(-­‐pa,ern	
  impacts	
  scalability,	
  availability	
  
The	
  Distributed	
  Key-­‐Value	
  Store	
  
•  Cloud	
  has	
  many	
  key-­‐value	
  data	
  stores	
  
    –  More	
  complex	
  to	
  keep	
  track	
  of,	
  do	
  backups	
  etc.	
  
    –  Each	
  store	
  is	
  much	
  simpler	
  to	
  administer	
  
    –  Joins	
  take	
  place	
  in	
  java	
  code	
                             DBA	
  
•  No	
  schema	
  to	
  change,	
  no	
  scheduled	
  downHme	
  

•  Latency	
  for	
  Memcached	
  vs.	
  Oracle	
  vs.	
  SimpleDB	
  
    –  Memcached	
  is	
  dominated	
  by	
  network	
  latency	
  <1ms	
  
    –  Oracle	
  for	
  simple	
  queries	
  is	
  a	
  few	
  milliseconds	
  
    –  SimpleDB	
  has	
  replicaHon	
  and	
  REST	
  overheads	
  >10ms	
  
Database	
  MigraHon	
  
•  Why	
  SimpleDB?	
  
     –  No	
  DBA’s	
  in	
  the	
  cloud,	
  Amazon	
  hosted	
  service	
  
     –  Work	
  started	
  two	
  years	
  ago,	
  fewer	
  viable	
  opHons	
  
     –  Worked	
  with	
  Amazon	
  to	
  speed	
  up	
  and	
  scale	
  SimpleDB	
  

•  AlternaHves?	
  
     –  Now	
  rolling	
  out	
  Cassandra	
  as	
  “upgrade”	
  from	
  SimpleDB	
  
     –  Need	
  several	
  opHons	
  to	
  match	
  use	
  cases	
  well	
  

•  Detailed	
  NoSQL	
  and	
  SimpleDB	
  Advice	
  
     –  Sid	
  Anand	
  	
  -­‐	
  QConSF	
  Nov	
  5th	
  –	
  Ne#lix’	
  TransiHon	
  to	
  High	
  Availability	
  
        Storage	
  Systems	
  
     –  Blog	
  -­‐	
  hAp://pracHcalcloudcompuHng.com/	
  
     –  Download	
  Paper	
  PDF	
  -­‐	
  hAp://bit.ly/bhOTLu	
  
Oracle	
  to	
  SimpleDB	
  
                         (See	
  Sid’s	
  paper	
  for	
  details)	
  

•  SimpleDB	
  Domains	
  
    –  De-­‐normalize	
  mulHple	
  tables	
  into	
  a	
  single	
  domain	
  
    –  Work	
  around	
  size	
  limits	
  (10GB	
  per	
  domain,	
  1KB	
  per	
  key)	
  
    –  Shard	
  data	
  across	
  domains	
  to	
  scale	
  
    –  Key	
  –	
  Use	
  distributed	
  sequence	
  generator,	
  GUID	
  or	
  natural	
  
       unique	
  key	
  such	
  as	
  customer-­‐id	
  	
  
    –  Implement	
  a	
  schema	
  validator	
  to	
  catch	
  bad	
  aAributes	
  

•  ApplicaHon	
  layer	
  support	
  
    –  Do	
  GROUP	
  BY	
  and	
  JOIN	
  operaHons	
  in	
  the	
  applicaHon	
  
    –  Compose	
  relaHons	
  in	
  the	
  applicaHon	
  layer	
  
    –  Check	
  constraints	
  on	
  read,	
  and	
  repair	
  data	
  as	
  a	
  side	
  effect	
  
    	
  
•  Do	
  without	
  triggers,	
  PL/SQL,	
  clock	
  operaHons	
  
The	
  SHcky	
  Session	
  
•  Datacenter	
  SHcky	
  Load	
  Balancing	
  
   –  Efficient	
  caching	
  for	
  low	
  latency	
  
   –  Tricky	
  session	
  handling	
  code	
  
   –  Middle	
  Her	
  load	
  balancer	
  has	
  issues	
  in	
  pracHce	
  

•  Encourages	
  concentrated	
  funcHonality	
  
   –  one	
  service	
  that	
  does	
  everything	
  
                               	
  
 This	
  An(-­‐pa,ern	
  impacts	
  produc(vity,	
  availability	
  
The	
  Shared	
  Session	
  
•  Cloud	
  Uses	
  Round-­‐Robin	
  Load	
  Balancing	
  
    –  Simple	
  request-­‐based	
  code	
  
    –  External	
  shared	
  caching	
  with	
  memcached	
  
    	
  
•  More	
  flexible	
  fine	
  grain	
  services	
  
    –  Works	
  beAer	
  with	
  auto-­‐scaled	
  instance	
  counts	
  
ChaAy	
  Opaque	
  and	
  BriAle	
  Protocols	
  
•  Datacenter	
  service	
  protocols	
  
    –  Assumed	
  low	
  latency	
  for	
  many	
  simple	
  requests	
  

•  Based	
  on	
  serializing	
  exisHng	
  java	
  objects	
  
    –  Inefficient	
  formats	
  
    –  IncompaHble	
  when	
  definiHons	
  change	
  
                              	
  
  This	
  An(-­‐pa,ern	
  causes	
  produc(vity,	
  latency	
  
                  and	
  availability	
  issues	
  
Robust	
  and	
  Flexible	
  Protocols	
  
•  Cloud	
  service	
  protocols	
  
    –  JSR311/Jersey	
  is	
  used	
  for	
  REST/HTTP	
  service	
  calls	
  
    –  Custom	
  client	
  code	
  includes	
  service	
  discovery	
  
    –  Support	
  complex	
  data	
  types	
  in	
  a	
  single	
  request	
  

•  Apache	
  Avro	
  
    –  Evolved	
  from	
  Protocol	
  Buffers	
  and	
  Thri>	
  
    –  Includes	
  JSON	
  header	
  defining	
  key/value	
  protocol	
  
    –  Avro	
  serializaHon	
  is	
  half	
  the	
  size	
  and	
  several	
  Hmes	
  
       faster	
  than	
  Java	
  serializaHon,	
  more	
  work	
  to	
  code	
  
Persisted	
  Protocols	
  
•  Persist	
  Avro	
  in	
  Memcached	
  
   –  Save	
  space/latency	
  (zigzag	
  encoding,	
  half	
  the	
  size)	
  
   –  Less	
  briAle	
  across	
  versions	
  
   –  New	
  keys	
  are	
  ignored	
  
   –  Missing	
  keys	
  are	
  handled	
  cleanly	
  

•  Avro	
  protocol	
  definiHons	
  
   –  Can	
  be	
  wriAen	
  in	
  JSON	
  or	
  generated	
  from	
  POJOs	
  
   –  It’s	
  hard,	
  needs	
  beAer	
  tooling	
  
Tangled	
  Service	
  Interfaces	
  
•  Datacenter	
  implementaHon	
  is	
  exposed	
  
    –  Oracle	
  SQL	
  queries	
  mixed	
  into	
  business	
  logic	
  

•  Tangled	
  code	
  
    –  Deep	
  dependencies,	
  false	
  sharing	
  

•  Data	
  providers	
  with	
  sideways	
  dependencies	
  
    –  Everything	
  depends	
  on	
  everything	
  else	
  

  This	
  An(-­‐pa,ern	
  affects	
  produc(vity,	
  availability	
  
Untangled	
  Service	
  Interfaces	
  
•  New	
  Cloud	
  Code	
  With	
  Strict	
  Layering	
  
    –  Compile	
  against	
  interface	
  jar	
  
    –  Can	
  use	
  spring	
  runHme	
  binding	
  to	
  enforce	
  
    	
  
•  Service	
  interface	
  is	
  the	
  service	
  
    –  ImplementaHon	
  is	
  completely	
  hidden	
  
    –  Can	
  be	
  implemented	
  locally	
  or	
  remotely	
  
    –  ImplementaHon	
  can	
  evolve	
  independently	
  
Untangled	
  Service	
  Interfaces	
  
Two	
  layers:	
  
•  SAL	
  -­‐	
  Service	
  Access	
  Library	
  
    –  Basic	
  serializaHon	
  and	
  error	
  handling	
  
    –  REST	
  or	
  POJO’s	
  defined	
  by	
  data	
  provider	
  

•  ESL	
  -­‐	
  Extended	
  Service	
  Library	
  
    –  Caching,	
  conveniences	
  
    –  Can	
  combine	
  several	
  SALs	
  
    –  Exposes	
  faceted	
  type	
  system	
  (described	
  later)	
  
    –  Interface	
  defined	
  by	
  data	
  consumer	
  in	
  many	
  cases	
  
Service	
  InteracHon	
  PaAern	
  
    Sample	
  Swimlane	
  Diagram	
  
Service	
  Architecture	
  PaAerns	
  
•  Internal	
  Interfaces	
  Between	
  Services	
  
   –  Common	
  paAerns	
  as	
  templates	
  
   –  Highly	
  instrumented,	
  observable,	
  analyHcs	
  
   –  Service	
  Level	
  Agreements	
  –	
  SLAs	
  
   	
  
•  Library	
  templates	
  for	
  generic	
  features	
  
   –  Instrumented	
  Ne#lix	
  Base	
  Servlet	
  template	
  
   –  Instrumented	
  generic	
  client	
  interface	
  template	
  
   –  Instrumented	
  S3,	
  SimpleDB,	
  Memcached	
  clients	
  
CLIENT	
  
                                                                  Request	
  Start	
  
                                                                   Timestamp,	
               Client	
  
                                          Inbound	
               Request	
  End	
          outbound	
  
                                       deserialize	
  end	
        Timestamp	
            serialize	
  start	
  
                                         Hmestamp	
  
                                                                                           Hmestamp	
  

                  Inbound	
                                                                                            Client	
  
                 deserialize	
                                                                                      outbound	
  
                    start	
                                                                                        serialize	
  end	
  
                 Hmestamp	
                                                                                         Hmestamp	
  




Client	
  network	
  
    receive	
  
  Hmestamp	
  
                                       Service	
  Request	
                                                                       Client	
  Network	
  
                                                                                                                                       send	
  
                                                                                                                                    Hmestamp	
  



                                      Instruments	
  Every	
  
   Service	
  
network	
  send	
  
 Hmestamp	
  
                                        Step	
  in	
  the	
  call	
                                                                   Service	
  
                                                                                                                                      Network	
  
                                                                                                                                      receive	
  
                                                                                                                                     Hmestamp	
  




                  Service	
                                                                                           Service	
  
                outbound	
                                                                                           inbound	
  
               serialize	
  end	
                                                                                  serialize	
  start	
  
                Hmestamp	
                                                                                          Hmestamp	
  

                                           Service	
                                         Service	
  
                                          outbound	
                                        inbound	
  
                                        serialize	
  start	
     SERVICE	
  execute	
     serialize	
  end	
  
                                                                   request	
  start	
  
                                         Hmestamp	
                                        Hmestamp	
  
                                                                    Hmestamp,	
  
                                                                 execute	
  request	
  
                                                                  end	
  Hmestamp	
  
Boundary	
  Interfaces	
  
•  Isolate	
  teams	
  from	
  external	
  dependencies	
  
   –  Fake	
  SAL	
  built	
  by	
  cloud	
  team	
  
   –  Real	
  SAL	
  provided	
  by	
  data	
  provider	
  team	
  later	
  
   –  ESL	
  built	
  by	
  cloud	
  team	
  using	
  faceted	
  objects	
  

•  Fake	
  data	
  sources	
  allow	
  development	
  to	
  start	
  
   –  e.g.	
  Fake	
  IdenHty	
  SAL	
  for	
  a	
  test	
  set	
  of	
  customers	
  
   –  Development	
  solidifies	
  dependencies	
  early	
  
   –  Helps	
  external	
  team	
  provide	
  the	
  right	
  interface	
  
One	
  Object	
  That	
  Does	
  Everything	
  
•  Datacenter	
  uses	
  a	
  few	
  big	
  complex	
  objects	
  
    –  Movie	
  and	
  Customer	
  objects	
  are	
  the	
  foundaHon	
  
    –  Good	
  choice	
  for	
  a	
  small	
  team	
  and	
  one	
  instance	
  
    –  ProblemaHc	
  for	
  large	
  teams	
  and	
  many	
  instances	
  

•  False	
  sharing	
  causes	
  tangled	
  dependencies	
  
    –  UnproducHve	
  re-­‐integraHon	
  work	
  
                             	
  
An(-­‐pa,ern	
  impac(ng	
  produc(vity	
  and	
  availability	
  
An	
  Interface	
  For	
  Each	
  Component	
  
•  Cloud	
  uses	
  faceted	
  Video	
  and	
  Visitor	
  
    –  Basic	
  types	
  hold	
  only	
  the	
  idenHfier	
  
    –  Facets	
  scope	
  the	
  interface	
  you	
  actually	
  need	
  
    –  Each	
  component	
  can	
  define	
  its	
  own	
  facets	
  

•  No	
  false-­‐sharing	
  and	
  dependency	
  chains	
  
    –  Type	
  manager	
  converts	
  between	
  facets	
  as	
  needed	
  
    –  video.asA(PresentaHonVideo)	
  for	
  www	
  
    –  video.asA(MerchableVideo)	
  for	
  middle	
  Her	
  
So>ware	
  Architecture	
  PaAerns	
  
•  Object	
  Models	
  
   –  Basic	
  and	
  derived	
  types,	
  facets,	
  serializable	
  
   –  Pass	
  by	
  reference	
  within	
  a	
  service	
  
   –  Pass	
  by	
  value	
  between	
  services	
  
   	
  
•  ComputaHon	
  and	
  I/O	
  Models	
  
   –  Service	
  ExecuHon	
  using	
  Best	
  Effort	
  
   –  Common	
  thread	
  pool	
  management	
  
Cloud	
  OperaHons	
  

  Model	
  Driven	
  Architecture	
  
Capacity	
  Planning	
  &	
  Monitoring	
  
Tools	
  and	
  AutomaHon	
  
•  Developer	
  and	
  Build	
  Tools	
  
      –  Jira,	
  Eclipse,	
  Jeeves,	
  Ivy,	
  ArHfactory	
  
      –  Builds,	
  creates	
  .war	
  file,	
  .rpm,	
  bakes	
  AMI	
  and	
  launches	
  

•  Custom	
  Ne#lix	
  ApplicaHon	
  Console	
  
      –  AWS	
  Features	
  at	
  Enterprise	
  Scale	
  (hide	
  the	
  AWS	
  security	
  keys!)	
  
      –  Auto	
  Scaler	
  Group	
  is	
  unit	
  of	
  deployment	
  to	
  producHon	
  

•  Open	
  Source	
  +	
  Support	
  
      –  Apache,	
  Tomcat,	
  Cassandra,	
  Hadoop,	
  OpenJDK/SunJDK,	
  CentOS/AmazonLinux	
  
      	
  
•  Monitoring	
  Tools	
  
      –    Keynote	
  –	
  service	
  monitoring	
  and	
  alerHng	
  
      –    AppDynamics	
  –	
  Developer	
  focus	
  for	
  cloud	
  hAp://appdynamics.com	
  
      –    EpicNMS	
  –	
  flexible	
  data	
  collecHon	
  and	
  plots	
  hAp://epicnms.com	
  
      –    Nimso>	
  NMS	
  –	
  ITOps	
  focus	
  for	
  Datacenter	
  +	
  Cloud	
  alerHng	
  
Model	
  Driven	
  Architecture	
  
•  Datacenter	
  PracHces	
  
   –  Lots	
  of	
  unique	
  hand-­‐tweaked	
  systems	
  
   –  Hard	
  to	
  enforce	
  paAerns	
  

•  Model	
  Driven	
  Cloud	
  Architecture	
  
   –  Perforce/Ivy/Jeeves	
  based	
  builds	
  for	
  everything	
  
   –  Every	
  producHon	
  instance	
  is	
  a	
  pre-­‐baked	
  AMI	
  
   –  Every	
  applicaHon	
  is	
  managed	
  by	
  an	
  Autoscaler	
  

            No	
  excep(ons,	
  every	
  change	
  is	
  a	
  new	
  AMI	
  
Model	
  Driven	
  ImplicaHons	
  
•  Automated	
  “Least	
  Privilege”	
  Security	
  
   –  Tightly	
  specified	
  security	
  groups	
  
   –  Fine	
  grain	
  IAM	
  keys	
  to	
  access	
  AWS	
  resources	
  
   –  Performance	
  tools	
  security	
  and	
  integraHon	
  


•  Model	
  Driven	
  Performance	
  Monitoring	
  
   –  Hundreds	
  of	
  instances	
  appear	
  in	
  a	
  few	
  minutes…	
  
   –  Tools	
  have	
  to	
  “garbage	
  collect”	
  dead	
  instances	
  	
  
Ne#lix	
  App	
  Console	
  
Auto	
  Scale	
  Group	
  ConfiguraHon	
  
Capacity	
  Planning	
  &	
  Monitoring	
  
Capacity	
  Planning	
  in	
  Clouds	
  
                     (a	
  few	
  things	
  have	
  changed…)	
  

•    Capacity	
  is	
  expensive	
  
•    Capacity	
  takes	
  Hme	
  to	
  buy	
  and	
  provision	
  
•    Capacity	
  only	
  increases,	
  can’t	
  be	
  shrunk	
  easily	
  
•    Capacity	
  comes	
  in	
  big	
  chunks,	
  paid	
  up	
  front	
  
•    Planning	
  errors	
  can	
  cause	
  big	
  problems	
  
•    Systems	
  are	
  clearly	
  defined	
  assets	
  
•    Systems	
  can	
  be	
  instrumented	
  in	
  detail	
  
•    Depreciate	
  assets	
  over	
  3	
  years	
  (reservaHons!)	
  
Monitoring	
  Issues	
  
•  Problem	
  
   –  Too	
  many	
  tools,	
  each	
  with	
  a	
  good	
  reason	
  to	
  exist	
  
   –  Hard	
  to	
  get	
  an	
  integrated	
  view	
  of	
  a	
  problem	
  
   –  Too	
  much	
  manual	
  work	
  building	
  dashboards	
  
   –  Tools	
  are	
  not	
  discoverable,	
  views	
  are	
  not	
  filtered	
  

•  SoluHon	
  
   –  Get	
  vendors	
  to	
  add	
  deep	
  linking	
  URLs	
  and	
  APIs	
  
   –  IntegraHon	
  “portal”	
  Hes	
  everything	
  together	
  
   –  Underlying	
  dependency	
  database	
  
   –  Dynamic	
  portal	
  generaHon,	
  relevant	
  data,	
  all	
  tools	
  
Data	
  Sources	
  
                                      • External	
  URL	
  availability	
  and	
  latency	
  alerts	
  and	
  reports	
  –	
  Keynote	
  
     External	
  TesHng	
             • Stress	
  tesHng	
  -­‐	
  SOASTA	
  

                                      • Ne#lix	
  REST	
  calls	
  –	
  Chukwa	
  to	
  DataOven	
  with	
  GUID	
  transacHon	
  idenHfier	
  
 Request	
  Trace	
  Logging	
        • Generic	
  HTTP	
  –	
  AppDynamics	
  service	
  Her	
  aggregaHon,	
  end	
  to	
  end	
  tracking	
  

                                      • Tracers	
  and	
  counters	
  –	
  log4j,	
  tracer	
  central,	
  Chukwa	
  to	
  DataOven	
  
   ApplicaHon	
  logging	
            • Trackid	
  and	
  Audit/Debug	
  logging	
  –	
  DataOven,	
  Appdynamics	
  	
  GUID	
  cross	
  reference	
  

                                      • ApplicaHon	
  specific	
  real	
  Hme	
  –	
  Nimso>,	
  Appdynamics,	
  Epic	
  
        JMX	
  	
  Metrics	
          • Service	
  and	
  SLA	
  percenHles	
  –	
  Nimso>,	
  Appdynamics,	
  Epic,logged	
  to	
  DataOven	
  

                                      • Stdout	
  logs	
  –	
  S3	
  –	
  DataOven,	
  Nimso>	
  alerHng	
  
Tomcat	
  and	
  Apache	
  logs	
     • Standard	
  format	
  Access	
  and	
  Error	
  logs	
  –	
  S3	
  –	
  DataOven,	
  Nimso>	
  AlerHng	
  

                                      • Garbage	
  CollecHon	
  –	
  Nimso>,	
  Appdynamics	
  
               JVM	
                  • Memory	
  usage,	
  call	
  stacks,	
  resource/call	
  -­‐	
  AppDynamics	
  

                                      • system	
  CPU/Net/RAM/Disk	
  metrics	
  –	
  AppDynamics,	
  Epic,	
  Nimso>	
  AlerHng	
  
              Linux	
                 • SNMP	
  metrics	
  –	
  Epic,	
  Network	
  flows	
  -­‐	
  FasHp	
  

                                      • Load	
  balancer	
  traffic	
  –	
  Amazon	
  Cloudwatch,	
  SimpleDB	
  usage	
  stats	
  
              AWS	
                   • System	
  configuraHon	
  	
  -­‐	
  CPU	
  count/speed	
  and	
  RAM	
  size,	
  overall	
  usage	
  -­‐	
  AWS	
  
Integrated	
  Dashboards	
  
Dashboards	
  Architecture	
  
•  Integrated	
  Dashboard	
  View	
  
    –  Single	
  web	
  page	
  containing	
  content	
  from	
  many	
  tools	
  
    –  Filtered	
  to	
  highlight	
  most	
  “interesHng”	
  data	
  

•  Relevance	
  Controller	
  
    –  Drill	
  in,	
  add	
  and	
  remove	
  content	
  interacHvely	
  
    –  Given	
  an	
  applicaHon,	
  alert	
  or	
  problem	
  area,	
  dynamically	
  
       build	
  a	
  dashboard	
  relevant	
  to	
  your	
  role	
  and	
  needs	
  

•  Dependency	
  and	
  Incident	
  Model	
  
    –  Model	
  Driven	
  -­‐	
  Interrogates	
  tools	
  and	
  AWS	
  APIs	
  
    –  Document	
  store	
  to	
  capture	
  dependency	
  tree	
  and	
  states	
  
Dashboard	
  Prototype	
  
  (not	
  everything	
  is	
  integrated	
  yet)	
  
AppDynamics	
  
        How	
  to	
  look	
  deep	
  inside	
  your	
  cloud	
  applicaHons	
  

•  AutomaHc	
  Monitoring	
  
   –  Base	
  AMI	
  includes	
  all	
  monitoring	
  tools	
  
   –  Outbound	
  calls	
  only	
  –	
  no	
  discovery/polling	
  issues	
  
   –  InacHve	
  instances	
  removed	
  a>er	
  a	
  few	
  days	
  
   	
  
•  Incident	
  Alarms	
  (deviaHon	
  from	
  baseline)	
  
   –  Business	
  TransacHon	
  latency	
  and	
  error	
  rate	
  
   –  Alarm	
  thresholds	
  discover	
  their	
  own	
  baseline	
  
   –  Email	
  contains	
  URL	
  to	
  Incident	
  Workbench	
  UI	
  
Using	
  AppDynamics	
  
(simple	
  example	
  from	
  early	
  2010)	
  
Assess	
  Impact	
  using	
  AppDynamics	
  
 View	
  actual	
  call	
  graph	
  on	
  producHon	
  systems	
  
Monitoring	
  Summary	
  
•  Broken	
  datacenter	
  oriented	
  tools	
  is	
  a	
  big	
  problem	
  

•  IntegraHng	
  many	
  different	
  tools	
  
     –  They	
  are	
  not	
  designed	
  to	
  be	
  integrated	
  
     –  We	
  have	
  “persuaded”	
  vendors	
  to	
  add	
  APIs	
  


•  If	
  you	
  can’t	
  see	
  deep	
  inside	
  your	
  app,	
  you’re	
  L	
  
Wrap	
  Up	
  
Next	
  Few	
  Years…	
  
•  “System	
  of	
  Record”	
  moves	
  to	
  Cloud	
  (now)	
  
      –  Master	
  copies	
  of	
  data	
  live	
  only	
  in	
  the	
  cloud,	
  with	
  backups	
  
      –  Cut	
  the	
  datacenter	
  to	
  cloud	
  replicaHon	
  link	
  

•  InternaHonal	
  Expansion	
  –	
  Global	
  Clouds	
  (later	
  in	
  2011)	
  
      –  Rapid	
  deployments	
  to	
  new	
  markets	
  

•  Cloud	
  StandardizaHon?	
  
      –      Cloud	
  features	
  and	
  APIs	
  should	
  be	
  a	
  commodity	
  not	
  a	
  differenHator	
  
      –      DifferenHate	
  on	
  scale	
  and	
  quality	
  of	
  service	
  
      –      CompeHHon	
  also	
  drives	
  cost	
  down	
  
      –      Higher	
  resilience	
  and	
  scalability	
  

      	
  
      We	
  would	
  prefer	
  to	
  be	
  an	
  insignificant	
  customer	
  in	
  a	
  giant	
  cloud	
  
Takeaway	
  
                                	
  
NeAlix	
  is	
  path-­‐finding	
  the	
  use	
  of	
  public	
  AWS	
  
 cloud	
  to	
  replace	
  in-­‐house	
  IT	
  for	
  non-­‐trivial	
  
applica(ons	
  with	
  hundreds	
  of	
  developers	
  and	
  
                  thousands	
  of	
  systems.	
  
                                	
  
                    acockcro>@ne#lix.com	
  
            hAp://www.linkedin.com/in/adriancockcro>	
  
                    @adrianco	
  #ne#lixcloud	
  
杭州站 · 2011年10月20日~22日
 www.qconhangzhou.com(6月启动)




QCon北京站官方网站和资料下载
     www.qconbeijing.com

Mais conteúdo relacionado

Mais procurados

A Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudA Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudDeepak Rao
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practicesOmid Vahdaty
 
NephoScale Elastic Networking
NephoScale Elastic NetworkingNephoScale Elastic Networking
NephoScale Elastic NetworkingNephoScale
 
AWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS CloudAWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS CloudAmazon Web Services
 
Windows Azure Design Patterns
Windows Azure Design PatternsWindows Azure Design Patterns
Windows Azure Design PatternsDavid Pallmann
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
 
AWS re:Invent 2013 Recap
AWS re:Invent 2013 RecapAWS re:Invent 2013 Recap
AWS re:Invent 2013 RecapBarry Jones
 
AWS Services for Content Production
AWS Services for Content ProductionAWS Services for Content Production
AWS Services for Content ProductionAmazon Web Services
 
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...Amazon Web Services
 
RunE2E Case Study: SAP BusinessObjects in the AWS Cloud
RunE2E Case Study: SAP BusinessObjects in the AWS CloudRunE2E Case Study: SAP BusinessObjects in the AWS Cloud
RunE2E Case Study: SAP BusinessObjects in the AWS CloudAlex Gramling
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconAdrian Cockcroft
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)Amazon Web Services
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...IndicThreads
 
Data Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on CloudData Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on Cloudtdwiindia
 
Scalable Resilient Web Services In .Net
Scalable Resilient Web Services In .NetScalable Resilient Web Services In .Net
Scalable Resilient Web Services In .NetBala Subra
 
Azure Custom Backup Solution for SAP NetWeaver
Azure Custom Backup Solution for SAP NetWeaverAzure Custom Backup Solution for SAP NetWeaver
Azure Custom Backup Solution for SAP NetWeaverGary Jackson MBCS
 
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...Rustem Feyzkhanov
 

Mais procurados (20)

A Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudA Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon Cloud
 
Cloud Architecture best practices
Cloud Architecture best practicesCloud Architecture best practices
Cloud Architecture best practices
 
NephoScale Elastic Networking
NephoScale Elastic NetworkingNephoScale Elastic Networking
NephoScale Elastic Networking
 
AWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS CloudAWS Webcast - Library Systems on the AWS Cloud
AWS Webcast - Library Systems on the AWS Cloud
 
Windows Azure Design Patterns
Windows Azure Design PatternsWindows Azure Design Patterns
Windows Azure Design Patterns
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Netflix in the Cloud
Netflix in the CloudNetflix in the Cloud
Netflix in the Cloud
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
AWS re:Invent 2013 Recap
AWS re:Invent 2013 RecapAWS re:Invent 2013 Recap
AWS re:Invent 2013 Recap
 
AWS Services for Content Production
AWS Services for Content ProductionAWS Services for Content Production
AWS Services for Content Production
 
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
 
RunE2E Case Study: SAP BusinessObjects in the AWS Cloud
RunE2E Case Study: SAP BusinessObjects in the AWS CloudRunE2E Case Study: SAP BusinessObjects in the AWS Cloud
RunE2E Case Study: SAP BusinessObjects in the AWS Cloud
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
 
Data Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on CloudData Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on Cloud
 
Scalable Resilient Web Services In .Net
Scalable Resilient Web Services In .NetScalable Resilient Web Services In .Net
Scalable Resilient Web Services In .Net
 
Azure Custom Backup Solution for SAP NetWeaver
Azure Custom Backup Solution for SAP NetWeaverAzure Custom Backup Solution for SAP NetWeaver
Azure Custom Backup Solution for SAP NetWeaver
 
Masterclass Webinar: Amazon S3
Masterclass Webinar: Amazon S3Masterclass Webinar: Amazon S3
Masterclass Webinar: Amazon S3
 
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
 

Destaque

Using reader responses to make revisions
Using reader responses to make revisionsUsing reader responses to make revisions
Using reader responses to make revisionswritRHET -
 
Cacti安装手册
Cacti安装手册Cacti安装手册
Cacti安装手册Yiwei Ma
 
BuilHigh Performance Weibo Platform-Qcon2011
BuilHigh Performance Weibo Platform-Qcon2011BuilHigh Performance Weibo Platform-Qcon2011
BuilHigh Performance Weibo Platform-Qcon2011Yiwei Ma
 
Zhongxing practice-suchunshan-qcon
Zhongxing practice-suchunshan-qconZhongxing practice-suchunshan-qcon
Zhongxing practice-suchunshan-qconYiwei Ma
 
Услуги в области инноваций региона Кюменлааксо (Финляндия)
Услуги в области инноваций региона Кюменлааксо (Финляндия)Услуги в области инноваций региона Кюменлааксо (Финляндия)
Услуги в области инноваций региона Кюменлааксо (Финляндия)St. Petersburg Foundation for SME Development
 
Nginx 0.8.x 安装手册
Nginx 0.8.x 安装手册Nginx 0.8.x 安装手册
Nginx 0.8.x 安装手册Yiwei Ma
 
Puppet安装测试
Puppet安装测试Puppet安装测试
Puppet安装测试Yiwei Ma
 
Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"
Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"
Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"St. Petersburg Foundation for SME Development
 

Destaque (8)

Using reader responses to make revisions
Using reader responses to make revisionsUsing reader responses to make revisions
Using reader responses to make revisions
 
Cacti安装手册
Cacti安装手册Cacti安装手册
Cacti安装手册
 
BuilHigh Performance Weibo Platform-Qcon2011
BuilHigh Performance Weibo Platform-Qcon2011BuilHigh Performance Weibo Platform-Qcon2011
BuilHigh Performance Weibo Platform-Qcon2011
 
Zhongxing practice-suchunshan-qcon
Zhongxing practice-suchunshan-qconZhongxing practice-suchunshan-qcon
Zhongxing practice-suchunshan-qcon
 
Услуги в области инноваций региона Кюменлааксо (Финляндия)
Услуги в области инноваций региона Кюменлааксо (Финляндия)Услуги в области инноваций региона Кюменлааксо (Финляндия)
Услуги в области инноваций региона Кюменлааксо (Финляндия)
 
Nginx 0.8.x 安装手册
Nginx 0.8.x 安装手册Nginx 0.8.x 安装手册
Nginx 0.8.x 安装手册
 
Puppet安装测试
Puppet安装测试Puppet安装测试
Puppet安装测试
 
Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"
Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"
Виртуальная ИТ-инфраструктура предприятия на базе свободного ПО "под ключ"
 

Semelhante a Netflix web-adrian-qcon

Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumAdrian Cockcroft
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAmazon Web Services
 
O'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The CloudO'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The CloudO'Reilly Media
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAmazon Web Services
 
Ceate a Scalable Cloud Architecture
Ceate a Scalable Cloud ArchitectureCeate a Scalable Cloud Architecture
Ceate a Scalable Cloud ArchitectureAmazon Web Services
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Tom Laszewski
 
Tổng quan về AWS cực hay
Tổng quan về AWS cực hayTổng quan về AWS cực hay
Tổng quan về AWS cực hayHoa PN Thaycacac
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsAdrian Cockcroft
 
Amazon Ec2 Application Design
Amazon Ec2 Application DesignAmazon Ec2 Application Design
Amazon Ec2 Application Designguestd0b61e
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaHelen Rogers
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Amazon Web Services
 
Neev cloud services with AWS
Neev cloud services with AWSNeev cloud services with AWS
Neev cloud services with AWSNeev Technologies
 
AWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloudAWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloudAmazon Web Services
 
AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)
AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)
AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)Amazon Web Services
 
갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법
갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법
갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법Amazon Web Services Korea
 

Semelhante a Netflix web-adrian-qcon (20)

Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
Create cloud service on AWS
Create cloud service on AWSCreate cloud service on AWS
Create cloud service on AWS
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWS
 
O'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The CloudO'Reilly Webcast: Architecting Applications For The Cloud
O'Reilly Webcast: Architecting Applications For The Cloud
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the Cloud
 
Ceate a Scalable Cloud Architecture
Ceate a Scalable Cloud ArchitectureCeate a Scalable Cloud Architecture
Ceate a Scalable Cloud Architecture
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Tổng quan về AWS cực hay
Tổng quan về AWS cực hayTổng quan về AWS cực hay
Tổng quan về AWS cực hay
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and Ops
 
Amazon Ec2 Application Design
Amazon Ec2 Application DesignAmazon Ec2 Application Design
Amazon Ec2 Application Design
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20
 
Windows Azure introduction
Windows Azure introductionWindows Azure introduction
Windows Azure introduction
 
Neev cloud services with AWS
Neev cloud services with AWSNeev cloud services with AWS
Neev cloud services with AWS
 
AWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloudAWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloud
 
AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)
AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)
AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)
 
갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법
갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법
갑작스러운 유저의 수요 증가에 현명하게 대처하는 방법
 

Mais de Yiwei Ma

Cibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconCibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconYiwei Ma
 
Cibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconCibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconYiwei Ma
 
Taobao casestudy-yufeng-qcon
Taobao casestudy-yufeng-qconTaobao casestudy-yufeng-qcon
Taobao casestudy-yufeng-qconYiwei Ma
 
Alibaba server-zhangxuseng-qcon
Alibaba server-zhangxuseng-qconAlibaba server-zhangxuseng-qcon
Alibaba server-zhangxuseng-qconYiwei Ma
 
Taobao practice-liyu-qcon
Taobao practice-liyu-qconTaobao practice-liyu-qcon
Taobao practice-liyu-qconYiwei Ma
 
Thoughtworks practice-hukai-qcon
Thoughtworks practice-hukai-qconThoughtworks practice-hukai-qcon
Thoughtworks practice-hukai-qconYiwei Ma
 
Ufida design-chijianqiang-qcon
Ufida design-chijianqiang-qconUfida design-chijianqiang-qcon
Ufida design-chijianqiang-qconYiwei Ma
 
Spring design-juergen-qcon
Spring design-juergen-qconSpring design-juergen-qcon
Spring design-juergen-qconYiwei Ma
 
Google arch-fangkun-qcon
Google arch-fangkun-qconGoogle arch-fangkun-qcon
Google arch-fangkun-qconYiwei Ma
 
Cibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconCibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconYiwei Ma
 
Alibaba arch-jiangtao-qcon
Alibaba arch-jiangtao-qconAlibaba arch-jiangtao-qcon
Alibaba arch-jiangtao-qconYiwei Ma
 
Twitter keynote-evan-qcon
Twitter keynote-evan-qconTwitter keynote-evan-qcon
Twitter keynote-evan-qconYiwei Ma
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qconYiwei Ma
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
Domainlang keynote-eric-qcon
Domainlang keynote-eric-qconDomainlang keynote-eric-qcon
Domainlang keynote-eric-qconYiwei Ma
 
Devjam keynote-david-qcon
Devjam keynote-david-qconDevjam keynote-david-qcon
Devjam keynote-david-qconYiwei Ma
 
Baidu keynote-wubo-qcon
Baidu keynote-wubo-qconBaidu keynote-wubo-qcon
Baidu keynote-wubo-qconYiwei Ma
 
淘宝线上线下性能跟踪体系和容量规划-Qcon2011
淘宝线上线下性能跟踪体系和容量规划-Qcon2011淘宝线上线下性能跟踪体系和容量规划-Qcon2011
淘宝线上线下性能跟踪体系和容量规划-Qcon2011Yiwei Ma
 
网游服务器性能优化-Qcon2011
网游服务器性能优化-Qcon2011网游服务器性能优化-Qcon2011
网游服务器性能优化-Qcon2011Yiwei Ma
 
Xietingbao-Qcon2011
Xietingbao-Qcon2011Xietingbao-Qcon2011
Xietingbao-Qcon2011Yiwei Ma
 

Mais de Yiwei Ma (20)

Cibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconCibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qcon
 
Cibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconCibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qcon
 
Taobao casestudy-yufeng-qcon
Taobao casestudy-yufeng-qconTaobao casestudy-yufeng-qcon
Taobao casestudy-yufeng-qcon
 
Alibaba server-zhangxuseng-qcon
Alibaba server-zhangxuseng-qconAlibaba server-zhangxuseng-qcon
Alibaba server-zhangxuseng-qcon
 
Taobao practice-liyu-qcon
Taobao practice-liyu-qconTaobao practice-liyu-qcon
Taobao practice-liyu-qcon
 
Thoughtworks practice-hukai-qcon
Thoughtworks practice-hukai-qconThoughtworks practice-hukai-qcon
Thoughtworks practice-hukai-qcon
 
Ufida design-chijianqiang-qcon
Ufida design-chijianqiang-qconUfida design-chijianqiang-qcon
Ufida design-chijianqiang-qcon
 
Spring design-juergen-qcon
Spring design-juergen-qconSpring design-juergen-qcon
Spring design-juergen-qcon
 
Google arch-fangkun-qcon
Google arch-fangkun-qconGoogle arch-fangkun-qcon
Google arch-fangkun-qcon
 
Cibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qconCibank arch-zhouweiran-qcon
Cibank arch-zhouweiran-qcon
 
Alibaba arch-jiangtao-qcon
Alibaba arch-jiangtao-qconAlibaba arch-jiangtao-qcon
Alibaba arch-jiangtao-qcon
 
Twitter keynote-evan-qcon
Twitter keynote-evan-qconTwitter keynote-evan-qcon
Twitter keynote-evan-qcon
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qcon
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
Domainlang keynote-eric-qcon
Domainlang keynote-eric-qconDomainlang keynote-eric-qcon
Domainlang keynote-eric-qcon
 
Devjam keynote-david-qcon
Devjam keynote-david-qconDevjam keynote-david-qcon
Devjam keynote-david-qcon
 
Baidu keynote-wubo-qcon
Baidu keynote-wubo-qconBaidu keynote-wubo-qcon
Baidu keynote-wubo-qcon
 
淘宝线上线下性能跟踪体系和容量规划-Qcon2011
淘宝线上线下性能跟踪体系和容量规划-Qcon2011淘宝线上线下性能跟踪体系和容量规划-Qcon2011
淘宝线上线下性能跟踪体系和容量规划-Qcon2011
 
网游服务器性能优化-Qcon2011
网游服务器性能优化-Qcon2011网游服务器性能优化-Qcon2011
网游服务器性能优化-Qcon2011
 
Xietingbao-Qcon2011
Xietingbao-Qcon2011Xietingbao-Qcon2011
Xietingbao-Qcon2011
 

Netflix web-adrian-qcon

  • 1. Ne#lix  Cloud  Architecture   Qcon  Beijing  April  9,  2011   Adrian  Cockcro>   @adrianco  #ne#lixcloud  hAp://slideshare.net/adrianco   acockcro>@ne#lix.com  
  • 2. (ConHnuing  from  Keynote  Talk)   Who,  Why,  What   Ne#lix  in  the  Cloud   Cloud  Challenges  and  Learnings   Systems  and  OperaHons  Architecture    
  • 3. Amazon Cloud Terminology See http://aws.amazon.com/ for details This is not a full list of Amazon Web Service features •  AWS  –  Amazon  Web  Services  (common  name  for  Amazon  cloud)   •  AMI  –  Amazon  Machine  Image  (archived  boot  disk,  Linux,  Windows  etc.  plus  applicaHon  code)   •  EC2  –  ElasHc  Compute  Cloud   –  Range  of  virtual  machine  types  m1,  m2,  c1,  cc,  cg.  Varying  memory,  CPU  and  disk  configuraHons.   –  Instance  –  a  running  computer  system.  Ephemeral,  when  it  is  de-­‐allocated  nothing  is  kept.   –  Reserved  Instances  –  pre-­‐paid  to  reduce  cost  for  long  term  usage   –  Availability  Zone  –  datacenter  with  own  power  and  cooling  hosHng  cloud  instances   –  Region  –  group  of  Availability  Zones  –  US-­‐East,  US-­‐West,  EU-­‐Eire,  Asia-­‐Singapore,  Asia-­‐Japan   •  ASG  –  Auto  Scaling  Group  (instances  booHng  from  the  same  AMI)   •  S3  –  Simple  Storage  Service  (hAp  access)   •  EBS  –  ElasHc  Block  Storage  (network  disk  filesystem  can  be  mounted  on  an  instance)   •  RDB  –  RelaHonal  Data  Base  (managed  MySQL  master  and  slaves)   •  SDB  –  Simple  Data  Base  (hosted  hAp  based  NoSQL  data  store)   •  SQS  –  Simple  Queue  Service  (hAp  based  message  queue)   •  SNS  –  Simple  NoHficaHon  Service  (hAp  and  email  based  topics  and  messages)   •  EMR  –  ElasHc  Map  Reduce  (automaHcally  managed  Hadoop  cluster)   •  ELB  –  ElasHc  Load  Balancer   •  EIP  –  ElasHc  IP  (stable  IP  address  mapping  assigned  to  instance  or  ELB)   •  VPC  –  Virtual  Private  Cloud  (extension  of  enterprise  datacenter  network  into  cloud)   •  IAM  –  IdenHty  and  Access  Management  (fine  grain  role  based  security  keys)  
  • 4. Ne#lix  Deployed  on  AWS   Content   Logs   Play   WWW   API   Video   S3   DRM   Search   Metadata   Masters   Movie   EC2   EMR  Hadoop   CDN  rouHng   Device  Config   Choosing   TV  Movie   S3   Hive   Bookmarks   RaHngs   Choosing   Content   Business   Mobile   Delivery   Logging   Similars   Intelligence   iPhone   Network  CDN  
  • 6. Product  Trade-­‐off   User  Experience   ImplementaHon   Consistent   Development   Experience   complexity   OperaHonal   Low  Latency   complexity  
  • 7. Synopsis   •  The  Goals   –  Faster,  Scalable,  Available  and  ProducHve   •  AnH-­‐paAerns  and  Cloud  Architecture   –  The  things  we  wanted  to  change  and  why   •  Capacity  Planning  and  Monitoring   •  Next  Steps  
  • 8. Ne#lix  Cloud  Goals   •  Faster   –  Lower  latency  than  the  equivalent  datacenter  web  pages  and  API  calls   –  Measured  as  mean  and  99th  percenHle   –  For  both  first  hit  (e.g.  home  page)  and  in-­‐session  hits  for  the  same  user   •  Scalable   –  Avoid  needing  any  more  datacenter  capacity  as  subscriber  count  increases   –  No  central  verHcally  scaled  databases   –  Leverage  AWS  elasHc  capacity  effecHvely   •  Available   –  SubstanHally  higher  robustness  and  availability  than  datacenter  services   –  Leverage  mulHple  AWS  availability  zones   –  No  scheduled  down  Hme,  no  central  database  schema  to  change   •  ProducHve   –  OpHmize  agility  of  a  large  development  team  with  automaHon  and  tools   –  Leave  behind  complex  tangled  datacenter  code  base  (~8  year  old  architecture)   –  Enforce  clean  layered  interfaces  and  re-­‐usable  components  
  • 9. Old  Datacenter  vs.  New  Cloud  Arch   Central  SQL  Database   Distributed  Key/Value  NoSQL   SHcky  In-­‐Memory  Session   Shared  Memcached  Session   ChaAy  Protocols   Latency  Tolerant  Protocols   Tangled  Service  Interfaces   Layered  Service  Interfaces   Instrumented  Code   Instrumented  Service  PaAerns   Fat  Complex  Objects   Lightweight  Serializable  Objects   Components  as  Jar  Files   Components  as  Services  
  • 10. The  Central  SQL  Database   •  Datacenter  has  a  central  database   –  Everything  in  one  place  is  convenient  unHl  it  fails   –  Customers,  movies,  history,  configuraHon     •  Schema  changes  require  downHme     This  An(-­‐pa,ern  impacts  scalability,  availability  
  • 11. The  Distributed  Key-­‐Value  Store   •  Cloud  has  many  key-­‐value  data  stores   –  More  complex  to  keep  track  of,  do  backups  etc.   –  Each  store  is  much  simpler  to  administer   –  Joins  take  place  in  java  code   DBA   •  No  schema  to  change,  no  scheduled  downHme   •  Latency  for  Memcached  vs.  Oracle  vs.  SimpleDB   –  Memcached  is  dominated  by  network  latency  <1ms   –  Oracle  for  simple  queries  is  a  few  milliseconds   –  SimpleDB  has  replicaHon  and  REST  overheads  >10ms  
  • 12. Database  MigraHon   •  Why  SimpleDB?   –  No  DBA’s  in  the  cloud,  Amazon  hosted  service   –  Work  started  two  years  ago,  fewer  viable  opHons   –  Worked  with  Amazon  to  speed  up  and  scale  SimpleDB   •  AlternaHves?   –  Now  rolling  out  Cassandra  as  “upgrade”  from  SimpleDB   –  Need  several  opHons  to  match  use  cases  well   •  Detailed  NoSQL  and  SimpleDB  Advice   –  Sid  Anand    -­‐  QConSF  Nov  5th  –  Ne#lix’  TransiHon  to  High  Availability   Storage  Systems   –  Blog  -­‐  hAp://pracHcalcloudcompuHng.com/   –  Download  Paper  PDF  -­‐  hAp://bit.ly/bhOTLu  
  • 13. Oracle  to  SimpleDB   (See  Sid’s  paper  for  details)   •  SimpleDB  Domains   –  De-­‐normalize  mulHple  tables  into  a  single  domain   –  Work  around  size  limits  (10GB  per  domain,  1KB  per  key)   –  Shard  data  across  domains  to  scale   –  Key  –  Use  distributed  sequence  generator,  GUID  or  natural   unique  key  such  as  customer-­‐id     –  Implement  a  schema  validator  to  catch  bad  aAributes   •  ApplicaHon  layer  support   –  Do  GROUP  BY  and  JOIN  operaHons  in  the  applicaHon   –  Compose  relaHons  in  the  applicaHon  layer   –  Check  constraints  on  read,  and  repair  data  as  a  side  effect     •  Do  without  triggers,  PL/SQL,  clock  operaHons  
  • 14. The  SHcky  Session   •  Datacenter  SHcky  Load  Balancing   –  Efficient  caching  for  low  latency   –  Tricky  session  handling  code   –  Middle  Her  load  balancer  has  issues  in  pracHce   •  Encourages  concentrated  funcHonality   –  one  service  that  does  everything     This  An(-­‐pa,ern  impacts  produc(vity,  availability  
  • 15. The  Shared  Session   •  Cloud  Uses  Round-­‐Robin  Load  Balancing   –  Simple  request-­‐based  code   –  External  shared  caching  with  memcached     •  More  flexible  fine  grain  services   –  Works  beAer  with  auto-­‐scaled  instance  counts  
  • 16. ChaAy  Opaque  and  BriAle  Protocols   •  Datacenter  service  protocols   –  Assumed  low  latency  for  many  simple  requests   •  Based  on  serializing  exisHng  java  objects   –  Inefficient  formats   –  IncompaHble  when  definiHons  change     This  An(-­‐pa,ern  causes  produc(vity,  latency   and  availability  issues  
  • 17. Robust  and  Flexible  Protocols   •  Cloud  service  protocols   –  JSR311/Jersey  is  used  for  REST/HTTP  service  calls   –  Custom  client  code  includes  service  discovery   –  Support  complex  data  types  in  a  single  request   •  Apache  Avro   –  Evolved  from  Protocol  Buffers  and  Thri>   –  Includes  JSON  header  defining  key/value  protocol   –  Avro  serializaHon  is  half  the  size  and  several  Hmes   faster  than  Java  serializaHon,  more  work  to  code  
  • 18. Persisted  Protocols   •  Persist  Avro  in  Memcached   –  Save  space/latency  (zigzag  encoding,  half  the  size)   –  Less  briAle  across  versions   –  New  keys  are  ignored   –  Missing  keys  are  handled  cleanly   •  Avro  protocol  definiHons   –  Can  be  wriAen  in  JSON  or  generated  from  POJOs   –  It’s  hard,  needs  beAer  tooling  
  • 19. Tangled  Service  Interfaces   •  Datacenter  implementaHon  is  exposed   –  Oracle  SQL  queries  mixed  into  business  logic   •  Tangled  code   –  Deep  dependencies,  false  sharing   •  Data  providers  with  sideways  dependencies   –  Everything  depends  on  everything  else   This  An(-­‐pa,ern  affects  produc(vity,  availability  
  • 20. Untangled  Service  Interfaces   •  New  Cloud  Code  With  Strict  Layering   –  Compile  against  interface  jar   –  Can  use  spring  runHme  binding  to  enforce     •  Service  interface  is  the  service   –  ImplementaHon  is  completely  hidden   –  Can  be  implemented  locally  or  remotely   –  ImplementaHon  can  evolve  independently  
  • 21. Untangled  Service  Interfaces   Two  layers:   •  SAL  -­‐  Service  Access  Library   –  Basic  serializaHon  and  error  handling   –  REST  or  POJO’s  defined  by  data  provider   •  ESL  -­‐  Extended  Service  Library   –  Caching,  conveniences   –  Can  combine  several  SALs   –  Exposes  faceted  type  system  (described  later)   –  Interface  defined  by  data  consumer  in  many  cases  
  • 22. Service  InteracHon  PaAern   Sample  Swimlane  Diagram  
  • 23. Service  Architecture  PaAerns   •  Internal  Interfaces  Between  Services   –  Common  paAerns  as  templates   –  Highly  instrumented,  observable,  analyHcs   –  Service  Level  Agreements  –  SLAs     •  Library  templates  for  generic  features   –  Instrumented  Ne#lix  Base  Servlet  template   –  Instrumented  generic  client  interface  template   –  Instrumented  S3,  SimpleDB,  Memcached  clients  
  • 24. CLIENT   Request  Start   Timestamp,   Client   Inbound   Request  End   outbound   deserialize  end   Timestamp   serialize  start   Hmestamp   Hmestamp   Inbound   Client   deserialize   outbound   start   serialize  end   Hmestamp   Hmestamp   Client  network   receive   Hmestamp   Service  Request   Client  Network   send   Hmestamp   Instruments  Every   Service   network  send   Hmestamp   Step  in  the  call   Service   Network   receive   Hmestamp   Service   Service   outbound   inbound   serialize  end   serialize  start   Hmestamp   Hmestamp   Service   Service   outbound   inbound   serialize  start   SERVICE  execute   serialize  end   request  start   Hmestamp   Hmestamp   Hmestamp,   execute  request   end  Hmestamp  
  • 25. Boundary  Interfaces   •  Isolate  teams  from  external  dependencies   –  Fake  SAL  built  by  cloud  team   –  Real  SAL  provided  by  data  provider  team  later   –  ESL  built  by  cloud  team  using  faceted  objects   •  Fake  data  sources  allow  development  to  start   –  e.g.  Fake  IdenHty  SAL  for  a  test  set  of  customers   –  Development  solidifies  dependencies  early   –  Helps  external  team  provide  the  right  interface  
  • 26. One  Object  That  Does  Everything   •  Datacenter  uses  a  few  big  complex  objects   –  Movie  and  Customer  objects  are  the  foundaHon   –  Good  choice  for  a  small  team  and  one  instance   –  ProblemaHc  for  large  teams  and  many  instances   •  False  sharing  causes  tangled  dependencies   –  UnproducHve  re-­‐integraHon  work     An(-­‐pa,ern  impac(ng  produc(vity  and  availability  
  • 27. An  Interface  For  Each  Component   •  Cloud  uses  faceted  Video  and  Visitor   –  Basic  types  hold  only  the  idenHfier   –  Facets  scope  the  interface  you  actually  need   –  Each  component  can  define  its  own  facets   •  No  false-­‐sharing  and  dependency  chains   –  Type  manager  converts  between  facets  as  needed   –  video.asA(PresentaHonVideo)  for  www   –  video.asA(MerchableVideo)  for  middle  Her  
  • 28. So>ware  Architecture  PaAerns   •  Object  Models   –  Basic  and  derived  types,  facets,  serializable   –  Pass  by  reference  within  a  service   –  Pass  by  value  between  services     •  ComputaHon  and  I/O  Models   –  Service  ExecuHon  using  Best  Effort   –  Common  thread  pool  management  
  • 29. Cloud  OperaHons   Model  Driven  Architecture   Capacity  Planning  &  Monitoring  
  • 30. Tools  and  AutomaHon   •  Developer  and  Build  Tools   –  Jira,  Eclipse,  Jeeves,  Ivy,  ArHfactory   –  Builds,  creates  .war  file,  .rpm,  bakes  AMI  and  launches   •  Custom  Ne#lix  ApplicaHon  Console   –  AWS  Features  at  Enterprise  Scale  (hide  the  AWS  security  keys!)   –  Auto  Scaler  Group  is  unit  of  deployment  to  producHon   •  Open  Source  +  Support   –  Apache,  Tomcat,  Cassandra,  Hadoop,  OpenJDK/SunJDK,  CentOS/AmazonLinux     •  Monitoring  Tools   –  Keynote  –  service  monitoring  and  alerHng   –  AppDynamics  –  Developer  focus  for  cloud  hAp://appdynamics.com   –  EpicNMS  –  flexible  data  collecHon  and  plots  hAp://epicnms.com   –  Nimso>  NMS  –  ITOps  focus  for  Datacenter  +  Cloud  alerHng  
  • 31. Model  Driven  Architecture   •  Datacenter  PracHces   –  Lots  of  unique  hand-­‐tweaked  systems   –  Hard  to  enforce  paAerns   •  Model  Driven  Cloud  Architecture   –  Perforce/Ivy/Jeeves  based  builds  for  everything   –  Every  producHon  instance  is  a  pre-­‐baked  AMI   –  Every  applicaHon  is  managed  by  an  Autoscaler   No  excep(ons,  every  change  is  a  new  AMI  
  • 32. Model  Driven  ImplicaHons   •  Automated  “Least  Privilege”  Security   –  Tightly  specified  security  groups   –  Fine  grain  IAM  keys  to  access  AWS  resources   –  Performance  tools  security  and  integraHon   •  Model  Driven  Performance  Monitoring   –  Hundreds  of  instances  appear  in  a  few  minutes…   –  Tools  have  to  “garbage  collect”  dead  instances    
  • 34. Auto  Scale  Group  ConfiguraHon  
  • 35. Capacity  Planning  &  Monitoring  
  • 36. Capacity  Planning  in  Clouds   (a  few  things  have  changed…)   •  Capacity  is  expensive   •  Capacity  takes  Hme  to  buy  and  provision   •  Capacity  only  increases,  can’t  be  shrunk  easily   •  Capacity  comes  in  big  chunks,  paid  up  front   •  Planning  errors  can  cause  big  problems   •  Systems  are  clearly  defined  assets   •  Systems  can  be  instrumented  in  detail   •  Depreciate  assets  over  3  years  (reservaHons!)  
  • 37. Monitoring  Issues   •  Problem   –  Too  many  tools,  each  with  a  good  reason  to  exist   –  Hard  to  get  an  integrated  view  of  a  problem   –  Too  much  manual  work  building  dashboards   –  Tools  are  not  discoverable,  views  are  not  filtered   •  SoluHon   –  Get  vendors  to  add  deep  linking  URLs  and  APIs   –  IntegraHon  “portal”  Hes  everything  together   –  Underlying  dependency  database   –  Dynamic  portal  generaHon,  relevant  data,  all  tools  
  • 38. Data  Sources   • External  URL  availability  and  latency  alerts  and  reports  –  Keynote   External  TesHng   • Stress  tesHng  -­‐  SOASTA   • Ne#lix  REST  calls  –  Chukwa  to  DataOven  with  GUID  transacHon  idenHfier   Request  Trace  Logging   • Generic  HTTP  –  AppDynamics  service  Her  aggregaHon,  end  to  end  tracking   • Tracers  and  counters  –  log4j,  tracer  central,  Chukwa  to  DataOven   ApplicaHon  logging   • Trackid  and  Audit/Debug  logging  –  DataOven,  Appdynamics    GUID  cross  reference   • ApplicaHon  specific  real  Hme  –  Nimso>,  Appdynamics,  Epic   JMX    Metrics   • Service  and  SLA  percenHles  –  Nimso>,  Appdynamics,  Epic,logged  to  DataOven   • Stdout  logs  –  S3  –  DataOven,  Nimso>  alerHng   Tomcat  and  Apache  logs   • Standard  format  Access  and  Error  logs  –  S3  –  DataOven,  Nimso>  AlerHng   • Garbage  CollecHon  –  Nimso>,  Appdynamics   JVM   • Memory  usage,  call  stacks,  resource/call  -­‐  AppDynamics   • system  CPU/Net/RAM/Disk  metrics  –  AppDynamics,  Epic,  Nimso>  AlerHng   Linux   • SNMP  metrics  –  Epic,  Network  flows  -­‐  FasHp   • Load  balancer  traffic  –  Amazon  Cloudwatch,  SimpleDB  usage  stats   AWS   • System  configuraHon    -­‐  CPU  count/speed  and  RAM  size,  overall  usage  -­‐  AWS  
  • 40. Dashboards  Architecture   •  Integrated  Dashboard  View   –  Single  web  page  containing  content  from  many  tools   –  Filtered  to  highlight  most  “interesHng”  data   •  Relevance  Controller   –  Drill  in,  add  and  remove  content  interacHvely   –  Given  an  applicaHon,  alert  or  problem  area,  dynamically   build  a  dashboard  relevant  to  your  role  and  needs   •  Dependency  and  Incident  Model   –  Model  Driven  -­‐  Interrogates  tools  and  AWS  APIs   –  Document  store  to  capture  dependency  tree  and  states  
  • 41. Dashboard  Prototype   (not  everything  is  integrated  yet)  
  • 42. AppDynamics   How  to  look  deep  inside  your  cloud  applicaHons   •  AutomaHc  Monitoring   –  Base  AMI  includes  all  monitoring  tools   –  Outbound  calls  only  –  no  discovery/polling  issues   –  InacHve  instances  removed  a>er  a  few  days     •  Incident  Alarms  (deviaHon  from  baseline)   –  Business  TransacHon  latency  and  error  rate   –  Alarm  thresholds  discover  their  own  baseline   –  Email  contains  URL  to  Incident  Workbench  UI  
  • 43. Using  AppDynamics   (simple  example  from  early  2010)  
  • 44. Assess  Impact  using  AppDynamics   View  actual  call  graph  on  producHon  systems  
  • 45. Monitoring  Summary   •  Broken  datacenter  oriented  tools  is  a  big  problem   •  IntegraHng  many  different  tools   –  They  are  not  designed  to  be  integrated   –  We  have  “persuaded”  vendors  to  add  APIs   •  If  you  can’t  see  deep  inside  your  app,  you’re  L  
  • 47. Next  Few  Years…   •  “System  of  Record”  moves  to  Cloud  (now)   –  Master  copies  of  data  live  only  in  the  cloud,  with  backups   –  Cut  the  datacenter  to  cloud  replicaHon  link   •  InternaHonal  Expansion  –  Global  Clouds  (later  in  2011)   –  Rapid  deployments  to  new  markets   •  Cloud  StandardizaHon?   –  Cloud  features  and  APIs  should  be  a  commodity  not  a  differenHator   –  DifferenHate  on  scale  and  quality  of  service   –  CompeHHon  also  drives  cost  down   –  Higher  resilience  and  scalability     We  would  prefer  to  be  an  insignificant  customer  in  a  giant  cloud  
  • 48. Takeaway     NeAlix  is  path-­‐finding  the  use  of  public  AWS   cloud  to  replace  in-­‐house  IT  for  non-­‐trivial   applica(ons  with  hundreds  of  developers  and   thousands  of  systems.     acockcro>@ne#lix.com   hAp://www.linkedin.com/in/adriancockcro>   @adrianco  #ne#lixcloud  
  • 49. 杭州站 · 2011年10月20日~22日 www.qconhangzhou.com(6月启动) QCon北京站官方网站和资料下载 www.qconbeijing.com