Architecting Highly Dependable Cloud Applications


                                        Anna Liu
                                        Len Bass




NICTA Copyright 2012   From imagination to impact
The Land Down Under




Sydney




About NICTA

National ICT Australia

    • Federal and state funded research
      company established in 2002
    • Largest ICT research resource in
      Australia
    • National impact is an important
      success metric
    • ~700 staff/students working in 5 labs
      across major capital cities
    • 7 university partners
    • Providing R&D services, knowledge
      transfer to Australian (and global) ICT
      industry

NICTA technology is in over 1 billion mobile phones


Research Areas at NICTA
• Networks: Aruna Seneviratne
• Software Systems: Anna Liu, Gernot Heiser
• Machine Learning: Bob Williamson
• Computer Vision: Nick Barnes, Richard Hartley, Peter Corke
• Control & Signal Processing: Rob Evans
• Optimisation: Mark Wallace, Sylvie Thiebaux, Toby Walsh
Our team’s mission: help enterprises take full
advantage as software extends into cloud!

Themes: high availability, disaster recovery, business continuity, systems resilience, high performance, onsite/offsite, hybrid cloud, cost optimised, dynamic, elastic, real time, real-time monitoring, actionable analytics, intelligent management

Our applied R&D capability spans cloud computing, web, SOA, distributed systems, data management, analytics, performance monitoring, DR, automated reasoning, ontologies, AI…
Who are we?
• Anna
• Len




Who are you?
What would you like from this tutorial?




Outline
• Introduction
               • Cloud Computing Platforms
               • Nature and causes of outages and down-time
               • Characteristics of Dependability in Cloud
• Achieving high dependability
               •   The importance of stateless components
               •   Techniques to handle performance problems
               •   Techniques to handle availability problems
               •   Techniques to handle security problems
• Case Studies: Netflix, Yuruware
• Conclusions


Introduction
• Intro to the cloud – xxx as a
  service, regions/zones
• What is dependability?
• Why is dependability a concern in the cloud?
• Types of dependability and high-level problem
  descriptions
        – Performance
        – Availability
        – Security




What is Cloud Computing?

   Cloud computing is a model for enabling convenient, on-
   demand network access to a shared pool of configurable
   computing resources (e.g., networks, servers, storage,
   applications, and services) that can be rapidly
   provisioned and released with minimal management
   effort or service provider interaction.

   This cloud model is composed of five essential
   characteristics, three service models, and four
   deployment models.

              - US National Institute of Standards and Technology
Characterising Cloud Computing
• Measured Service
• Resource Pooling
• Elasticity
• Self Service
• Ubiquitous Network Access
Five Characteristics – NIST Definition
• On-demand Self-Service
    – A consumer can provision computing capabilities without human
      interaction
• Broad network access
    – Computing capabilities are available over the network and accessed
      through standard mechanisms
• Resource pooling
    – Provider’s computing resources are pooled to serve multiple consumers
      with different resources dynamically assigned according to consumers’
      demands
• Rapid elasticity
    – Computing capabilities can be rapidly and elastically provisioned to
      quickly scale out and rapidly released to scale in
• Measured service
    – Resource usage can be monitored, controlled, and reported, providing
      transparency for both the provider and consumer
Leading Provider: Amazon EC2


      Let’s see how Amazon EC2, a leading commercial cloud, looks

      I want my cloud!
1. Grab your credit card and create an account (10 min). You then have access to a console
2. Select where you want to create your virtual machines (US East, US West, Ireland or Singapore)
3. Hit this button
4. Select a machine image
   • Many pre-configured images are available
   • You can register your machine images as well
5. Determine the amount of resources to allocate
   • <1.0 GHz CPU + 600 MB RAM → 0.01 USD/hour
   • 1.0 GHz CPU + 1.7 GB RAM → 0.04 USD/hour
   • 3.0 GHz x 8 CPUs + 68 GB RAM → 1.1 USD/hour
   • You can pay Win/SQL Server license fees in pay-per-hour
6. Define a set of access control rules
7. Done! (< 5 minutes in total)
   • You have your virtual machine at
     ec2-184-74-14-28.us-west-1.compute.amazonaws.com

I got my virtual machine!
8. Connect to my virtual machine
   • Just SSH to the address
   • You have root access!

You’re in an Amazon Datacenter in CA
This is my desktop in Sydney
If you like Windows, just launch a Windows virtual machine and remote-desktop to it

Connected through a VPN connection

You’re in an Amazon Datacenter in NV
This is my desktop in Sydney
9. Terminate or hibernate virtual machines when they are not in use
   • In some systems, we use a script to hibernate virtual machines at 8:00PM
   • Restart instances in the morning if necessary. It takes just a couple of minutes
10. Check a bill in real-time
   • Hours to run virtual machines
   • Network in/out
   • VPN
   • Disk access
   • # of requests made
   …
Three Service Models – NIST definition
[Diagram: layered stack labelled “Technology exposed to customers” and “Providers”, top to bottom:]
• Software as a Service
• Platform as a Service
• Infrastructure as a Service
• Datacenter Infrastructure
Three Delivery Models
• Infrastructure as a Service (IaaS)
       – The consumer has control over operating systems,
         storage and deployed applications
• Platform as a Service (PaaS)
       – Consumers can deploy applications created using programming
         languages and tools supported by the provider (e.g., Java Servlet)
       – The provider shields the complexity of its infrastructure
              • Scale up/down, load balancing, replication, disaster recovery,
                database management, …

• Software as a Service (SaaS)
       – Consumers use the provider’s applications
       – The consumer does not manage the underlying cloud
         infrastructure
Leading Provider: Google App Engine


      Let’s see how Google App Engine, a leading
      commercial PaaS, looks

      I want my PaaS!
1. Create an account (5 min). GAE offers a large amount of quota for free
2. Write an application using GAE’s framework
3. Deploy your application on GAE!
   Scale up/down, load balancing, replication, disaster recovery, database
   management, … many functions are implemented by GAE’s
4. Check your resource usage (CPU, storage, # of API calls, …)
   Pay only when usage exceeds the free quota
Provider Services - 1
• Consumer is allocated some number of virtual
  machine instances.
        – Number of instances is under the control of the
          consumer
        – Provider allows consumer to set rules for
          “autoscaling”: automatically creating and removing
          instances
        – When a new instance is launched it has
               • Software as specified by either the consumer or the provider
               • Private IP address available only from within cloud. Private IP
                 address exists for life of instance and will not change
               • Public IP address. Addressable from outside the cloud. May
                 change under certain circumstances
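
An autoscaling rule of the kind mentioned above can be pictured as a threshold policy on a monitored metric. The sketch below is illustrative only: the function name `desired_instances`, the CPU thresholds, and the minimum/maximum bounds are assumptions made for this example and do not correspond to any provider's API.

```python
def desired_instances(current, avg_cpu, scale_out_at=0.75, scale_in_at=0.25,
                      minimum=1, maximum=10):
    """Return the instance count a simple threshold rule would request.

    avg_cpu is the average CPU utilisation across instances (0.0 to 1.0).
    All thresholds and bounds are hypothetical defaults.
    """
    if avg_cpu > scale_out_at:
        target = current + 1      # overloaded: add one instance
    elif avg_cpu < scale_in_at:
        target = current - 1      # mostly idle: remove one instance
    else:
        target = current          # within the comfortable band
    # Never go below the minimum or above the maximum the consumer set.
    return max(minimum, min(maximum, target))
```

Changing one instance per evaluation keeps the rule easy to reason about; a real rule set would usually also include a cool-down period between changes.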

Provider Services – 2
• Cloud data centers
        – hosted in different geographic regions
        – Cloud provider responsible for physical security
• SLAs from cloud providers are for 99.9%+ uptime
  for the cloud as a whole. There is no guarantee for any
  individual instance
• Cloud provider will replicate databases to
  different regions or within a region.
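
To make the 99.9% figure concrete, the arithmetic below converts an uptime fraction into the downtime it still permits. It assumes a 365-day year and a helper name, `allowed_downtime_hours`, chosen for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_fraction, period_hours=HOURS_PER_YEAR):
    """Hours of downtime that remain within the stated uptime fraction."""
    return (1.0 - uptime_fraction) * period_hours


# 99.9% uptime still allows roughly 8.76 hours of downtime per year.
print(round(allowed_downtime_hours(0.999), 2))
```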




Questions




What is dependability?
• Dependability of a computing system is the
  ability to deliver service that can justifiably be
  trusted.
        – The service delivered by a system is its behaviour as
          it is perceived by its user(s);
        – a user is another system (physical, human) that
          interacts with the former at the service interface.
        – The function of a system is what the system is
          intended for, and is described by the system
          specification.
[ A. Avizienis, J.-C. Laprie and B. Randell: Fundamental Concepts of Dependability.
Research Report No 1145, LAAS-CNRS, April 2001]


Parsing the definition
• Dependability is relative
        – “justifiably be trusted”
• May be different users with different
  expectations
• Users can be systems or humans
• Systems may deliver many services and
  dependability may be different for each service




Dependability subsumes many other
attributes




Questions




Cloud vis a vis private data center
 • Cloud providers remove some of the problems
   of operating a private data center
        Acquisition of physical hardware.
        Hiring/training data center staff
        Physical security
  • Other problems remain basically the same
        Security threats from internet connections
        Separation of production/test environments
        Patch installation
  • Other problems are new or exist in changed
    form
        It is these other problems that we now focus on.

Cloud Specific Dependability Problems
Failure
       Instance failure
       Data failure/consistency
       Operator error
       Upgrade error
Performance
       Latency of provisioning
       Over/under provisioning
       Latency of communication
Security/privacy
       Credentials and keys
       Multi-tenancy
       Location dependency/governance

Disaster Recovery
Provisioning
• Consumer or cloud infrastructure can launch or
  delete an instance of a virtual machine
• When a new instance is launched it consists of
        – Virtual hardware with public and private IP address
        – Executable image
        – Virtual hard disk
• Provisioning is important both in failure recovery
  and performance




Elasticity - Over or Under Provisioning
• Elasticity is the defining characteristic of cloud
        – Traditional ‘scalability’ or ‘throughput’ measures no longer helpful
        – “the ability of software to meet changing capacity demands,
          deploying and releasing relevant necessary resources on-
          demand”
• There is often over or under provisioning




Instance Failure – recognition
• Basic failure recognition mechanism is
  “heartbeat”.
• Instance must periodically show it is still alive
        – Send a message
        – Respond to query
• Must be an entity that is responsible for
  monitoring “aliveness” of instance
        – Entity can be infrastructure
        – Entity can be other portion of the application
        – Entity can be client
• Failed instances are not automatically deleted
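
The heartbeat mechanism can be sketched as a monitor that records when each instance last reported in and flags those that have been silent for too long. This is a minimal illustration: the class name `HeartbeatMonitor`, the explicit `now` arguments (used so the example is deterministic), and the timeout value are assumptions.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat of each instance (hypothetical sketch)."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # instance id -> time of last heartbeat

    def beat(self, instance_id, now):
        """Record that an instance showed it is still alive."""
        self.last_seen[instance_id] = now

    def failed(self, now):
        """Return instances whose last heartbeat is older than the timeout."""
        return sorted(i for i, t in self.last_seen.items()
                      if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout_seconds=10)
monitor.beat("a", now=0)
monitor.beat("b", now=0)
monitor.beat("a", now=8)
print(monitor.failed(now=12))  # "b" has been silent for 12 seconds
```

Note that a flagged instance stays in `last_seen`: as on the slide, detecting a failure does not delete the instance.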
Monitoring for Pending Failure
   • Besides PING…
   • A dashboard of flashing lights
   • Monitoring ongoing CPU, memory utilization,
     disk activities, Network activities
   • Environmental controls, water/coolant flow,
     power and temperature




Akamai’s NOC in Cambridge, Massachusetts
State
• An instance can be stateful or stateless
• A stateful instance remembers information from
  one message to another. State can be stored
  either within instance memory or on external
  memory device
• A stateless instance must be sent necessary
  state associated with the message.
• HTTP is a stateless protocol so every message
  must contain information allowing the instance to
  understand the context.
• Recovery process is different for stateful
  instances than for stateless instances.
Stateful Recovery
• Strategy depends on how much loss of
  computation and events can be tolerated.
• Strategy - 1
        – Checkpoint image periodically
        – On recovery, provision with checkpointed image and
          computation will restart from last checkpoint
        – Any computation and messages between last
          checkpoint and failure will be lost.
        – Assumes no state stored on external device.
• Specific to the cloud because it relies on checkpointing the image


Stateful Recovery Strategy – 2
• Periodically save important state on persistent
  external device.
• When image is activated, it checks whether any
  state has been saved. If so, it reads that state
  and resumes computation
• Any computation and messages between last
  checkpoint and failure will be lost
• Difference from the prior strategy: it does not
  assume an image exists, and state is explicitly
  checkpointed by the application
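
Strategy 2 might look like the following sketch, in which a local JSON file stands in for the persistent external device. The class name `Counter`, the file format, and the running-total state are assumptions chosen to keep the example small.

```python
import json
import os


class Counter:
    """Keeps a running total and checkpoints it explicitly (sketch)."""

    def __init__(self, checkpoint_path):
        self.path = checkpoint_path
        self.total = 0
        # On activation, check whether any state has been saved.
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.total = json.load(f)["total"]

    def handle(self, value):
        self.total += value

    def checkpoint(self):
        # Write to a temporary file first so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"total": self.total}, f)
        os.replace(tmp, self.path)
```

Anything handled after the last `checkpoint()` call is lost if the instance fails, which is exactly the limitation noted above.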


Stateful Recovery Strategy – 3
• Periodically save important state on persistent
  external device
• Log incoming messages on persistent external
  device
• When image is activated, it checks whether any
  state has been saved. If so, it reads that state.
• Activated image then reads log and replays
  activity.
• No computation or messages will be lost unless
  there is failure between message arrival and
  recording that message on log. Acks to client will
  allow client to resend message if necessary.
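
Strategy 3 can be sketched by logging every message before applying it and replaying the unapplied tail of the log on restart. Here a plain dictionary stands in for the persistent external device; the class name `LoggedAccumulator` and the snapshot layout are assumptions for illustration.

```python
class LoggedAccumulator:
    """Checkpoint plus message log with replay on activation (sketch)."""

    def __init__(self, store):
        # `store` stands in for persistent external storage.
        self.store = store
        self.store.setdefault("log", [])
        snapshot = self.store.get("snapshot", {"total": 0, "applied": 0})
        self.total = snapshot["total"]
        self.applied = snapshot["applied"]
        # Replay messages logged after the last snapshot.
        for value in self.store["log"][self.applied:]:
            self._apply(value)

    def _apply(self, value):
        self.total += value
        self.applied += 1

    def receive(self, value):
        self.store["log"].append(value)  # log first, then process
        self._apply(value)
        return "ack"                     # lets the client know it was recorded

    def checkpoint(self):
        self.store["snapshot"] = {"total": self.total, "applied": self.applied}
```

Because the acknowledgement is sent only after the message is logged, a client that sees no acknowledgement knows it should resend.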
Comments on Stateful recovery strategies
• Only strategy 1 (provision with checkpointed
  image) is specific to cloud
• Other strategies apply also to non-cloud
  environments.
• Strategy 3 achieves least data loss since
  messages are logged and replayed upon
  recovery.




Stateless images
• If instance is stateless then
        – Infrastructure can send any message to any instance
        – Can create new instances for performance or
          reliability reasons.
        – Router/load balancer/controller is responsible for
          getting messages to instances

[Diagram: clients send requests to a load balancer in the cloud, which passes them to the servers]
How do messages get to instances?
• Two models
        – Push. Load balancer decides which instance should
          get message
        – Pull. Load balancer maintains queue of messages
          and instances retrieve messages from queue.




Push Architecture Pattern
[Diagram: Clients, Load balancer, and Servers in sequence, with a Monitor alongside]
Push Pattern Description
  Client sends a request (e.g. HTTP message) to
    the app in the cloud.
  Request arrives at a load balancer
  Load balancer forwards request to one of the VMs
  Load balancer uses scheduling strategy to decide
    which VM gets the request, e.g. round robin
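
Round robin, the scheduling strategy named above, simply hands each new request to the next VM in a fixed rotation. The sketch below uses hypothetical VM names and a class name, `RoundRobinBalancer`, invented for this example.

```python
class RoundRobinBalancer:
    """Chooses VMs in a fixed rotation (illustrative sketch)."""

    def __init__(self, vms):
        self.vms = list(vms)
        self.next_index = 0

    def route(self):
        """Return the VM that should receive the next request."""
        vm = self.vms[self.next_index % len(self.vms)]
        self.next_index += 1
        return vm


balancer = RoundRobinBalancer(["vm-a", "vm-b", "vm-c"])
print([balancer.route() for _ in range(4)])
```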




Monitor
The load balancer knows
       CPU utilization for each VM through monitor
       how many requests each VM has gotten
       Possibly how long it took to service the requests.


The monitor decides (based on rules) when new
  resources are needed




Failure management within Push Pattern
• Monitor will recognize failure of instance through
  non-responsiveness.
• Load Balancer will not send further messages to
  instance
• Messages currently being processed by failed
  instance are lost
• Client must detect message not processed
  (through timeout) and resend message.
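
On the client side, detecting a lost message by timeout and resending it could look like this sketch. The `send` callable, the use of Python's built-in `TimeoutError` to signal a timeout, and the attempt limit are assumptions for illustration.

```python
def send_with_retry(send, message, attempts=3):
    """Call send(message), resending when it times out.

    `send` is any callable that raises TimeoutError when no reply
    arrives in time. The last timeout is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send(message)
        except TimeoutError:
            if attempt == attempts:
                raise
```

A resend can reach a second instance even though the first one finished the work and only its reply was lost, so requests handled this way should be safe to process more than once.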




Pull architecture pattern (aka Producer-Consumer)
[Diagram: Clients, Load balancer/queue manager, and Servers in sequence, with a Monitor alongside]
Pull architecture description
Each request from the client is application specific
  and typed.

The queue manager keeps a separate queue for each
  application running on the VMs.

A VM requests the next message of a particular
  type (pull) and processes it.
When the VM has processed a message, it
  informs the controller to remove the message
  from the queue.
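
The pull pattern can be sketched with one queue per message type: VMs ask for the next message of a type and report back when they have processed it. The class name `QueueManager` and its method names are assumptions made for this example, not a real queueing service's API.

```python
from collections import defaultdict, deque


class QueueManager:
    """Typed message queues that VMs pull from (illustrative sketch)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # message type -> waiting messages
        self.in_flight = {}               # message id -> (type, body)
        self.next_id = 0

    def submit(self, msg_type, body):
        """Called on behalf of a client; returns the message id."""
        self.next_id += 1
        self.queues[msg_type].append((self.next_id, body))
        return self.next_id

    def pull(self, msg_type):
        """A VM requests the next message of a given type, or None."""
        if not self.queues[msg_type]:
            return None
        msg_id, body = self.queues[msg_type].popleft()
        self.in_flight[msg_id] = (msg_type, body)
        return msg_id, body

    def done(self, msg_id):
        """A VM reports that it finished processing the message."""
        self.in_flight.pop(msg_id, None)
```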
Monitor
The monitor can now see
        how long a request waits in a queue
        the average queue length
This is an indication of the load on the VMs that
  have applications that service requests of that
  type.
Allows better scheduling of messages to VMs.




Failure Management within Pull Pattern
• Controller knows when message has been
  processed.
• If message is not processed within time
  interval, controller can reassign it.
• Failed instances will not request further
  messages and so take themselves out of
  service.
• It is possible for a failed instance to recover and
  continue processing on a message that has
  been rescheduled so checks must be in place to
  keep a message from being double processed.
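
The reassignment and double-processing check described above might be sketched as follows. Times are passed in explicitly so the example is deterministic; the class name `ReassigningQueue` and the timeout value are assumptions.

```python
class ReassigningQueue:
    """Reassigns messages that are not completed in time (sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = []       # message ids waiting to be pulled
        self.in_flight = {}     # message id -> time it was handed out
        self.completed = set()  # ids already processed once

    def submit(self, msg_id):
        self.pending.append(msg_id)

    def pull(self, now):
        # Put back any message whose instance took too long.
        for msg_id, started in list(self.in_flight.items()):
            if now - started > self.timeout:
                del self.in_flight[msg_id]
                self.pending.append(msg_id)
        if not self.pending:
            return None
        msg_id = self.pending.pop(0)
        self.in_flight[msg_id] = now
        return msg_id

    def complete(self, msg_id):
        """Return True only for the first completion of a message."""
        if msg_id in self.completed:
            return False        # a recovered instance finished it already
        self.completed.add(msg_id)
        self.in_flight.pop(msg_id, None)
        if msg_id in self.pending:
            self.pending.remove(msg_id)
        return True
```

The `completed` set is the check that keeps a recovered instance and its replacement from both having their results accepted.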

Cleaning up
• When an instance fails it is not automatically
  deallocated
• Consumer must deallocate the failed instance.
• When an instance is deallocated
        – Public and private IP addresses become available for reallocation
        – Possible to tell infrastructure that public IP address is
          to be assigned to replacement instance
• Within AWS charging continues until instance
  is deallocated.



Data Failure
• Data storage can be “ephemeral” or “persistent”
• Ephemeral storage disappears if instance fails
• Persistent storage is maintained by cloud
  provider
        – Replicated automatically
        – Replicas may be geographically separated
• May lead to problems with data consistency




Data Consistency
• Takes time to replicate data
• Means that different replicas of the data may not
  be instantaneously consistent
• CAP Theorem. A distributed data store cannot simultaneously be
        – Consistent
        – Fully available
        – Partition tolerant (still working when the multiple data stores it is
          distributed across cannot communicate)
• May take ½ second for data to become
  consistent
• Most cloud providers offer “consistent reads” but
  at a potential cost in latency
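
The difference between the two kinds of read can be illustrated with a toy model in which writes reach a replica only after a delay. This is not how any real store is implemented: the class `ReplicatedStore`, the single replica, and the explicit `now` arguments are all assumptions for illustration, with the 0.5 second delay echoing the figure above.

```python
class ReplicatedStore:
    """Toy model of a primary copy and one lagging replica."""

    def __init__(self, replication_delay):
        self.delay = replication_delay
        self.primary = {}
        self.replica = {}
        self.pending = []  # (time visible on replica, key, value)

    def write(self, key, value, now):
        self.primary[key] = value
        self.pending.append((now + self.delay, key, value))

    def _replicate(self, now):
        # Apply every write whose replication delay has elapsed.
        remaining = []
        for visible_at, key, value in self.pending:
            if visible_at <= now:
                self.replica[key] = value
            else:
                remaining.append((visible_at, key, value))
        self.pending = remaining

    def read(self, key, now, consistent=False):
        if consistent:
            return self.primary.get(key)  # always sees the latest write
        self._replicate(now)
        return self.replica.get(key)      # may be stale


store = ReplicatedStore(replication_delay=0.5)
store.write("x", 1, now=0.0)
print(store.read("x", now=0.1))                   # stale: not yet replicated
print(store.read("x", now=0.1, consistent=True))  # latest value
print(store.read("x", now=0.6))                   # replica has caught up
```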
Characterising Eventual Consistency in Amazon SimpleDB

• The probability of reading updated data in SimpleDB in US West
       – An application reads data X (ms) after it has written data
[Plot: Consistent Read vs. Eventual Consistent Read]
• SimpleDB has two read operations
       – Eventual Consistent Read
       – Consistent Read
• This pattern is consistent regardless of the time of day
Operator error
• After trying out something in AWS, you may want to
  go back to the original state
• Not always that straightforward:
        – Attaching volume is no problem while the instance is
          running, detaching might be problematic
        – Creating / changing auto-scaling rules has effect on
          number of running instances
               • Cannot terminate additional instances, as the rule would
                 create new ones!
        – Deleted / terminated / released resources are gone!




Undo for System Operators
[Diagram: the Administrator issues begin-transaction, a series of do operations, then rollback. Also supported: commit and pseudo-delete]
Approach
[Diagram, built up over three slides: the Administrator issues begin-transaction, a series of do operations, then rollback. The Undo System senses cloud resource states at begin-transaction and again at rollback, yielding a goal state and an initial state. From these it plans a set of actions, generates code, and executes it]
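
The slides name the planning step but do not describe how it works. As a rough stand-in, the sketch below derives a set of actions by comparing two snapshots of resource states. The function `plan_actions`, the dictionary representation, and the three action kinds are all assumptions, and a naive diff like this ignores ordering constraints and dependencies between resources.

```python
def plan_actions(initial, goal):
    """Return actions that turn the `initial` resource states into `goal`.

    Both arguments map a resource name to its state. This is a naive
    diff used for illustration, not the planner from the slides.
    """
    actions = []
    for name in sorted(initial.keys() - goal.keys()):
        actions.append(("delete", name))              # should not exist
    for name in sorted(goal.keys() - initial.keys()):
        actions.append(("create", name, goal[name]))  # missing resource
    for name in sorted(initial.keys() & goal.keys()):
        if initial[name] != goal[name]:
            actions.append(("update", name, goal[name]))
    return actions


initial = {"vm-1": "running", "vol-1": "attached", "vm-2": "running"}
goal = {"vm-1": "stopped", "vol-1": "attached", "vol-2": "available"}
print(plan_actions(initial, goal))
```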
Location of instances
• Amazon divides the cloud into
        – Regions (currently eight)
               •   US – east (Northern Virginia), west (Oregon, Northern California), gov
               •   Asia Pacific – Singapore, Tokyo
               •   Europe – Ireland
               •   South America – São Paulo
        – Each region has some number of availability zones.
               • Each availability zone has distinct physical location, power
                 sources
               • Communication
                       – within availability zones is high speed,
                       – across availability zones is lower speed,
                       – across regions is lowest speed

• Availability zones and regions can be exploited
  to improve availability
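
One simple way to exploit availability zones is to spread instances evenly across them, so that the loss of a single zone removes only part of the capacity. The sketch below uses made-up zone names and a helper, `spread_across_zones`, invented for this example.

```python
def spread_across_zones(instance_count, zones):
    """Assign instances to zones in rotation; returns zone -> count."""
    placement = {zone: 0 for zone in zones}
    for i in range(instance_count):
        placement[zones[i % len(zones)]] += 1
    return placement


# Five instances over two hypothetical zones: losing either zone
# still leaves at least two instances running.
print(spread_across_zones(5, ["zone-a", "zone-b"]))
```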
User Visible Failures
• Operator error is the largest cause of user-visible
  errors in large Internet systems
• The largest cause of operator error is configuration
  errors during upgrade
        – This data may be dated
        – It is based on a world where monthly updates
          were considered frequent. Updates may now be as
          frequent as weekly (Facebook) or even more
          frequent; Jan Bosch talks about “continuous
          deployment”.
        – I have not seen recent data describing sources of
          operator error

Upgrade Frequency
Upgrades to systems are a very common occurrence.
Upgrade frequency of some common systems:
              Application            Average release interval (days)
              Facebook (platform)    < 7
              Google Docs            < 50
              MediaWiki              21 (171 schema updates in 4.5 years)
              Joomla                 30

At this frequency, it is important to get the
updates right.

Configuration parameters
• Options are extensive
        – Hadoop – 206
        – Cassandra – 36
        – HBase – 64
• Massive numbers of dependencies, many
  hidden
        –   File path
        –   Network address
        –   Dynamically loaded libraries
        –   Database schema
        –   …

Basic upgrade strategies
• Rolling Upgrade
        – Perform upgrade one node at a time
               • Does not require additional resources
               • Allows for determination of correctness in an incremental
                 fashion
               • Implies that multiple versions may be simultaneously in
                 service
               • Takes time
• Big flip
        – Upgrade one cluster at a time
               • Keep users from accessing cluster until upgrade completed
               • Takes resources out of service until upgrade is completed
• General industrial practice is Rolling Upgrade
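
A minimal sketch of the rolling upgrade loop described above, assuming nodes are plain dictionaries and that a caller-supplied `healthy` check stands in for whatever correctness test an operator would run. The names `rolling_upgrade` and `versions_in_service` are hypothetical and not taken from any real tool.

```python
from typing import Callable, Dict, List


def rolling_upgrade(nodes: List[Dict], new_version: int,
                    healthy: Callable[[Dict], bool]) -> bool:
    """Upgrade nodes one at a time; stop at the first node that fails its check."""
    for node in nodes:
        old_version = node["version"]
        node["version"] = new_version      # stand-in for the real upgrade step
        if not healthy(node):
            node["version"] = old_version  # roll this node back and halt
            return False
    return True


def versions_in_service(nodes: List[Dict]) -> List[int]:
    """Return the distinct versions currently running."""
    return sorted({node["version"] for node in nodes})
```

If the health check fails partway through, `versions_in_service` returns more than one version, which is the mixed-version situation that rolling upgrades imply.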
Potential error condition during rolling
upgrade
 • Multiple versions are simultaneously active
   during rolling upgrade
 • Opens door to errors resulting from version
   incompatibility
 • During a single session a client can deal with
   multiple versions of a single component.
 • May result in “mixed-version” race condition
 • “…these race conditions occur frequently during
   rolling updates of large Internet systems, such
   as Facebook” From “To Upgrade or Not to Upgrade”
Mixed Version Race Condition
[Figure: message sequence between Client (browser) and Server]
  1. Server: start rolling upgrade
  2. Client: initial request
  3. Server (new version): HTTP reply with embedded JavaScript
  4. Client: AJAX callback
  5. Server (old version): receives the callback, resulting in ERROR
Assumptions/Requirements for a Solution
• Requirements
        – Clients never interact with decreasing versions, i.e.,
          once a client interacts with version xxx, it will never
          interact with a version less than xxx.
        – Messages are balanced across all instances of an
          application, whether new or old versions.
• Assumptions
        – Versions are backwards compatible, i.e., any message
          can be processed by the latest version without
          creating a mixed-version race condition
        – Client behavior with respect to the versions with
          which it interacts is governed by mobile code sent to
          the browser from the server side.
Key Ideas of Proposed Solution - 1
• Consider different versions as separate
  endpoints for a message. Each version is
  www.sample.com/<version number>
• Each instance knows its version number.
• Client knows the largest version number with
  which it has interacted.




Key Ideas of Proposed Solution - 2
• Load Balancer portion
        – Use a load balancer that routes messages to different
          endpoints
        – The load balancer is the entry point for messages.
        – Messages with /<version number> in the header are
          routed to an instance whose version is greater than or
          equal to that version number, chosen by the load
          balancing algorithm for those instances.
        – Messages without version information are routed
          according to normal load balancing
• Load balancers are hierarchical
        – Ensure that top level is updated before used to route
          messages
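
The routing rule above can be sketched as follows. This is only an illustration under stated assumptions: instances are a dictionary of hypothetical names to version numbers, round robin stands in for "the load balancing algorithm", and the classes `VersionAwareBalancer` and `Client` are invented for this example.

```python
import itertools
from typing import Dict, Optional


class VersionAwareBalancer:
    """Routes a message to an instance at or above the requested version."""

    def __init__(self, instances: Dict[str, int]):
        # instances maps a hypothetical instance name to its version number
        self.instances = dict(instances)
        self._counter = itertools.count()

    def route(self, min_version: Optional[int] = None) -> str:
        if min_version is None:
            # No version information: normal load balancing over all instances
            candidates = sorted(self.instances)
        else:
            candidates = sorted(name for name, version in self.instances.items()
                                if version >= min_version)
        if not candidates:
            raise LookupError("no instance at or above the requested version")
        # Round robin over the eligible instances
        return candidates[next(self._counter) % len(candidates)]


class Client:
    """Remembers the largest version it has interacted with."""

    def __init__(self) -> None:
        self.max_seen: Optional[int] = None

    def send(self, balancer: VersionAwareBalancer) -> int:
        name = balancer.route(self.max_seen)
        version = balancer.instances[name]
        if self.max_seen is None or version > self.max_seen:
            self.max_seen = version
        return version
```

Because the client always sends the largest version it has seen, the sequence of versions it interacts with never decreases, which is the first requirement stated earlier.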
Achieving Elasticity
• Elasticity means the ability to create new (virtual)
  resources on demand
• Providers allow consumers to set up “autoscaling”
  rules. These rules create resources automatically,
  without manual action by an operator.
        – E.g. create a new instance when an existing instance
          is utilizing greater than 75% of CPU for more than 5
          minutes.
• Correct strategy for autoscaling is a matter of
  research because of the time it takes to create a
  new instance, provision it, boot it, and start an
  application.
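
A rule like the 75% CPU example above can be sketched as a simple check over utilization samples. This is an assumption-laden illustration, not any provider's API: `should_scale_out` and its sample format of `(timestamp_seconds, cpu_percent)` pairs are hypothetical.

```python
from typing import List, Tuple


def should_scale_out(samples: List[Tuple[float, float]],
                     threshold: float = 75.0,
                     window_seconds: float = 300.0) -> bool:
    """Return True when CPU has stayed above `threshold` for more than
    `window_seconds`, measured up to the latest sample.

    `samples` holds (timestamp_seconds, cpu_percent) pairs in time order
    for a single instance.
    """
    breach_start = None
    for timestamp, cpu in samples:
        if cpu > threshold:
            if breach_start is None:
                breach_start = timestamp   # start of the current breach
        else:
            breach_start = None            # any dip resets the window
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start > window_seconds
```

Note that even when the rule fires immediately, the new instance still has to be created, provisioned, and booted before it can take load.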
Provisioning Latency
• Small Instance
        – 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1
          EC2 Compute Unit), 160 GB of instance storage, 32-bit platform
          with a base install of CentOS 5.3 AMI
        – Between 5 and 6 minutes from launch to availability in us-east-1c
• Large Instance
        – 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2
          EC2 Compute Units each), 850 GB of instance storage, 64-bit
          platform with a base install of CentOS 5.3 AMI
        – Between 11 and 18 minutes from launch to availability in us-east-1c

[http://www.philchen.com/2009/04/21/how-long-does-it-take-to-launch-an-amazon-ec2-instance]




Provisioning Forecasting
• Approaches to predict appropriate number of
  instances
• Technique 1 (due to Sadeka Islam)
        – Calculate cost of having instances that are unused
          (overprovisioning)
        – Calculate cost of having requests go unsatisfied
          (underprovisioning)
        – Allocate additional instances to optimize costs under
          various usage scenarios
• Technique 2 (due to Matthew Sladescu)
        – Sniff out events that might lead to surge in demand
          and use that to predict appropriate number of
          instances
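
To make the cost trade-off in Technique 1 concrete, here is a small sketch. It is not the published method: the cost model, the function names, and all parameter values are assumptions chosen for illustration.

```python
from typing import List, Tuple


def provisioning_cost(instances: int, demand: float,
                      capacity_per_instance: float,
                      instance_cost: float,
                      unmet_request_cost: float) -> float:
    """Cost of idle capacity (overprovisioning) plus cost of
    unsatisfied requests (underprovisioning) for one scenario."""
    capacity = instances * capacity_per_instance
    idle_instances = max(0.0, capacity - demand) / capacity_per_instance
    unmet_requests = max(0.0, demand - capacity)
    return idle_instances * instance_cost + unmet_requests * unmet_request_cost


def best_instance_count(scenarios: List[Tuple[float, float]],
                        max_instances: int, **costs: float) -> int:
    """Pick the instance count with the lowest expected cost.

    `scenarios` is a list of (probability, demand) pairs.
    """
    def expected(n: int) -> float:
        return sum(p * provisioning_cost(n, d, **costs) for p, d in scenarios)

    return min(range(max_instances + 1), key=expected)
```

For example, with two equally likely demand levels of 200 and 400 requests, 100 requests per instance, an idle instance costing 1.0, and an unmet request costing 0.05, the sketch selects four instances.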
Latency of Communication
• Measurements by Robin Meehan based on http-ping
• Within EU region but across availability zones
        – Roundtrip to local host within cloud (control) avg = 1.0 ms
        – Roundtrip to public IP in same AZ avg = 1.4 ms
• Out of cloud (local England facility) to within
  cloud
        – us-east = 231 ms
        – eu-west = 96 ms


http://smart421.wordpress.com/2011/02/15/amazon-web-services-inter-az-latency-measurements/
http://smart421.wordpress.com/2011/01/17/which-amazon-web-services-region-should-you-use-for-your-service/
Security topics
• Credentials and keys
• Management of credentials and keys in the
  cloud
• Multi-tenancy
• Location dependency/governance




Credentials and keys
 • A credential identifies you
        – As an individual
        – As having certain privileges
        – As having certain qualifications
 • Credentials are used in
        – Authentication (you are who you say you are)
        – Authorization (you have the rights to perform certain actions)
        – Non-repudiation (you cannot deny you did something)
 • A key is a magic number used in cryptography
   for
        – Encrypting/decrypting data
        – Digital credentials


Basic Data protection


[Figure: An app outside of the cloud (data unencrypted) sends data over https, so data is encrypted for transfer into the cloud. The app inside of the cloud holds data unencrypted, with communication encrypted. Data is stored encrypted (by vendor).]
What can go wrong with the Basic Data
Protection?
• Suppose the cloud provider has to respond to a
  subpoena for data. Your data may, potentially,
  be included.
• The cloud provider must decrypt data to respond to
  the subpoena.
• You may wish to encrypt your data yourself (double
  encryption) so that the cloud provider can only
  provide encrypted data.
• Of course, if the subpoena is directed at you, you
  must comply by providing decrypted data.
Use of credentials
• Log into app in the cloud
• Attach a disk volume
• Download application from a non-public location
• Access particular databases



• For non-public applications, protect your
  credentials and your data will be protected.



Vulnerabilities to Credentials
• Compromised inadvertently through social
  engineering means or carelessness
• Held by disgruntled employee
• Compromised through some sort of attack




Goals for credential storage
• Easy to do. If it is difficult to store credentials,
  people will avoid using them. A script can
  automate the provisioning of credentials, but then
  the script needs to be protected.
• Possible to change in a running instance? Once
  an instance has been launched, can the
  credentials it uses be changed?
• Possible to change for instances launched in the
  future? This issue is related to building
  credentials into scripts. If scripts have
  credentials built in, it is difficult to
  change them in the future.
Options for getting credentials to App in the
cloud
• Send credentials from client outside the cloud
       – HTTPS will negotiate encryption of credentials over the internet
       – Assumes credentials can be kept private on clients that have
         them.
       – Credentials need to be sent every time there is a new instance
• Pass credentials in as a parameter during
  launch of instance
       – Credentials persist for the life of the instance, so if credentials
         change, you can re-instantiate the instance
       – Means credentials are stored on a server, which is itself a vulnerability




More options for getting credentials to App
server
• Build credentials into the image
        –   App server is instantiated from an image in the image library
        –   Could install credentials in the image when building it
        –   Makes it difficult to change credentials
        –   Prevents reuse of image (or makes reusing image a very bad
            idea)
• Keep credentials in persistent storage.
        – Access control list for persistent storage provides protection
          based on credentials
        – Credentials may be based on a different account
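
As a rough sketch of the persistent-storage option, the class below re-reads credentials from storage on every use instead of baking them into an image or script. It assumes a JSON file at a caller-supplied path as a stand-in for access-controlled persistent storage; `CredentialStore` is a hypothetical name, not a provider API.

```python
import json
from pathlib import Path
from typing import Dict, Union


class CredentialStore:
    """Loads credentials from persistent storage each time they are needed."""

    def __init__(self, path: Union[str, Path]):
        # Hypothetical location; access to it is assumed to be protected
        # by the storage system's access control list.
        self.path = Path(path)

    def current(self) -> Dict[str, str]:
        # Re-read on every call so that a changed credential is picked up
        # by a running instance without rebuilding the image.
        return json.loads(self.path.read_text())
```

Reading at use time addresses two of the goals listed earlier: credentials can change in a running instance, and future instances pick up the new value without a new image.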




Conclusion with respect to credential
management
• No insurmountable problem
• Needs to be thought through
        – Who has access to credentials?
        – Will I ever need to change credentials?




What is Multi-tenancy?


[Figure: VMs for customers 1, 2, and 3 run on a shared Hypervisor on one Server; the server connects through a Local Network to Storage holding several Data volumes]
Multi Tenancy Gets More Complicated
[Figure: End users sit above the VMs for customers 1, 2, and 3, which share a Hypervisor]
Multi Tenancy Means “Sharing”
• Consumers share hardware
        – CPU
        – Network
        – Storage media
• Consumers share software
        – Hypervisor
• End users share applications
        – E.g. Salesforce.com




What are the problems with Multi-tenancy?
• Performance – other users or consumers will
  consume resources and, potentially, keep you
  from achieving your performance requirements.
        – Some providers allow consumers to reserve complete
          machines, which prevents multi-tenancy on those
          machines.
• Security – other users could potentially break
  confidentiality or integrity
        – Provider uses isolation for security. Consumer must
          have trust in provider
        – Consumer uses encryption to protect data.


Isolation assumptions
• Virtual machines are isolated based on virtual
  memory technology and addressing scheme
        – Processor manufacturers have specialized hardware
          to support virtualization
        – Hypervisor introduces a new layer of privileged
          software that could be attacked.
• Hypervisors provide facilities to isolate networks.
• Disk isolation is the same as in a non-cloud
  environment. OSs or shared software provide
  facilities.


Personally Identifiable Information
• Personally identifiable (US NIST)
        – Information which can be used to distinguish or trace an
          individual's identity, such as their name, social security number,
          biometric records, etc. alone, or when combined with other
          personal or identifying information which is linked or linkable to a
          specific individual, such as date and place of birth, mother’s
          maiden name, etc.
• Personal data (EU)
        – ‘personal data' shall mean any information relating to an
          identified or identifiable natural person ('data subject'); an
          identifiable person is one who can be identified, directly or
          indirectly, in particular by reference to an identification number or
          to one or more factors specific to his physical, physiological,
          mental, economic, cultural or social identity


Location dependency/governance
• Some jurisdictions require that personal
  information for their jurisdiction is not stored
  outside of the jurisdiction
        – The EU requires that personal information can leave
          the EU only for locations that have equivalent privacy
          guarantees
        – Australia has a similar policy
        – “If offshore cloud compromises your data, we'll sue
          you, not them”, Victoria Privacy Commissioner
• Some jurisdictions claim rights to access any
  data stored within their borders
        – US Patriot Act gives US government right to examine
          any data stored in the US.
What does this mean in the cloud?
• Knowing location of data centers
        – Amazon provides locations of their data centers
        – Google does not
• Does this mean just use Amazon data center in
  region compliant with your requirements?
        – Not so fast!
        – Backup locations may be chosen by the provider and
          could be anywhere
        – Controlling backup location based on data content
          is a complicated problem.
• Amazon does have a gov region that almost
  certainly complies with US government
  regulations
Use tokens as a replacement for PII
• A token is an identifier that has no mathematical
  mapping to the individual being identified
        – E.g. number the people in this tutorial arbitrarily
        – Your number becomes a unique identifier for your PII
          stored in the cloud
        – I keep mapping between you and your token privately
          according to jurisdictional laws




Example of token use
• Original data
        – John Doe
        – Sensitive information
• Token table (kept locally to conform to privacy
  laws)
        – John Doe
        – Token for John Doe
• Data stored in cloud
        – Token
        – Sensitive information
• Take join of token table and data table in cloud
  and the original data is restored
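
The token table and join described above can be sketched in a few lines. This is an illustration only: `tokenize` and `restore` are hypothetical helpers, records are assumed to be `(name, sensitive_info)` pairs, and random tokens from Python's `secrets` module stand in for any identifier with no mathematical mapping to the individual.

```python
import secrets
from typing import Dict, List, Tuple

Record = Tuple[str, str]


def tokenize(records: List[Record]) -> Tuple[Dict[str, str], List[Record]]:
    """Split records into a local token table and rows safe to store in the cloud."""
    token_table: Dict[str, str] = {}      # name -> token, kept locally
    cloud_rows: List[Record] = []         # (token, sensitive_info), stored in cloud
    for name, sensitive in records:
        if name not in token_table:
            # Random token: it cannot be computed from the name
            token_table[name] = secrets.token_hex(16)
        cloud_rows.append((token_table[name], sensitive))
    return token_table, cloud_rows


def restore(token_table: Dict[str, str], cloud_rows: List[Record]) -> List[Record]:
    """Join the local token table with the cloud rows to recover the original data."""
    name_for = {token: name for name, token in token_table.items()}
    return [(name_for[token], sensitive) for token, sensitive in cloud_rows]
```

Only `cloud_rows` leaves the local jurisdiction; without the token table, the rows cannot be linked back to a person by the token alone.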
How about jurisdictional problem?
• Tokens
        – Technique for decoupling PII from identifier.
        – Adds a level of indirection and protects that level
          locally
• Does this solve jurisdictional problems?
        – I don't know
        – PerspecSys says it does
          http://www.perspecsys.com/how-we-help/data-residency/




Questions




Netflix Corporation
• Launched in 1998 after founder was irritated at
  having to pay late fees on a DVD rental.
• DVD Model
        – Pay monthly membership fee that includes
          rentals, shipping and no late fees
        – Maintain online queue of desired rentals
        – When you return your last rental (depending on service
          plan), the next item in the queue is mailed to you together
          with a return envelope.
• Customers rate movies and Netflix recommends
  based on your preferences

Streaming video - 1
• Streaming video service introduced in 2007
• Customers can watch Netflix streaming video on
  a wide variety of devices many of which feed
  into a TV
        –   Roku set top box
        –   Blu-ray disc players
        –   Xbox 360
        –   TV directly
        –   PlayStation 3
        –   …
• Customers can stop and restart video at will.
  Netflix calls these locations in the films
  “bookmarks”.
Streaming video - 2
• Initially, one hour of streaming video was
  available to customers for every dollar they
  spent on their plan
• In Jan 2008, every customer became entitled to
  unlimited streaming video.
• In Nov 2011, Netflix changed its billing model to
  have separate charges for DVDs and streaming




Internet statistics
• In May 2011, Netflix streaming video accounted
  for 22% of all internet traffic, and 30% of traffic
  during peak usage hours.

• Three bandwidth tiers
        – Continuous bandwidth to the client of 5 Mbit/s – HDTV, surround
          sound
        – Continuous bandwidth to the client of 3 Mbit/s – better than DVD
        – Continuous bandwidth to the client of 1.5 Mbit/s – DVD quality




Netflix's move to the cloud
• In late 2008, Netflix had a single data center with
  Oracle as the main database system.
• With the growth of subscriptions and streaming
  video, it was clear that they would soon outgrow
  the data center.
• Two options:
        – Build more data centers
        – Use the cloud
• Netflix chose the Amazon EC2 platform



Why EC2?
• Four reasons cited by Netflix for moving to the
  cloud
  1. Every layer of the software stack needed to scale horizontally, be
     more reliable, redundant, and fault tolerant. This leads to reason #2
  2. Outsourcing data center infrastructure to Amazon allowed Netflix
     engineers to focus on building and improving their business.
  3. Netflix is not very good at predicting customer growth or device
     engagement. They underestimated their growth rate. The cloud
     supports rapid scaling.
  4. Cloud computing is the future. This will help Netflix with recruiting
     engineers who are interested in honing their skills, and will help
     scale the business. It will also ensure competition among cloud
     providers helping to keep costs down.
• Why Amazon and EC2? In 2008, Amazon was
  the leading supplier. Netflix wanted an IaaS so
  they could focus on their core competencies.
Netflix applications
• Video ratings, reviews, and recommendations
• Video streaming
• User registration, log-in
• Video queues
• Billing
• DVD disc management – inventory and shipping
• Video metadata management – movie cast
  information



Netflix Reliability
• Deep service dependency hierarchy
• 1 billion incoming calls/day
• Across 1000s of instances
• Intermittent failure guaranteed



Approach to detecting faults
• Fast network timeouts and retries
• Separate threads on per-dependency thread pools
• Semaphores instead of threads for services that do
  not perform network calls
• Circuit breaker
   – Service calls are decorated with code to
     test whether service is failing too often
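
The circuit breaker idea above can be sketched as a small wrapper around a service call. This is a generic illustration of the pattern, not Netflix's implementation; the class names, the failure threshold, and the reset interval are all assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a service that is failing too often."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of waiting on a service known to be failing
                raise CircuitOpenError("service is failing too often")
            self.opened_at = None          # allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0                  # a success closes the circuit
        return result
```

Injecting `clock` keeps the sketch testable without real waiting; production code would normally leave it at the default.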


If failure detected
• Custom fallback
        – Each service has specific fallback plan
• Fail silent
        – Service returns a null value and invoking service
          knows it has failed
• API should be able to show what is happening
  now, in real time, not from some past time.
  Dashboard shown to operator has
  red/yellow/green lights for important services
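
The two failure responses above, custom fallback and fail silent, can be illustrated with one hypothetical helper. `call_with_fallback` is an invented name, and catching every exception is a simplification for the sketch.

```python
from typing import Any, Callable, Optional


def call_with_fallback(service: Callable[[], Any],
                       fallback: Optional[Callable[[], Any]] = None) -> Any:
    """Call a service; on failure use its fallback, or fail silent with None."""
    try:
        return service()
    except Exception:
        if fallback is not None:
            return fallback()   # custom fallback: service-specific plan
        return None             # fail silent: caller sees a null value
```

A caller that receives `None` knows the service has failed and can degrade its own response accordingly.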



Netflix test suite - 1

  • Netflix has a variety of test programs they call
    the Simian Army. These programs include
         – Chaos Monkey. Randomly kills a process and monitors the effect.
         – Latency Monkey. Randomly introduces latency and monitors the
           effect.
         – Doctor Monkey. The Doctor Monkey taps into health checks that
           run on each instance and monitors other external signs of
           health (e.g. CPU load) to detect unhealthy instances.
         – Janitor Monkey. The Janitor Monkey ensures that the Netflix
           cloud environment is running free of clutter and waste. It
           searches for unused resources and disposes of them.




Netflix test suite - 2
        – Conformity Monkey. The Conformity Monkey finds instances that
          don't adhere to best practices and shuts them down. For
          example, if an instance does not belong to an auto-scaling
          group, that is a potential problem.
        – Security Monkey. The Security Monkey is an extension of
          Conformity Monkey. It finds security violations or vulnerabilities,
          such as improperly configured AWS security groups, and
          terminates the offending instances. It also ensures that all
          SSL and DRM certificates are valid and are not coming up for
          renewal.
        – 10-18 Monkey. The 10-18 Monkey (Localization-Internationalization)
          detects configuration and run time problems
          in instances serving customers in multiple geographic regions,
          using different languages and character sets. The name 10-18
          comes from L10n and I18n, where each number counts the letters
          between the first and last letters of localization and
          internationalization.
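
As a toy illustration of the Chaos Monkey idea from the list above, the sketch below picks a random victim from a set of hypothetical instance IDs and reports whether enough instances survive. It only simulates the selection and the check; it does not call any cloud API, and `chaos_round` is an invented name.

```python
import random
from typing import Set, Tuple


def chaos_round(instances: Set[str], min_healthy: int,
                rng: random.Random) -> Tuple[str, bool]:
    """Pick one instance to kill and report whether the service would
    still have at least `min_healthy` instances afterwards."""
    victim = rng.choice(sorted(instances))   # sorted for reproducible choice
    survivors = instances - {victim}
    return victim, len(survivors) >= min_healthy
```

Running such a round regularly in production is what forces services to be built so that losing any single instance is a non-event.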

Performance
• Create new auto-scaling group for each new
  version of code
        – Copy entire configuration to new group
        – Test behaviour under load by squeezing traffic in
          production to a smaller set of servers or generating
          artificial load against a single server




SmugMug
• Photo sharing site
• Survived April AWS outage
• Recommendations
        –   Spread across as many availability zones as possible
        –   Spread across regions if possible
        –   Build for failure (like Chaos Monkey)
        –   Understand how components fail (yours and the cloud
            provider's services)




Others
• Bizo
        – Use circuit breakers. Assume services will fail, cache
          data and monitor extensively to detect failure.
• SimpleGeo
        – share nothing, redundancy, automated failover,
          automated replication
• Twilio
        – Unit of failure is a single host
               • Simple services, replicatable
        – Short timeouts and quick retries
        – Idempotent service interfaces (stateless)
        – Relax consistency requirements
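
The "short timeouts and quick retries" advice can be sketched as below. The helper name, the attempt count, and the delay are assumptions, and the sketch relies on the operation being idempotent so that repeating it after a timeout is safe.

```python
import time
from typing import Any, Callable


def call_with_retries(operation: Callable[[], Any], attempts: int = 3,
                      delay: float = 0.05,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry an idempotent operation that may raise TimeoutError.

    The operation itself is expected to enforce a short timeout.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise                      # out of retries: surface the failure
            sleep(delay)                   # brief pause before the quick retry
```

Retrying is only safe because the interface is idempotent; a non-idempotent call could be applied twice when the first attempt timed out after succeeding on the server.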
Enterprise DR under pressure?
Issues…
• DR requirement is growing, driven by (a) changing
  customer expectations, and associated reputational
  risks; (b) Government & industry regulations
• Infrastructure for DR is expensive: sophisticated DR
  is only affordable for a small % of applications;
  forces compromises/prioritisation
• Confidence in initiating a recovery often less than it
  should be (too long, too much loss), uncertain
  integrity
• DR solutions often too 'local', insufficiently resilient
• Enterprise IT becoming more complex

Cost of DR is increasing…
• Improving business continuity (BC) and DR is 2nd
  highest priority for enterprises for 2010/2011
• BC/DR typically claims 6-7% of total IT budget
• 32% of enterprises plan to increase spending on
  BC/DR by at least 5% in 2010/2011.
  Forrester global survey, 2,803 IT decision-makers, Sept 2010

[Figure: Good DR is only affordable for a few applications. Applications ordered by priority: the higher priority applications have good DR coverage, the next group has limited coverage, and the rest have no cover]

Hypothesis: We can use cloud to extend DR at 1/10th cost.
Using Cloud for Business Continuity
• Two main usages of cloud for Business Continuity:
        – Provides highly available systems for day-to-day business
        – Serves as a technology platform to implement disaster recovery
• Some definitions:
        – Business Continuity: “Activity performed by an organisation to
          ensure that critical business functions will be available to
          customers, suppliers, regulators and other entities…”
        – Disaster Recovery: “A small subset of business continuity. The
          process, policies and procedures related to preparing for
          recovery or continuation of technology infrastructure critical to an
          organisation after a natural or human-induced disaster”
        – Fault Tolerance: “The property that enables a system to
          continue operating properly, possibly at a reduced quality
          level…”

Building Highly Reliable Systems with Cloud
• Must address potential failures at two levels:
        – Hardware/Infrastructure
               • To prevent Single-Point-of-Failure (SPOF) by adding
                 redundancy in all hardware components (i.e., redundant
                 disks, redundant network devices, redundant power supply,
                 etc.)
               • NOT all cloud providers provide 100% availability. Check
                 your SLA!!
        – Application
               • Prepare fail-over system to take over in case of a failure
               • Replicate databases to minimise downtime and loss of data
               • Replicate to geographically different location (e.g., to avoid
                 natural disasters such as floods)


DR As A Service – Requirements
• Cost-effective DR-as-a-Service is essential to
  get the DR solution deployed
• Deep architectural expertise does not exist in
  many businesses
• Needs solutions that achieve dependability and
  that
               •   Are non-intrusive at runtime
               •   Do not require changes to application architecture
               •   Work across platforms
               •   Are cheaper and easier to use than current state of practice




Case Study: Building Reliable System using EC2

• The highly replicated architecture of clouds makes them great
  foundations for business continuity solutions
• Their globally distributed nature further enhances the disaster
  recovery capability of the cloud
• Availability limitations mean we need to be realistic about Hot vs
  Warm vs Cold standby options

[Figure, top: an Elastic IP address (xxx.xxx.xxx.xxx) is allocated to one EC2 instance, which is created by an Auto Scaling Rule with Minimum Size = 1 and Availability Zones = A, B, C.]
[Figure, bottom: requests from clients reach an Elastic Load Balancer (Availability Zones = A, B, C), which forwards them to EC2 instances created by an Auto Scaling Rule with Minimum Size = 2 and Availability Zones = A, B, C.]
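
The behaviour of an Auto Scaling rule with a minimum size spread over several availability zones can be approximated with a small simulation. This is a sketch under stated assumptions, not the AWS API: `enforce_minimum` is a hypothetical helper, and the "launch in the least-loaded available zone" policy is an assumption made for illustration.

```python
def enforce_minimum(instances, zones_up, minimum_size):
    """Return the zone of each running instance after the rule is applied.

    instances: list of zone names, one entry per running instance.
    zones_up: set of zones that are currently available.
    """
    # Instances in a failed zone are lost.
    survivors = [zone for zone in instances if zone in zones_up]
    if not zones_up:
        return survivors  # nowhere to launch replacements
    while len(survivors) < minimum_size:
        # Assumed policy: launch in the least-loaded available zone.
        target = min(sorted(zones_up), key=survivors.count)
        survivors.append(target)
    return survivors


# Minimum Size = 2 across Availability Zones A, B, C, as in the figure.
fleet = enforce_minimum([], {"A", "B", "C"}, minimum_size=2)
# Zone A fails; the rule replaces the lost instance in a surviving zone.
after_outage = enforce_minimum(fleet, {"B", "C"}, minimum_size=2)
```

With these inputs the initial fleet is placed in zones A and B, and after zone A fails the fleet runs in zones B and C, which is why a minimum size of 2 keeps the service up during the replacement.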
Case Study: Building Reliable System using EC2 (Contd)

• Data backup in AWS
        – Amazon S3 is best for off-site data backup
               • Stores large binary files
               • Designed to provide 99.999999999% durability
               • Objects are redundantly stored in multiple facilities in a
                 Region
        – Back up using EBS
               • Uses a regular file system
               • Takes image (or snapshot) of the partition
        – VM Import
               • Allows for easy replication from on-premises to cloud
               • Not trivial to replicate configuration details such as network
                 configuration and disk drives
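
To put the 99.999999999% durability design target into perspective, a short worked calculation helps. It assumes the figure is an annual, per-object probability; the workload of 10,000,000 objects is a hypothetical example.

```python
from fractions import Fraction

# Durability design target quoted on the slide, as an exact fraction.
durability = Fraction("99.999999999") / 100
# Assumed meaning: probability of losing a given object in one year.
annual_loss_probability = 1 - durability

objects_stored = 10_000_000  # hypothetical workload
expected_losses_per_year = objects_stored * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(float(annual_loss_probability))  # 1e-11
print(years_per_lost_object)           # 10000
```

Under these assumptions, storing ten million objects means losing on average one object every 10,000 years. Exact fractions are used so that floating-point rounding does not distort such a small probability.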

The Business Opportunity
• Hot Standby
        – Run transactions on multiple sites but use only one
        – Mirror data via dedicated high speed network (e.g., SANs)
• Warm Standby
        – Regularly back up app/data in a backup site
        – Launch systems upon a disaster
• Cold Standby
        – Ship backup to offsite
        – Hardware is not already set up
        – Recover systems after disaster

[Figure: cost (vertical axis) against downtime (horizontal axis) for the three standby options. Downtime axis marks: seconds (auto failover); minutes – few hours (auto failover, minimum data loss); hours – few days (manual failover, little data loss); days – weeks (large data loss). Ranges on the chart are labelled "Traditional DR" and "Cloud DR". Callouts: "always-on" costs in cloud, and a very hot standby is not feasible; the cost of warm and cold is comparable.]
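
The trade-off between the standby options can be expressed as a simple selection rule: pick the cheapest option whose downtime still fits the business's tolerance. The sketch below assumes cold is cheaper than warm, which is cheaper than hot; the worst-case downtime numbers are illustrative assumptions picked from within the ranges on the downtime axis, and `cheapest_standby` is a hypothetical helper.

```python
from datetime import timedelta

# (name, assumed worst-case downtime), ordered from cheapest to most costly.
# The durations are illustrative assumptions, not measured values.
STANDBY_OPTIONS = [
    ("cold", timedelta(days=3)),
    ("warm", timedelta(hours=4)),
    ("hot", timedelta(seconds=30)),
]


def cheapest_standby(max_downtime):
    """Return the cheapest option whose assumed downtime fits the budget."""
    for name, worst_case in STANDBY_OPTIONS:
        if worst_case <= max_downtime:
            return name
    return None  # even hot standby cannot meet this target


choice_week = cheapest_standby(timedelta(days=7))
choice_shift = cheapest_standby(timedelta(hours=8))
choice_minute = cheapest_standby(timedelta(minutes=1))
choice_instant = cheapest_standby(timedelta(seconds=1))
```

With these assumed numbers, a business that tolerates a week of downtime gets cold standby, one that tolerates eight hours gets warm standby, and one that tolerates a minute needs hot standby. A one-second target returns `None`, which mirrors the point that a very hot standby is not feasible.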
Yuruware Bolt




Questions




Conclusions
• Cloud Computing brings unique dependability
  challenges
               • Latency across the global links
               • Full automation means errors propagate faster than ever
               • Multi-tenancy issues
• Many traditional dependability patterns still work,
  but the cloud era also needs some new
  techniques
               • Traditional patterns: stateless, etc.
               • Upgrade, undo/redo
               • Simian armies, DR-As-A-Service



References
• How to keep your AWS credentials on an EC2 Instance Securely,
  Shlomo Swidler,
  http://shlomoswidler.com/2009/08/how-to-keep-your-aws-credentials-on-ec2.html
• http://techblog.netflix.com/
• Cloud Performance Benchmark Series, Network Performance:
  Rackspace.com, Sumit Sanghrajka, Radu Sion,
  http://www.cs.stonybrook.edu/~sion/research/sion2011cloud-net2.pdf
• How long does it take to launch an Amazon EC2 instance, Phil
  Chen,
  http://www.philchen.com/2009/04/21/how-long-does-it-take-to-launch-an-amazon-ec2-instance
• Basic Concepts and Taxonomy of Dependable and Secure
  Computing, Avizienis, Laprie, Randell, Landwehr, IEEE
  Transactions on Dependable and Secure Computing, Vol 1, No 1,
  Jan-March 2004


References - 2
• Cloud Software Updates: Challenges and Opportunities, Neamtiu,
  Dumitras,
  http://www.ece.cmu.edu/~tdumitra/public_documents/neamtiu11cloudupgrades11.pdf
• To upgrade or not to Upgrade, Dumitras, Narasimhan, Tilevich,
  Onward! 2010
• Cloud Application Architectures, George Reese, O'Reilly, 2009
• Why do internet services fail and what can be done about it?
  Oppenheimer, et al. Usenix Symposium on Internet Technologies
  and Systems, 2003
• Data Consistency properties and the trade-offs in commercial cloud
  storages: the consumers' perspectives, Wada, et al. 5th Biennial
  conference on Innovative Data Systems Research, CIDR, 2011
  http://www.nicta.com.au/pub?id=4341



References - 3
• Why do upgrades fail and what can we do about it? Tudor Dumitras
  and Priya Narasimhan, Proceedings of the ACM/IFIP/USENIX 10th
  international conference on Middleware (Middleware'09), 2009
• Using Program Analysis to Reduce Misconfiguration in Open Source
  Systems Software, Ariel Rabkin, PhD thesis, Univ of
  Calif, Berkeley, 2012
• A method for preventing mixed version race conditions, Bass, Wada,
  https://docs.google.com/open?id=0ByLr8SO1MsAiaXVxcmNNcDhVczg, 2012
• Automatic Undo for Cloud Management via AI Planning, Ingo
  Weber, Hiroshi Wada, Alan Fekete, Anna Liu, Len
  Bass, Proceedings of the 12th Hot Topics in System Dependability
  http://www.nicta.com.au/pub?id=5994



References - 4
• How a consumer can measure elasticity for cloud platforms, Sadeka
  Islam, Kevin Lee, Alan Fekete, Anna Liu, Proceedings of the 3rd
  Joint WOSP/SIPEW International Conference on Performance
  Engineering, p.85-96, 2012
• Empirical prediction models for adaptive resource provisioning in the
  cloud, Sadeka Islam, Jacky Keung, Kevin Lee, Anna Liu, Future
  Generation Computer Systems, Vol 28, No.1, p.155-162, 2012




Q&A


                       Thank You!


Research study opportunities in dependable cloud computing:
• Software Architecture
• Data Management
• Performance Engineering
• Autonomic Computing

 To find out more, send your CV and undergraduate details to
                    students@nicta.com.au
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 

WICSA 2012 tutorial

  • 1. Architecting Highly Dependable Cloud Applications Anna Liu Len Bass NICTA Copyright 2012 From imagination to impact
  • 2. The Land Down Under
  • 3. Sydney
  • 4. About NICTA: National ICT Australia • Federal and state funded research company established in 2002 • Largest ICT research resource in Australia • National impact is an important success metric • ~700 staff/students working in 5 labs across major capital cities • 7 university partners • Providing R&D services, knowledge transfer to Australian (and global) ICT industry • NICTA technology is in over 1 billion mobile phones
  • 5. Research Areas at NICTA • Networks: Aruna Seneviratne • Machine Learning: Bob Williamson • Software Systems: Anna Liu, Gernot Heiser • Computer Vision: Nick Barnes, Richard Hartley, Peter Corke • Optimisation: Mark Wallace, Sylvie Thiebaux, Toby Walsh • Control & Signal Processing: Rob Evans
  • 6. Our team's mission: help enterprises take full advantage as software extends into the cloud! Cost optimised, high availability, onsite/offsite, hybrid cloud, real-time monitoring, disaster recovery, actionable analytics, business continuity, intelligent management, systems resilience, dynamic, elastic, real time, high performance. Our applied R&D capability spans cloud computing, web, SOA, distributed systems, data management, analytics, performance monitoring, DR, automated reasoning, ontologies, AI…
  • 7. Who are we? • Anna • Len
  • 8. Who are you? What would you like from this tutorial?
  • 9. Outline • Introduction • Cloud Computing Platforms • Nature and causes of outages and down-time • Characteristics of Dependability in Cloud • Achieving high dependability • The importance of stateless components • Techniques to handle performance problems • Techniques to handle availability problems • Techniques to handle security problems • Case Studies: Netflix, Yuruware • Conclusions
  • 10. Introduction • Intro to the cloud – xxx as a service, regions/zones • What is dependability? • Why is dependability a concern in the cloud? • Types of dependability and high-level problem descriptions – performance – availability – security
  • 12. What is Cloud Computing? Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models. - US National Institute of Standards and Technology
  • 13. Characterising Cloud Computing: Measured Service, Resource Pooling, Self Service, Elasticity, Ubiquitous Network Access
  • 14. Five Characteristics – NIST Definition • On-demand self-service – A consumer can provision computing capabilities without human interaction • Broad network access – Computing capabilities are available over the network and accessed through standard mechanisms • Resource pooling – Provider's computing resources are pooled to serve multiple consumers, with different resources dynamically assigned according to consumers' demands • Rapid elasticity – Computing capabilities can be rapidly and elastically provisioned to quickly scale out and rapidly released to scale in • Measured service – Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer
  • 15. Leading Provider: Amazon EC2. Let's see how Amazon EC2, a leading commercial cloud, looks. "I want my cloud!"
  • 16. 1. Grab your credit card and create an account (10 min), then access the console. 2. Select where you want to create your virtual machines (US East, US West, Ireland or Singapore). 3. Hit this button.
  • 17. 4. Select a machine image • Many pre-configured images are available • You can register your own machine images as well
  • 18. 5. Determine the amount of resources to allocate • <1.0 GHz CPU + 600 MB RAM: 0.01 USD/hour • 1.0 GHz CPU + 1.7 GB RAM: 0.04 USD/hour • 3.0 GHz x 8 CPUs + 68 GB RAM: 1.1 USD/hour • You can pay Win/SQL Server license fees in pay-per-hour
  • 19. 6. Define a set of access control rules
  • 20. 7. Done! (< 5 minutes in total) • You have your virtual machine at ec2-184-74-14-28.us-west-1.compute.amazonaws.com. "I got my virtual machine!"
  • 21. 8. Connect to my virtual machine • Just SSH to the address • You have root access! (Diagram: "You're in an Amazon Datacenter in CA"; "This is my desktop in Sydney")
  • 22. If you like Windows, just launch a Windows virtual machine and remote-desktop to it, connected through a VPN connection. (Diagram: "You're in an Amazon Datacenter in NV"; "This is my desktop in Sydney")
  • 23. 9. Terminate or hibernate virtual machines when they are not in use • In some systems, we use a script to hibernate virtual machines at 8:00PM • Restart instances in the morning if necessary. It takes just a couple of minutes
  • 24. 10. Check a bill in real-time • Hours to run virtual machines • Network in/out • VPN • Disk access • # of requests made …
  • 25. Three Service Models – NIST definition. Technology exposed to customers by providers, top to bottom: Software as a Service, Platform as a Service, Infrastructure as a Service, Datacenter Infrastructure
  • 26. Three Delivery Models • Infrastructure as a Service (IaaS) – The consumer has control over operating systems, storage and deployed applications • Platform as a Service (PaaS) – Consumers can deploy applications created using programming languages and tools supported by the provider (e.g., Java Servlet) – The provider shields the complexity of its infrastructure: scale up/down, load balancing, replication, disaster recovery, database management, … • Software as a Service (SaaS) – Consumers use the provider's applications – The consumer does not manage the underlying cloud infrastructure
  • 27. Leading Provider: Google App Engine. Let's see how Google App Engine, a leading commercial PaaS, looks. "I want my PaaS!"
  • 28. 1. Create an account (5 min). GAE offers a large amount of quota for free. 2. Write an application using GAE's framework
  • 29. 3. Deploy your application on GAE! Scale up/down, load balancing, replication, disaster recovery, database management, … many functions are implemented by GAE's
  • 30. 4. Check your resource usage (CPU, storage, # of API calls, …). Pay only when usage exceeds the free quota
  • 31. Provider Services – 1 • Consumer is allocated some number of virtual machine instances. – Number of instances is under the control of the consumer – Provider allows consumer to set rules for "autoscaling": automatically creating and removing instances – When a new instance is launched it has • Software as specified by either the consumer or the provider • Private IP address, available only from within the cloud. The private IP address exists for the life of the instance and will not change • Public IP address, addressable from outside the cloud. May change under certain circumstances
  • 32. Provider Services – 2 • Cloud data centers – hosted in different geographic regions – Cloud provider is responsible for physical security • SLAs from cloud providers are for 99.9%+ up time for the cloud. No guarantee for any individual instance • Cloud provider will replicate databases to different regions or within a region.
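To put the 99.9% figure in perspective, the downtime budget implied by an availability percentage can be worked out directly. The following is a minimal sketch; the function name `allowed_downtime_hours` is illustrative, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget implied by an availability percentage."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        print(f"{sla}% availability allows "
              f"{allowed_downtime_hours(sla):.2f} hours of downtime per year")
```

Under these assumptions, 99.9% availability still permits roughly 8.76 hours of outage per year for the cloud as a whole, and says nothing about any single instance.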
  • 33. Questions
  • 35. What is dependability? • Dependability of a computing system is the ability to deliver service that can justifiably be trusted. – The service delivered by a system is its behaviour as it is perceived by its user(s); – a user is another system (physical, human) that interacts with the former at the service interface. – The function of a system is what the system is intended for, and is described by the system specification. [A. Avizienis, J.-C. Laprie and B. Randell: Fundamental Concepts of Dependability. Research Report No 1145, LAAS-CNRS, April 2001]
  • 36. Parsing the definition • Dependability is relative – "justifiably be trusted" • There may be different users with different expectations • Users can be systems or humans • Systems may deliver many services, and dependability may be different for each service
  • 37. Dependability subsumes many other attributes
  • 38. Questions
  • 40. Cloud vis-à-vis private data center • Cloud providers remove some of the problems of operating a private data center – Acquisition of physical hardware – Hiring/training data center staff – Physical security • Other problems remain basically the same – Security threats from internet connections – Separation of production/test environments – Patch installation • Other problems are new or exist in changed form. It is these other problems that we now focus on.
  • 41. Cloud Specific Dependability Problems • Failure – Instance failure – Data failure/consistency – Operator error – Upgrade error • Performance – Latency of provisioning – Over/under provisioning – Latency of communication • Security/privacy – Credentials and keys – Multi-tenancy – Location dependency/governance • Disaster Recovery
  • 42. Provisioning • Consumer or cloud infrastructure can launch or delete an instance of a virtual machine • When a new instance is launched it consists of – Virtual hardware with public and private IP address – Executable image – Virtual hard disk • Provisioning is important both in failure recovery and performance
  • 43. Elasticity – Over or Under Provisioning • Elasticity is the defining characteristic of cloud – Traditional 'scalability' or 'throughput' measures are no longer helpful – "the ability of software to meet changing capacity demands, deploying and releasing relevant necessary resources on-demand" • There is often over or under provisioning
  • 45. Instance Failure – recognition • Basic failure recognition mechanism is "heartbeat". • Instance must periodically show it is still alive – Send a message – Respond to query • There must be an entity that is responsible for monitoring "aliveness" of the instance – Entity can be infrastructure – Entity can be another portion of the application – Entity can be client • Failed instances are not automatically deleted
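The heartbeat mechanism on this slide can be sketched as a small monitor that records the last time each instance reported in and flags those that have been silent too long. This is an illustrative sketch only: the class name `HeartbeatMonitor`, the timeout value, and the injectable `clock` parameter are assumptions, not part of the tutorial.

```python
import time


class HeartbeatMonitor:
    """Tracks when each instance last sent a heartbeat (hypothetical helper)."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # instance id -> time of last heartbeat

    def beat(self, instance_id: str) -> None:
        """Called whenever an instance sends a message or answers a query."""
        self.last_seen[instance_id] = self.clock()

    def suspected_failed(self) -> list:
        """Return instances whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(i for i, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

Note that the monitor only reports suspects. As the slide says, failed instances are not automatically deleted, so something else must decide whether to deallocate and replace them.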
  • 46. Monitoring for Pending Failure • Besides PING… • A dashboard of flashing lights • Monitoring ongoing CPU, memory utilization, disk activities, network activities • Environmental controls, water/coolant flow, power and temperature (Photo: Akamai's NOC in Cambridge, Massachusetts)
  • 47. State • An instance can be stateful or stateless • A stateful instance remembers information from one message to another. State can be stored either within instance memory or on an external memory device • A stateless instance must be sent the necessary state associated with the message. • HTTP is a stateless protocol, so every message must contain information allowing the instance to understand the context. • The recovery process is different for stateful instances than for stateless instances.
  • 48. Stateful Recovery • Strategy depends on how much loss of computation and events can be tolerated. • Strategy – 1 – Checkpoint image periodically – On recovery, provision with checkpointed image and computation will restart from last checkpoint – Any computation and messages between last checkpoint and failure will be lost. – Assumes no state stored on external device. • Only for cloud because of checkpointing image
  • 49. Stateful Recovery Strategy – 2 • Periodically save important state on persistent external device. • When image is activated, it checks whether any state has been saved. If so, it reads that state and resumes computation • Any computation and messages between last checkpoint and failure will be lost • The difference from the prior strategy is that this one does not assume a checkpointed image exists; state is explicitly checkpointed by the application
  • 50. Stateful Recovery Strategy – 3 • Periodically save important state on persistent external device • Log incoming messages on persistent external device • When image is activated, it checks whether any state has been saved. If so, it reads that state. • Activated image then reads log and replays activity. • No computation or messages will be lost unless there is a failure between message arrival and recording that message on the log. Acks to client will allow client to resend message if necessary.
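Strategy 3 (checkpoint plus message log with replay) can be illustrated with a toy service whose only state is a running total. This is a sketch under stated assumptions: a plain Python dict stands in for the persistent external device, and the names `CounterService`, `handle`, and `checkpoint` are hypothetical.

```python
class CounterService:
    """Toy stateful service: its state is a running total of received amounts."""

    def __init__(self, store: dict):
        # `store` stands in for a persistent external device (assumption).
        self.store = store
        # On activation, read any saved state ...
        self.total = store.get("checkpoint", 0)
        # ... then replay messages logged since that checkpoint.
        for amount in store.get("log", []):
            self.total += amount

    def handle(self, amount: int) -> str:
        # Log the message on the external device before processing it.
        self.store.setdefault("log", []).append(amount)
        self.total += amount
        return "ack"  # the ack lets a client resend if it never arrives

    def checkpoint(self) -> None:
        # Save state and truncate the log. A real system would need to make
        # these two steps atomic; this sketch does not.
        self.store["checkpoint"] = self.total
        self.store["log"] = []
```

A replacement instance constructed over the same store ends up with the same total as the failed one, because every acknowledged message is either reflected in the checkpoint or still in the log.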
  • 51. Comments on Stateful recovery strategies • Only strategy 1 (provision with checkpointed image) is specific to cloud • Other strategies apply also to non-cloud environments. • Strategy 3 achieves least data loss since messages are logged and replayed upon recovery.
  • 52. Stateless images • If instance is stateless then – Infrastructure can send any message to any instance – Can create new instances for performance or reliability reasons. – Router/load balancer/controller is responsible for getting messages to instances (Diagram: clients, load balancer, and servers in the cloud)
  • 53. How do messages get to instances? • Two models – Push. Load balancer decides which instance should get the message – Pull. Load balancer maintains a queue of messages and instances retrieve messages from the queue.
  • 54. Push Architecture Pattern (Diagram: clients, load balancer, monitor, servers)
  • 55. Push Pattern Description • Client sends a request (e.g. HTTP message) to the app in the cloud. • Request arrives at a load balancer • Load balancer forwards request to one of the VMs • Load balancer uses a scheduling strategy to decide which VM gets the request, e.g. round robin
  • 56. Monitor • The load balancer knows – CPU utilization for each VM (through the monitor) – how many requests each VM has gotten – possibly how long it took to service the requests. • The monitor decides (based on rules) when new resources are needed
  • 57. Failure management within Push Pattern • Monitor will recognize failure of instance through non-responsiveness. • Load balancer will not send further messages to the instance • Messages currently being processed by the failed instance are lost • Client must detect that a message was not processed (through timeout) and resend the message.
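The push pattern with round-robin scheduling, together with the failure handling just described, can be sketched in a few lines. The class and method names (`RoundRobinBalancer`, `route`, `mark_failed`) are hypothetical; a real load balancer would forward the request over the network rather than return an instance name.

```python
class RoundRobinBalancer:
    """Push-style load balancer: it decides which instance gets each request."""

    def __init__(self, instances):
        self.instances = list(instances)   # instances currently in service
        self._index = 0

    def route(self, request) -> str:
        """Pick the next instance in round-robin order for this request."""
        if not self.instances:
            raise RuntimeError("no healthy instances available")
        target = self.instances[self._index % len(self.instances)]
        self._index += 1
        return target

    def mark_failed(self, instance: str) -> None:
        """Called when the monitor reports an instance as non-responsive.

        Requests already pushed to that instance are lost; the client is
        expected to time out and resend them.
        """
        self.instances.remove(instance)
```

Because the balancer pushes a request and then forgets it, nothing here can recover in-flight work. That is why the slide places the retry responsibility on the client.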
  • 58. Pull architecture pattern (aka Producer-Consumer) (Diagram: clients, load balancer/queue manager, monitor, servers)
  • 59. Pull architecture description • Each request from the client is application specific and typed. • The queue manager keeps separate queues for each application running on the VMs. • A VM requests the next message of a particular type (pull) and processes it. • When the VM has processed a message, it informs the controller to remove the message from the queue.
  • 60. Monitor • The monitor can now see – how long a request waits in a queue – the average queue length • This is an indication of the load on the VMs that have applications that service requests of that type. • Allows better scheduling of messages to VMs.
  • 61. Failure Management within Pull Pattern • Controller knows when a message has been processed. • If a message is not processed within a time interval, the controller can reassign it. • Failed instances will not request further messages and so take themselves out of service. • It is possible for a failed instance to recover and continue processing a message that has been rescheduled, so checks must be in place to keep a message from being double processed.
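One way to realise the reassignment and double-processing check described above is a queue that hands out each message with a deadline and accepts a completion only from the worker that currently owns the message. This is a single-queue sketch under assumptions: the names `PullQueue`, `pull`, and `complete` and the timeout mechanism are illustrative, not taken from the tutorial.

```python
import time


class PullQueue:
    """Queue manager for the pull pattern with timeout-based reassignment."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.pending = []     # message ids waiting to be pulled, FIFO
        self.in_flight = {}   # message id -> (worker, deadline)
        self.bodies = {}      # message id -> payload

    def put(self, msg_id, body) -> None:
        self.bodies[msg_id] = body
        self.pending.append(msg_id)

    def pull(self, worker):
        """A worker asks for the next message; returns (id, body) or None."""
        self._requeue_expired()
        if not self.pending:
            return None
        msg_id = self.pending.pop(0)
        self.in_flight[msg_id] = (worker, self.clock() + self.timeout)
        return msg_id, self.bodies[msg_id]

    def complete(self, worker, msg_id) -> bool:
        """Accept a result only from the worker that currently owns the message."""
        owner = self.in_flight.get(msg_id)
        if owner is None or owner[0] != worker:
            return False  # stale result from a worker whose message was reassigned
        del self.in_flight[msg_id]
        del self.bodies[msg_id]
        return True

    def _requeue_expired(self) -> None:
        now = self.clock()
        for msg_id, (_, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[msg_id]
                self.pending.append(msg_id)
```

Rejecting stale completions prevents a recovered instance from committing a second result, but it does not undo side effects that instance may already have caused, so idempotent message handlers are still advisable.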
  • 62. Cleaning up • When an instance fails it is not automatically deallocated • Consumer must deallocate the failed instance. • When an instance is deallocated – Public and private IP addresses become available for reallocation – It is possible to tell the infrastructure that the public IP address is to be assigned to a replacement instance • Within AWS, charging continues until the instance is deallocated.
  • 63. Data Failure • Data storage can be "ephemeral" or "persistent" • Ephemeral storage disappears if the instance fails • Persistent storage is maintained by the cloud provider – Replicated automatically – Replicas may be geographically separated • May lead to problems with data consistency
  • 64. Data Consistency • It takes time to replicate data • This means that different replicas of the data may not be instantaneously consistent • CAP Theorem. Data cannot simultaneously be – Consistent – Fully available – Partition-tolerant (distributed across multiple data stores) • May take ½ second for data to become consistent • Most cloud providers offer "consistent reads" but at a potential cost in latency
  • 65. Characterising Eventual Consistency in Amazon SimpleDB • Plot: the probability of reading updated data in SimpleDB in US West when an application reads data X ms after it has written it (series: Consistent Read, Eventual Consistent Read) • SimpleDB has two read operations – Eventual Consistent Read – Consistent Read • This pattern is consistent regardless of the time of day
  • 66. Operator error • After trying out something in AWS, an operator may want to go back to the original state • Not always that straightforward: – Attaching a volume is no problem while the instance is running; detaching might be problematic – Creating/changing auto-scaling rules has an effect on the number of running instances • Cannot terminate additional instances, as the rule would create new ones! – Deleted/terminated/released resources are gone!
  • 67. Undo for System Operators (Diagram: the administrator issues begin-transaction, do, do, do, rollback; also + commit and + pseudo-delete)
  • 68. Approach (Diagram: the administrator issues begin-transaction, do, do, do, rollback; the undo system senses cloud resources states)
  • 69. Approach (Diagram: as before, with the sensed cloud resources states labelled as goal state and initial state)
  • 70. Approach (Diagram: as before, with the undo system adding plan, set of actions, generate code, and execute)
  • 71. Location of instances • Amazon divides the cloud into – Regions (currently eight) • US – east (Northern Virginia), west (Oregon, Northern California), gov • Asia Pacific – Singapore, Tokyo • Europe – Ireland • South America – Sao Paulo – Each region has some number of availability zones. • Each availability zone has a distinct physical location and power sources • Communication – within availability zones is high speed, – across availability zones is lower speed, – across regions is lowest speed • Availability zones and regions can be exploited to improve availability
  • 72. User Visible Failures • Operator error is the largest cause of user visible errors in large Internet systems • The largest cause of operator error is configuration errors during upgrade – Data may be dated – Data is based on a world where monthly updates were considered frequent. Updates may be as frequent as weekly (Facebook) or even more frequent – Jan Bosch talks about "continuous deployment". – I have not seen recent data describing sources of operator error
  • 73. Upgrade Frequency • Upgrades to systems are a very common occurrence • Upgrade frequency of some common systems (application: average release interval) – Facebook (platform): < 7 days – Google Docs: < 50 days – Media Wiki: 21 (171 schema updates in 4.5 years) – Joomla: 30 • This frequency would suggest it is important to get the updates correct
  • 74. Configuration parameters • Options are extensive – Hadoop – 206 – Cassandra – 36 – HBase – 64 • Massive numbers of dependencies, many hidden – File path – Network address – Dynamically loaded libraries – Database schema – …
  • 75. Basic upgrade strategies • Rolling Upgrade – Perform upgrade one node at a time • Does not require additional resources • Allows for determination of correctness in an incremental fashion • Implies that multiple versions may be simultaneously in service • Takes time • Big flip – Perform upgrade to a cluster at a time • Keep users from accessing cluster until upgrade completed • Takes resources out of service until upgrade is completed • General industrial practice is Rolling Upgrade
  • 76. Potential error condition during rolling upgrade • Multiple versions are simultaneously active during rolling upgrade • Opens door to errors resulting from version incompatibility • During a single session a client can deal with multiple versions of a single component. • May result in "mixed-version" race condition • "…these race conditions occur frequently during rolling updates of large Internet systems, such as Facebook" (from "To Upgrade or Not to Upgrade")
  • 77. Mixed Version Race Condition (Sequence diagram between client (browser) and server: 1. Start rolling upgrade 2. Initial request 3. HTTP reply with embedded JavaScript, served by the new version 4. AJAX callback 5. Callback reaches the old version: ERROR)
  • 78. Assumptions/Requirements for a Solution • Requirements – Clients never interact with decreasing versions, i.e. once a client interacts with version xxx, it will never interact with a version less than xxx. – Messages are balanced across all instances of an application, whether new or old versions. • Assumptions – Versions are backwards compatible, i.e. any message can be processed by the latest version without creating a mixed-version race condition – Client behavior with respect to the versions with which it interacts is governed by mobile code sent to the browser from the server side.
  • 79. Key Ideas of Proposed Solution – 1 • Consider different versions as separate endpoints for a message. Each version is www.sample.com/<version number> • Each instance knows its version number. • Client knows the largest version number with which it has interacted.
  • 80. Key Ideas of Proposed Solution – 2 • Load balancer portion – Use a load balancer that routes messages to different endpoints – The load balancer is the entry point for messages. – Messages with /<version number> in the header are routed to an instance whose version is greater than or equal to that version number, according to the load balancing algorithm for those instances. – Messages without version information are routed according to normal load balancing • Load balancers are hierarchical – Ensure that the top level is updated before it is used to route messages
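The routing rule from the two "key ideas" slides can be sketched as follows. This is an assumption-laden illustration: instances are modelled as a dict from instance id to integer version, random choice stands in for "the load balancing algorithm", and the function names `route` and `update_client_version` are hypothetical.

```python
import random


def route(instances: dict, min_version=None, rng=random) -> str:
    """Pick an instance for a message.

    instances: mapping of instance id -> version number it is running.
    min_version: largest version the client has already interacted with,
                 or None if the message carries no version information.
    """
    if min_version is None:
        candidates = list(instances)            # normal load balancing
    else:
        candidates = [i for i, v in instances.items() if v >= min_version]
    if not candidates:
        raise LookupError("no instance satisfies the version constraint")
    return rng.choice(candidates)               # stand-in balancing algorithm


def update_client_version(seen_version, served_version: int) -> int:
    """Client side: remember the largest version interacted with so far."""
    if seen_version is None:
        return served_version
    return max(seen_version, served_version)
```

With this rule, a client that has received a reply from version 2 during a rolling upgrade is never routed back to a version 1 instance, which is the requirement that clients never interact with decreasing versions.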
  • 82. Achieving Elasticity • Elasticity means the ability to create new (virtual) resources on demand • Providers allow the consumer to set up "autoscaling" rules. These rules make the demand automatic, without the need for manual operator action. – E.g. create a new instance when an existing instance is utilizing greater than 75% of CPU for more than 5 minutes. • The correct strategy for autoscaling is a matter of research because of the time it takes to create a new instance, provision it, boot it, and start an application.
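The example rule on this slide (CPU above 75% for more than 5 minutes) can be expressed as a check over timestamped utilization samples. This is a hedged sketch: the function name `should_scale_out`, the sample format, and the requirement that the breach be continuous are assumptions; real autoscaling services define their own rule semantics.

```python
def should_scale_out(samples, threshold=75.0, window_seconds=300) -> bool:
    """Return True if CPU stayed above `threshold` for more than `window_seconds`.

    samples: list of (timestamp_seconds, cpu_percent) tuples, oldest first.
    """
    breach_start = None
    for timestamp, cpu in samples:
        if cpu > threshold:
            if breach_start is None:
                breach_start = timestamp      # start of the current breach
            if timestamp - breach_start > window_seconds:
                return True
        else:
            breach_start = None               # a dip below threshold resets the window
    return False
```

Because a new instance takes minutes to become available, a rule like this reacts only after the load has already been high for the whole window plus the provisioning delay. That lag is one reason the slide calls the right strategy a research question.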
  • 83. Provisioning Latency • Small Instance – 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform with a base install of CentOS 5.3 AMI – Between 5 and 6 minutes in us-east-1c from launch to availability • Large Instance – 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform with a base install of CentOS 5.3 AMI – Between 11 and 18 minutes in us-east-1c [http://www.philchen.com/2009/04/21/how-long-does-it-take-to-launch-an-amazon-ec2-instance]
  • 84. Provisioning Forecasting • Approaches to predict the appropriate number of instances • Technique 1 (due to Sadeka Islam) – Calculate cost of having instances that are unused (overprovisioning) – Calculate cost of having requests go unsatisfied (underprovisioning) – Allocate additional instances to optimize costs under various usage scenarios • Technique 2 (due to Matthew Sladescu) – Sniff out events that might lead to a surge in demand and use that to predict the appropriate number of instances
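The cost trade-off behind Technique 1 can be illustrated with a toy expected-cost calculation. This is not the published model: the scenario format, the linear cost terms, and the names `expected_cost` and `best_instance_count` are assumptions made purely for illustration.

```python
def expected_cost(instances, scenarios, capacity_per_instance,
                  instance_cost, unmet_request_cost) -> float:
    """Expected cost of running `instances` instances.

    scenarios: list of (probability, demand_in_requests) pairs.
    Paying for every instance covers the overprovisioning cost; requests
    beyond total capacity incur the underprovisioning penalty.
    """
    capacity = instances * capacity_per_instance
    penalty = sum(probability * max(0, demand - capacity) * unmet_request_cost
                  for probability, demand in scenarios)
    return instances * instance_cost + penalty


def best_instance_count(max_instances, **params) -> int:
    """Search 0..max_instances for the count with the lowest expected cost."""
    return min(range(max_instances + 1),
               key=lambda n: expected_cost(n, **params))
```

For example, with equally likely demands of 100 and 300 requests, 100 requests of capacity per instance, an instance cost of 10 and a penalty of 0.3 per unmet request, the search settles on three instances. Lowering the penalty shifts the optimum toward fewer instances.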
  • 85. Latency of Communication • Measurements by Robin Meehan based on http-ping • Within EU region but across availability zones – Roundtrip to local host within cloud (control) avg = 1.0 ms – Roundtrip to public IP in same AZ avg = 1.4 ms • Out of cloud (local England facility) to within cloud – us-east = 231 ms – eu-west = 96 ms http://smart421.wordpress.com/2011/02/15/amazon-web-services-inter-az-latency-measurements/ http://smart421.wordpress.com/2011/01/17/which-amazon-web-services-region-should-you-use-for-your-service/
  • 87. Security topics • Credentials and keys • Management of credentials and keys in the cloud • Multi-tenancy • Location dependency/governance
  • 88. Credentials and keys • A credential identifies you – As an individual – As having certain privileges – As having certain qualifications • Credentials are used in – Authentication (you are who you say you are) – Authorization (you have the rights to perform certain actions) – Non-repudiation (you cannot deny you did something) • A key is a magic number used in cryptography for – Encrypting/decrypting data – Digital credentials
  • 89. Basic Data protection • [Diagram] App outside of cloud (data unencrypted, communication encrypted) – https: data is encrypted for transfer into the cloud – App inside of cloud (data unencrypted) – Data is stored encrypted (by vendor)
  • 90. What can go wrong with the Basic Data Protection? • Suppose cloud provider has to respond to subpoena for data. Your data may, potentially, be included. • Cloud provider must decrypt data to respond to subpoena. • You may wish to encrypt your data (double encryption) so that cloud provider can only provide encrypted data. • Of course, if subpoena is directed at you, you must comply with decrypted data.
  • 91. Use of credentials • Log into app in the cloud • Attach a disk volume • Download application from a non-public location • Access particular databases • For non-public applications, protect your credentials and your data will be protected.
  • 92. Vulnerabilities to Credentials • Compromised inadvertently through social engineering means or carelessness • Held by disgruntled employee • Compromised through some sort of attack
  • 93. Goals for credential storage • Easy to do. If it is difficult to store credentials, people will avoid their use. A script can automate the provisioning of credentials but then the script needs to be protected • Possible to change in a running instance? Once an instance has been launched, can the credentials it uses be changed? • Possible to change for instances launched in the future? This issue is related to building credentials into scripts. If scripts have credentials built in, it is difficult to change them in the future.
  • 94. Options for getting credentials to App in the cloud • Send credentials from client outside the cloud – HTTPS will negotiate encryption of credentials over the internet – Assumes credentials can be kept private on clients that have them. – Credentials need to be sent every time there is a new instance • Pass credentials in as a parameter during launch of instance – Credentials persist for the life of the instance, so if credentials change, the instance can be re-instantiated – Means credentials are stored on a server – itself a vulnerability
  • 95. More options for getting credentials to App server • Build credentials into the image – App server is instantiated from an image in the image library – Could install credentials in the image when building it – Makes it difficult to change credentials – Prevents reuse of image (or makes reusing image a very bad idea) • Keep credentials in persistent storage. – Access control list for persistent storage provides protection based on credentials – Credentials may be based on a different account
  • 96. Conclusion with respect to credential management • No insurmountable problem • Needs to be thought through – Who has access to credentials? – Will I ever need to change credentials?
  • 97. What is Multi-tenancy? • [Diagram] VM for customer 1, VM for customer 2 and VM for customer 3 on a shared Hypervisor and Server; remaining labels: Local, Network, Storage, Data
  • 98. Multi Tenancy Gets More Complicated • [Diagram] End users above VM for customer 1, VM for customer 2 and VM for customer 3, all on a shared Hypervisor
  • 99. Multi Tenancy Means “Sharing” • Consumers share hardware – CPU – Network – Storage media • Consumers share software – Hypervisor • End users share applications – E.g. Salesforce.com
  • 100. What are the problems with Multi-tenancy? • Performance – other users or consumers will consume resources and, potentially, keep you from achieving your performance requirements. – Some providers allow consumers to reserve complete machines that would prevent multi-tenancy from occurring. • Security – other users could potentially break confidentiality or integrity – Provider uses isolation for security. Consumer must have trust in provider – Consumer uses encryption to protect data.
  • 101. Isolation assumptions • Virtual machines are isolated based on virtual memory technology and addressing scheme – Processor manufacturers have specialized hardware to support virtualization – Hypervisor introduces a new layer of privileged software that could be attacked. • Hypervisors provide facilities to isolate networks. • Disk isolation is the same as in a non-cloud environment. OSs or shared software provide facilities.
  • 102. Personally Identifiable Information • Personally identifiable (US NIST) – Information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc. • Personal data (EU) – 'personal data' shall mean any information relating to an identified or identifiable natural person ('data subject'); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity
  • 103. Location dependency/governance • Some jurisdictions require that personal information for their jurisdiction is not stored outside of the jurisdiction – The EU requires that personal information can leave the EU only for locations that have equivalent privacy guarantees – Australia has a similar policy – “If offshore cloud compromises your data, we'll sue you, not them”, Victoria Privacy Commissioner • Some jurisdictions claim rights to access any data stored within their borders – US Patriot Act gives US government right to examine any data stored in the US.
  • 104. What does this mean in the cloud? • Knowing location of data centers – Amazon provides locations of their data centers – Google does not • Does this mean just use an Amazon data center in a region compliant with your requirements? – Not so fast! – Backup locations may be chosen by the provider and could be anywhere – Controlling the backup location based on data content is a complicated problem. • Amazon does have a gov region that almost certainly complies with US government regulations
  • 105. Use tokens as a replacement for PII • A token is an identifier that has no mathematical mapping to the individual being identified – E.g. number people in tutorial arbitrarily – Your number becomes a unique identifier for your PII stored in the cloud – I keep mapping between you and your token privately according to jurisdictional laws
  • 106. Example of token use • Original data – John Doe – Sensitive information • Token table (kept locally to conform to privacy laws) – John Doe – Token for John Doe • Data stored in cloud – Token – Sensitive information • Take join of token table and data table in cloud and the original data is restored
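The token table and the join can be sketched in a few lines. This is a minimal illustration, assuming random hex tokens and in-memory tables; the function names and data layout are hypothetical and not taken from any tokenization product.

```python
import secrets


def tokenize(records, token_table=None):
    """Split (name, sensitive_info) records into a local token table and cloud rows."""
    token_table = {} if token_table is None else token_table
    cloud_rows = []
    for name, info in records:
        if name not in token_table:
            # A random token has no mathematical mapping back to the name.
            token_table[name] = secrets.token_hex(8)
        cloud_rows.append((token_table[name], info))
    return token_table, cloud_rows


def restore(token_table, cloud_rows):
    """Join the locally held token table with the cloud rows to rebuild the original data."""
    name_by_token = {token: name for name, token in token_table.items()}
    return [(name_by_token[token], info) for token, info in cloud_rows]
```

Only `cloud_rows` would leave the jurisdiction; `token_table` stays on local storage, so the cloud copy carries no name.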
  • 107. What about the jurisdictional problem? • Tokens – Technique for decoupling PII from identifier. – Adds a level of indirection and protects that level locally • Does this solve jurisdictional problems? – I don't know – PerspecSys says it does: http://www.perspecsys.com/how-we-help/data-residency/
  • 108. Questions
  • 110. Netflix Corporation • Launched in 1998 after founder was irritated at having to pay late fees on a DVD rental. • DVD Model – Pay monthly membership fee that includes rentals, shipping and no late fees – Maintain online queue of desired rentals – When you return your last rental (depending on service plan), the next item in the queue is mailed to you together with a return envelope. • Customers rate movies and Netflix makes recommendations based on their preferences
  • 111. Streaming video - 1 • Streaming video service introduced in 2007 • Customers can watch Netflix streaming video on a wide variety of devices, many of which feed into a TV – Roku set top box – Blu-ray disc players – Xbox 360 – TV directly – PlayStation 3 – … • Customers can stop and restart video at will. Netflix calls these locations in the films “bookmarks”.
  • 112. Streaming video - 2 • Initially, one hour of streaming video was available to customers for every dollar they spent on their plan • In Jan, 2008, every customer was entitled to unlimited streaming video. • In Nov, 2011 Netflix changed billing model to have separate charges for DVDs and streaming
  • 113. Internet statistics • In May, 2011, Netflix streaming video accounted for 22% of all internet traffic. 30% of traffic during peak usage hours. • Three bandwidth tiers – Continuous bandwidth to the client of 5 Mbit/s – HDTV, surround sound – Continuous bandwidth to the client of 3 Mbit/s – better than DVD – Continuous bandwidth to the client of 1.5 Mbit/s – DVD quality
  • 114. Netflix's move to the cloud • In late 2008, Netflix had a single data center with Oracle as the main database system. • With the growth of subscriptions and streaming video, it was clear that they would soon outgrow the data center. • Two options: – Build more data centers – Use the cloud • Netflix chose the Amazon EC2 platform
  • 115. Why EC2? • Four reasons cited by Netflix for moving to the cloud 1. Every layer of the software stack needed to scale horizontally, be more reliable, redundant, and fault tolerant. This leads to reason #2 2. Outsourcing data center infrastructure to Amazon allowed Netflix engineers to focus on building and improving their business. 3. Netflix is not very good at predicting customer growth or device engagement. They underestimated their growth rate. The cloud supports rapid scaling. 4. Cloud computing is the future. This will help Netflix with recruiting engineers who are interested in honing their skills, and will help scale the business. It will also ensure competition among cloud providers helping to keep costs down. • Why Amazon and EC2? In 2008, Amazon was the leading supplier. Netflix wanted an IaaS so they could focus on their core competencies.
  • 116. Netflix applications • Video ratings, reviews, and recommendations • Video streaming • User registration, log-in • Video queues • Billing • DVD disc management – inventory and shipping • Video metadata management – movie cast information
  • 117. Netflix Reliability • Deep service dependency hierarchy • 1 billion incoming calls/day • Across 1000s of instances • Intermittent failure guaranteed
  • 118. Approach to detecting faults • Fast network timeouts and retries • Separate threads on per-dependency thread pools • Semaphores instead of threads for services that do not perform network calls • Circuit breaker – Service calls are decorated with code to test whether service is failing too often
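The circuit breaker idea can be illustrated with a small wrapper. This is a generic sketch of the pattern, not Netflix's implementation; the class name, thresholds, and half-open behaviour are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                # The service has failed too often: fail fast without calling it.
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count.
        self.failures = 0
        return result
```

Failing fast keeps callers from piling up behind a dependency that is already struggling, which matters when the dependency hierarchy is deep.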
  • 119. If failure detected • Custom fallback – Each service has specific fallback plan • Fail silent – Service returns a null value and invoking service knows it has failed • API should be able to show what is happening now, in real time, not from some past time. Dashboard shown to operator has red/yellow/green lights for important services
  • 120. Netflix test suite - 1 • Netflix has a variety of test programs they call the Simian Army. These programs include – Chaos Monkey. Randomly kill a process and monitor the effect. – Latency Monkey. Randomly introduce latency and monitor the effect. – Doctor Monkey. The Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health (e.g. CPU load) to detect unhealthy instances. – Janitor Monkey. The Janitor Monkey ensures that the Netflix cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.
  • 121. Netflix test suite - 2 – Conformity Monkey. The Conformity Monkey finds instances that don't adhere to best practices and shuts them down. For example, if an instance does not belong to an auto-scaling group, that is a potential problem. – Security Monkey. The Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal. – 10-18 Monkey. The 10-18 Monkey (Localization-Internationalization) detects configuration and run time problems in instances serving customers in multiple geographic regions, using different languages and character sets. The name 10-18 comes from L10n and I18n, abbreviations in which the digits count the letters omitted from the words localization and internationalization.
  • 122. Performance • Create new auto-scaling group for each new version of code – Copy entire configuration to new group – Test behaviour under load by squeezing traffic in production to a smaller set of servers or generating artificial load against a single server
  • 123. SmugMug • Photo sharing site • Survived April AWS outage • Recommendations – Spread across as many availability zones as possible – Spread across regions if possible – Build for failure (like Chaos Monkey) – Understand how components fail (yours and the cloud provider's services)
  • 124. Others • Bizo – Use circuit breakers. Assume services will fail, cache data and monitor extensively to detect failure. • SimpleGeo – share nothing, redundancy, automated failover, automated replication • Twilio – Unit of failure is a single host • Simple services, replicatable – Short timeouts and quick retries – Idempotent service interfaces (stateless) – Relax consistency requirements
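Quick retries are only safe when the operation is idempotent, because a retry may repeat a call that actually succeeded on the server. A small sketch of the retry side follows; the function name and defaults are illustrative, and the short timeout itself is assumed to live inside `operation`.

```python
import time


def call_with_retries(operation, attempts=3, delay_seconds=0.05, sleep=time.sleep):
    """Run an idempotent operation, retrying quickly after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:
            last_error = error
            if attempt < attempts - 1:
                # Short pause before the next try; no pause after the final one.
                sleep(delay_seconds)
    raise last_error
```

The `sleep` parameter is injectable only so the pause can be replaced in tests; in normal use the default applies.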
  • 126. Enterprise DR under pressure? • Issues – DR requirement is growing, driven by (a) changing customer expectations, and associated reputational risks; (b) Government & industry regulations – Infrastructure for DR is expensive: good DR is only affordable for a small % of applications; forces compromises/prioritisation – Confidence in initiating a recovery is often less than it should be (too long, too much loss), uncertain integrity – DR solutions often too 'local', insufficiently resilient – Enterprise IT becoming more complex • Cost of DR is increasing – Improving business continuity (BC) and DR is 2nd highest priority for enterprises for 2010/2011 – BC/DR typically claims 6-7% of total IT budget – 32% of enterprises plan to increase spending on BC/DR by at least 5% in 2010/2011 (Forrester global survey, 2,803 IT decision-makers, Sept 2010) • [Diagram] Good DR is only affordable for a few applications: sophisticated DR coverage for higher priority applications, then limited coverage, then no cover • Hypothesis: We can use cloud to extend DR at 1/10th cost.
  • 127. Using Cloud for Business Continuity • Two main usages of cloud for Business Continuity: – Provides highly available systems for day-to-day business – Serves as a technology platform to implement disaster recovery • Some definitions: – Business Continuity: “Activity performed by an organisation to ensure that critical business functions will be available to customers, suppliers, regulators and other entities…” – Disaster Recovery: “A small subset of business continuity. The process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organisation after a natural or human-induced disaster” – Fault Tolerance: “The property that enables a system to continue operating properly, possibly at a reduced quality level…”
  • 128. Building Highly Reliable Systems with Cloud • Must address potential failures at two levels: – Hardware/Infrastructure • Prevent a Single-Point-of-Failure (SPOF) by adding redundancy in all hardware components (i.e., redundant disks, redundant network devices, redundant power supply, etc.) • NOT all cloud providers provide 100% availability. Check your SLA!! – Application • Prepare a fail-over system to take over in case of a failure • Replicate databases to minimise downtime and loss of data • Replicate to a geographically different location (e.g., to avoid natural disasters such as floods)
  • 129. DR As A Service – Requirements • Cost-effective DR-As-A-Service is essential to get the DR solution deployed • Deep architectural expertise does not exist in many businesses • Need solutions that achieve dependability and that are – Non-intrusive at runtime – Free of required changes to application architecture – Able to work across platforms – Cheaper and easier to use than current state of practice
  • 130. Case Study: Building Reliable System using EC2 • Highly replicated architecture of cloud makes them great as foundations for business continuity solutions • Globally distributed nature further enhances the disaster recovery capability of cloud • Availability limitations means need to be realistic about Hot vs Warm vs Cold standby options • [Diagram 1] Elastic IP address xxx.xxx.xxx.xxx; Auto Scaling Rule (Minimum Size = 1, Availability Zones = A, B, C) creates and allocates an EC2 Instance across Availability Zones A, B, C • [Diagram 2] Request from Clients reaches an Elastic Load Balancer (Availability Zones = A, B, C), which forwards the request to EC2 Instances managed by an Auto Scaling Rule (Minimum Size = 2, Availability Zones = A, B, C) across Availability Zones A, B, C
  • 131. Case Study: Building Reliable System using EC2 (Contd) • Data backup in AWS – Amazon S3 is best for off-site data backup • Stores large binary files • Designed to provide 99.999999999% durability • Objects are redundantly stored in multiple facilities in a Region – Back up using EBS • Uses a regular file system • Takes image (or snapshot) of the partition – VM Import • Allows for easy replication from on-premise to cloud • Not trivial to replicate various configurations such as network configuration and disk drives
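To get a feel for what an eleven-nines durability figure means, the arithmetic can be worked out directly. This sketch assumes the figure is a per-object, per-year design target and that losses are independent; both are assumptions of the example, and the function name is made up.

```python
from fractions import Fraction


def expected_annual_losses(object_count, durability_percent="99.999999999"):
    """Expected number of objects lost per year at the given durability."""
    # Exact rational arithmetic avoids floating-point noise at eleven nines.
    loss_probability = 1 - Fraction(durability_percent) / 100
    return object_count * loss_probability
```

Under those assumptions, storing 10,000 objects gives an expected loss of one object every 10,000,000 years.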
  • 132. The Business Opportunity • Hot Standby – Run transactions on multiple sites but use only one – Mirror data via dedicated high speed network (e.g., SANs) • Warm Standby – Regularly backup app/data in a backup site – Launch systems upon a disaster • Cold Standby – Ship backup to offsite – Hardware is not already set up – Recover systems after disaster • [Chart] Cost against Downtime, Traditional DR compared with Cloud DR – Downtime labels: seconds (auto failover); minutes to hours (auto failover, minimum data loss); few hours to days (manual failover, few data; few days to weeks (large data loss) – Cost of warm and cold is comparable – “always-on” costs in cloud. Also, very hot one is not feasible
  • 133. Yuruware Bolt
  • 134. Questions
  • 135. Conclusions • Cloud Computing brings unique dependability challenges – Latency across the global links – Full automation means faster than ever error propagation – Multi-tenancy issues • Many traditional dependability patterns would work, but some new techniques are needed in the cloud era – Traditional patterns: stateless, etc. – Upgrade, undo/redo – Simian armies, DR-As-A-Service
  • 136. References • How to keep your AWS credentials on an EC2 Instance Securely, Shlomo Swidler, http://shlomoswidler.com/2009/08/how-to-keep-your-aws-credentials-on-ec2.html • http://techblog.netflix.com/ • Cloud Performance Benchmark Series, Network Performance: Rackspace.com, Sumit Sanghrajka, Radu Sion, http://www.cs.stonybrook.edu/~sion/research/sion2011cloud-net2.pdf • How long does it take to launch an Amazon EC2 instance, Phil Chen, http://www.philchen.com/2009/04/21/how-long-does-it-take-to-launch-an-amazon-ec2-instance • Basic Concepts and Taxonomy of Dependable and Secure Computing, Avizienis, Laprie, Randell, Landwehr, IEEE Transactions on Dependable and Secure Computing, Vol 1, No 1, Jan-March 2004
  • 137. References - 2 • Cloud Software Updates: Challenges and Opportunities, Neamtiu, Dumitras, http://www.ece.cmu.edu/~tdumitra/public_documents/neamtiu11cloudupgrades11.pdf • To upgrade or not to Upgrade, Dumitras, Narasimhan, Tilevich, Onward! 2010 • Cloud Application Architectures, George Reese, O'Reilly, 2009 • Why do internet services fail and what can be done about it? Oppenheimer, et al. Usenix Symposium on Internet Technologies and Systems, 2003 • Data Consistency properties and the trade-offs in commercial cloud storages: the consumers' perspectives, Wada, et al. 5th Biennial conference on Innovative Data Systems Research, CiDR, 2011 http://www.nicta.com.au/pub?id=4341
  • 138. References - 3 • Why do upgrades fail and what can we do about it? Tudor Dumitras and Priya Narasimhan, Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware (Middleware'09), 2009 • Using Program Analysis to Reduce Misconfiguration in Open Source Systems Software, Ariel Rabkin, PhD thesis, Univ of Calif, Berkeley, 2012 • A method for preventing mixed version race conditions, Bass, Wada, https://docs.google.com/open?id=0ByLr8SO1MsAiaXVxcmNNcDhVczg, 2012 • Automatic Undo for Cloud Management via AI Planning, Ingo Weber, Hiroshi Wada, Alan Fekete, Anna Liu, Len Bass, Proceedings of the 12th Hot Topics in System Dependability http://www.nicta.com.au/pub?id=5994
  • 139. References - 4 • How a consumer can measure elasticity for cloud platforms, Sadeka Islam, Kevin Lee, Alan Fekete, Anna Liu, Proceedings of the 3rd Joint WOSP/SIPEW International Conference on Performance Engineering, p.85-96, 2012 • Empirical prediction models for adaptive resource provisioning in the cloud, Sadeka Islam, Jacky Keung, Kevin Lee, Anna Liu, Future Generation Computer Systems, Vol 28, No.1, p.155-162, 2012
  • 140. Q&A Thank You! Research study opportunities in dependable cloud computing: • Software Architecture • Data Management • Performance Engineering • Autonomic Computing To find out more, send your CV and undergraduate details to students@nicta.com.au

Editor's Notes

  1. Reduce cost, reduce complexity
  2. Need to cut out more words on this slide – just tell the story!! Still need to do good EA, planning, monitoring, governance and management. Risk management approach to security, privacy. Plan for integration with existing assets. Come pick our brains at UNSW/NICTA.
  3. NICTA will focus on six research groups of significant scale and focus in which we have genuine opportunity to be ranked in the top five in an area in the world. Research groups have been selected on the basis of current NICTA strengths in research and research leadership. Software Systems: aims to develop game-changing techniques, frameworks and methodologies for the design of integrated, secure, reliable, performant and adaptive software architectures. Software systems has pervasive application in real-world applications ranging from enterprise ecosystems to embedded systems. Networks: the networks research group will develop new theories, models and methods to support future networked applications and services. Networked systems will address issues such as radio spectrum scarcity, wired bandwidth abundance, context and content, improvements to computing, energy constraints, and data privacy. Machine Learning: the science of interpreting and understanding data. The core problems are jointly statistical and computational. NICTA research will aim to develop machine learning as an engineering discipline, drawing on a spectrum of work from conceptual theory through algorithmics. Machine learning applications will aim to identify commonalities between problems, developing implementation frameworks that genuinely encourage reuse across different domains. Computer Vision: aims to understand the world through images and video. NICTA will focus on areas including geometry, detection and recognition, optimisation, segmentation, scene understanding, shape/illumination and reflectance, biologically inspired approaches and the interfaces between them, drawing from approaches including statistical methods and learning and optimisation. Computer vision is a key enabling research discipline for many applications, including visual surveillance, bionic eye and mapping of the environment. Control and Signal Processing: comprises a substantial group of sub-disciplines dealing with optimisation, estimation, detection, identification, behaviour modification, feedback control and stability of a very large class of dynamical systems. It is likely that NICTA will focus on problems of control and signal processing in large-scale decentralised systems which are core to many new ICT systems. Techniques from information theory, Bayesian networks, large scale optimisation etc. are employed to address this important class of problem. Optimisation: the "science of better". Research will focus on the interface between constraint programming, operations research, satisfiability, search, automated reasoning, machine learning, simulation and game theory, exploring methods that combine algorithms from these different areas. Optimisation applications will address multi-faceted questions such as how best to schedule in a network, whether there is a better folding for a protein, or how best to operate a supply chain.
  4. Also comment on Public vs Private, and need to prepare for Hybrid. • Rapid Elasticity: Elasticity is defined as the ability to scale resources both up and down as needed. To the consumer, the cloud appears to be infinite, and the consumer can purchase as much or as little computing power as they need. This is one of the essential characteristics of cloud computing in the NIST definition. • Measured Service: In a measured service, aspects of the cloud service are controlled and monitored by the cloud provider. This is crucial for billing, access control, resource optimization, capacity planning and other tasks. • On-Demand Self-Service: The on-demand and self-service aspects of cloud computing mean that a consumer can use cloud services as needed without any human interaction with the cloud provider. • Ubiquitous Network Access: Ubiquitous network access means that the cloud provider's capabilities are available over the network and can be accessed through standard mechanisms by both thick and thin clients. • Resource Pooling: Resource pooling allows a cloud provider to serve its consumers via a multi-tenant model. Physical and virtual resources are assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
  5. We have this data from our own studies!! Ping Kevin to get our own reference...
  6. We also have this sort of data ourselves!! From Australia, obviously!
  7. Where does Amadeus sit? Can we identify a set of apps that are cold standby now, and can be pushed into warm standby easily/cheaply using cloud?