PyCon Russia 2014 - Auto Scale in the Cloud

Horizontal scaling in the Cloud is the way to adapt resources to the load on a system. The Cloud allows users to scale virtually indefinitely, or at least enough for their needs.
This way the number of servers follows the trend of requests, and the TCO (Total Cost of Ownership) of the IT infrastructure can be reduced. In addition, companies can avoid dealing with capacity planning and pre-provisioning issues.

This talk will show how to use Python and Rackspace/OpenStack API and SDK to implement an event-based scaling solution (software released under the open-source Apache License: stay tuned).


  1. An introduction Scale in the Cloud | Created by: Simone Soldateschi | Modified Date: 2014-06-02 | Classification: Public Conference
  2. RACKSPACE® HOSTING | WWW.RACKSPACE.COM Who am I? Simone Soldateschi • Java, C/C++, PHP, Python developer • More than 8 years experience as SysAdm/SysEng • Developer Support Engineer at Rackspace • Task automation enthusiast • MTB’ing, triathlon, photo, manga @soldasimo simonesoldateschi
  3. Who are Rackspace? • Founded in 1998 in San Antonio, TX by three guys that wanted to create a hosting company • Home of Fanatical Support /o/ • Second biggest Public Cloud in the world • OpenStack Project co-founder
  4. Rackspace Vision: “To be recognized as one of the world’s great service companies.”
  5. Roadmap • Python SDK, Cloud • Auto Scaling • Management System • Control law • Garçon, all together now!
  6. Install SDK
      $ mkvirtualenv pyconru
      New python executable in pyconru/bin/python
      Installing setuptools, pip...done.
      (pyconru)$ pip install pyrax ipython
      Downloading/unpacking pyrax
      Downloading pyrax-1.8.1-py2.py3-none-any.whl (316kB): 316kB downloaded
      Downloading/unpacking ipython
      Downloading ipython-2.1.0-py2-none-any.whl (2.8MB): 2.8MB downloaded
      …
      Successfully installed pyrax ipython …
      Cleaning up…
     See: https://github.com/rackspace/pyrax
  7. Authentication
      import os
      import pyrax

      # authenticate
      pyrax.set_setting('identity_type', 'rackspace')
      pyrax.set_credentials(os.getenv('OS_AUTH_USER'),
                            os.getenv('OS_AUTH_APIKEY'),
                            region=os.getenv('OS_AUTH_REGION'))
      print "authenticated: %s" % pyrax.identity.authenticated
  8. Authentication
      Define environment variables:
      (pyconru)$ export OS_AUTH_USER=foo
      (pyconru)$ export OS_AUTH_APIKEY=bar
      (pyconru)$ export OS_AUTH_REGION=LON
      Check credentials:
      (pyconru)$ python -m pyconru.basic
      (DEBUG) OS_AUTH_USER: foo
      (DEBUG) OS_AUTH_APIKEY: ****
      (WARNING) OS_AUTH_REGION undefined, using default 'LON'
      (DEBUG) authenticated: True
      (INFO) identity token: cfe6d60f070947bf****************
  9. Cloud components
  10. Cloud components
  11. Roadmap • Python SDK, Cloud • Auto Scaling • Management System • Control law • Garçon, all together now!
  12. simone.soldateschi@rackspace.co.uk Vertical scaling: 2 GB, 2 CORES | 8 GB, 8 CORES
  13. What is Autoscaling?
  14. What is Autoscaling? WASTED $$$
  15. What is Autoscaling?
  16. New Usage Models (CLOUDSMART) Dedicated Servers are Pets • Great thought to their acquisition • Name them and know each one • Willing to pay big money for their care Cloud Servers are Livestock • Use them as long as they provide value • Acquire more of them when needed • Dispose of any that aren’t needed • Get rid of them if they become ill
  17. http://www.flickr.com/photos/fischerfotos/7419253200/
  18. Traffic Patterns
  19. Traffic Patterns: ON & OFF • Analytics • Banks/Tax Agencies • Test environments
  20. Traffic Patterns: FAST GROWTH • Events • Business Growth • Slashdot Effect
  21. Traffic Patterns: VARIABLE • News & Media • Event Registrations • Rapid fire sales
  22. Traffic Patterns: CONSISTENT • HR Application • Accounting/Finance • E-mail
  23. http://www.flickr.com/photos/maximalideal/3356408693/
  24. Autoscaling Methodologies: Time Based • Reactive • Predictive
  25. Time Based Autoscaling: Load Balancer • Server • Server
  26. Time Based Autoscaling: Load Balancer • Server • Server • 9:00am
  27. Time Based Autoscaling: Load Balancer • Server • Server • Nov 1st
  28. Time Based Autoscaling: Load Balancer • Server • Server • Server
  29. Time Based Autoscaling GOOD FOR: On & Off • Consistent
  30. Reactive Autoscaling: Load Balancer • Server 60% • Server 60%
  31. Reactive Autoscaling: Load Balancer • Server 80% • Server 80%
  32. Reactive Autoscaling: Load Balancer • Server 60% • Server 60% • Server 40%
  33. Reactive Autoscaling: Load Balancer • Server 30% • Server 30% • Server 30%
  34. Reactive Autoscaling: Load Balancer • Server 45% • Server 45%
  35. Reactive Autoscaling GOOD FOR: Fast Growth • Variable
  36. Predictive Autoscaling: Load Balancer • Server • Server
  37. Predictive Autoscaling: Load Balancer • Server • Server • Forecasted Traffic +30%
  38. Predictive Autoscaling: Load Balancer • Server • Server • Server
  39. Predictive Autoscaling GOOD FOR: Fast Growth • Variable
  40. Auto scaling - Schedule-based scaling
  41. Auto scaling - Schedule-based scaling
  42. Auto scaling - Event-based scaling
  43. Auto scaling - Event-based scaling: SCALE UP
  44. Auto scaling - Event-based scaling: COOL DOWN
  45. Cooldown
  46. Auto scaling - Event-based scaling: COOL DOWN
  47. Auto scaling - Event-based scaling: SCALE DOWN
  48. Auto Scale – Use case: FRONT END
  49. Auto Scale – Use case: LB • FRONT END • Share nothing • Stateless nodes
  50. Auto Scale – Use case: LB • FRONT END • LB • API • BOSS • WORKER
  51. http://www.flickr.com/photos/samuraislice/3309481048/
  52. Roadmap • Python SDK, Cloud • Auto Scaling • Management System • Control law • Garçon, all together now!
  53. The basics. Installation on management host: $ pip install ansible That’s it!
  54. The basics. Install agent on managed hosts:
  55. Why use Ansible? • Desired state • Go live!
  56. Desired State: Write code to tell the computer how to set itself up!
  57. Roadmap • Python SDK, Cloud • Auto Scaling • Management System • Control law • Garçon, all together now!
  58. Closed-Loop Control Law
  59. Event-based Auto Scale
  60. Auto Scale
  61. Closed-loop Control Law – Garçon implementation: ?
  62. Roadmap • Python SDK, Cloud • Auto Scaling • Management System • Control law • Garçon, all together now!
  63. Garçon: How to integrate Cloud Monitoring and Auto Scale?
  64. Garçon - How? Garçon
  65. Garçon - Overview: Garçon • cm2asd • cfgmgmtd
  66. Garçon - Go Live! cfgmgmtd • Go Live!
  67. Closed-Loop Control Law – Garçon implementation: Garçon
  68. Garçon in-depth – cm2asd
      # fetch current list of servers
      l_current_servers = scaling_group_servers(scaling_group_id)
  69. Garçon in-depth – cm2asd
      # iterate backwards so that pop(i) does not shift the items still to visit
      for i in range(len(l_current_servers)-1, -1, -1):
          server_id = l_current_servers[i]
          s = get_server(server_id)
          if s.status != 'ACTIVE':
              # server not active
              l_current_servers.pop(i)
              continue
          m = get_server_metadata(s.id)
          try:
              if (m['aspoc.server_status'] != 'configured'):
                  # server not configured
                  l_current_servers.pop(i)
                  continue
          except KeyError:
              # server not configured
              l_current_servers.pop(i)
              continue
  70. Garçon in-depth – cm2asd
  71. Garçon in-depth – cm2asd
      # compute average system load for scaling group
      servers_avg_load = servers_average_load(l_checks, samples, sample_time)
      # compare current load against configured threshold
      if servers_avg_load >= threshold_high:
          # trigger scale_up_webhook
          r = requests.post(scale_up_webhook)
          if r.status_code != 202:
              logger.error('scale_up_webhook (%s) returned HTTP %d' %
                           (scale_up_webhook, r.status_code))
  72. Garçon in-depth – cm2asd
      if servers_avg_load <= threshold_low:
          # trigger scale_down_webhook
          r = requests.post(scale_down_webhook)
          if r.status_code != 202:
              logger.error('scale_down_webhook (%s) returned HTTP %d' %
                           (scale_down_webhook, r.status_code))
  73. Garçon in-depth – cfgmgmtd
      for s_id in l_current_servers:
          ...
          # server exists?
          try:
              cs = pyrax.cloudservers
              cs.servers.get(s_id)
          except:
              logging.warning('Auto Scale server (%s, %s) missing '
                              '(maybe deleted manually?)' % ('-', s_id))
              continue
          ...
          try:
              # read server metadata
              m = get_server_metadata(s_id)
              ...
  74. Garçon in-depth – cfgmgmtd
      Use metadata:
          try:
              if (server_status != 'configured' and
                      server_status != 'configuring'):
                  ...
                  # run thread to configure server
                  threading.Thread(target=configure_server,
                                   args=(s_id,)).start()
      No metadata?
          except KeyError:
              # CONFIGURE server (KeyError, no metadata) in thread
              threading.Thread(target=configure_server,
                               args=(s_id, ansible_timeout,)).start()
  75. Garçon in-depth – cfgmgmtd, Ansible
      Reset server’s password:
          # set server password
          password = generate_password(10, punctuation=False)
          set_server_password(server_id, password)
      Server’s info (e.g. IP address):
          # fetch server info
          ip = get_server_ipv4(server_id, MGMT_NETWORK)
  76. Garçon in-depth – cfgmgmtd, Ansible
      Leverage Configuration Management System:
          for playbook in list_ansible_playbooks():
              ...
              # set playbook-related variables before building the command
              playbook_filename = (os.path.join((playbooks_base_dir),
                                                '%s/main.yml' % playbook))
              playbook_logfilename = (os.path.join(LOG_DIR,
                                                   '%s-%s' % (s.name, playbook)))
              playbook_logfile = open(playbook_logfilename, 'w')
              cmd = ['ansible-playbook', playbook_filename,
                     # '-vvvv',
                     '-c', 'paramiko',
                     '-i', inventory_file]
              errcode = run_cmd(cmd, logfilename=playbook_logfilename,
                                timeout=ansible_timeout)
  77. Garçon in-depth – cfgmgmtd, Monitoring System
      Create checks for new server to be managed:
          # cloud monitoring (agent_id := server_uuid)
          add_cm_cpu_check(server_id)
          # good, set 'aspoc.server_status=configured' in metadata
          set_server_metadata(server_id, 'aspoc.server_status', 'configured')
  78. RECAP • Python SDK, Cloud • Auto Scaling • Management System • Control law • Garçon, all together now!
  79. Q&A @soldasimo simonesoldateschi
  80. RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218 | US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM | RACKSPACE® HOSTING | 5 MILLINGTON ROAD | HAYES, UNITED KINGDOM UB3 4AZ | UK SALES: +44 (0)20 8712 6507 | UK SUPPORT: 0800 988 0300 | WWW.RACKSPACE.CO.UK | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. @soldasimo simonesoldateschi

Editor's Notes

  • That’s me! I started as a developer, then worked as a Systems Engineer.
    Since I moved to the UK I have strived to combine programming and SysEng skills.

    I like real life too: cycling and any outdoor activity.
  • If you’ve seen a Rackspace presentation before, you likely have seen a statement such as this talking about great customer service.

    At Rackspace, we believe that Service is our key strategic differentiator and the reason customers will continue to trust their business to Rackspace. While technology is playing a key role in RAX capabilities more than ever (OpenStack, Public/Private Cloud, RackConnect, etc.), we’ll continue to rely on service as our primary differentiator.
  • First of all:
    You are Python developers, so should you want to use OpenStack you’ll need a Python SDK, namely pyrax.
    Beware that the name will change, as it is reminiscent of Rackspace (i.e. the RAX suffix) while the SDK is supposed to support OpenStack.
  • Let’s create a Python Virtual Environment, to keep things clean.
    == CLICK ==
    And install pyrax SDK and ipython for testing purposes
    We will use Python 2
    == CLICK ==
    See PyRAX project, doc and snippets on GitHub
  • Define environment variables
    Use pyrax to authenticate
  • Define environment variables
    == CLICK ==
    Give it a shot -- Use pyrax to authenticate, and see what happens
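
A minimal sketch of the two steps above, assuming the same three environment variables as the slides. The helper names `load_credentials` and `authenticate` are hypothetical, the `'LON'` default is taken from the debug output on the slide, and only the pyrax calls shown on the slide are used (pyrax is imported lazily so the first helper can be tried without the SDK installed).

```python
import os

DEFAULT_REGION = "LON"  # default region, as reported by the slide's debug output


def load_credentials(env=None):
    """Collect user, API key and region from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [k for k in ("OS_AUTH_USER", "OS_AUTH_APIKEY") if not env.get(k)]
    if missing:
        raise KeyError("missing environment variables: %s" % ", ".join(missing))
    return {
        "username": env["OS_AUTH_USER"],
        "api_key": env["OS_AUTH_APIKEY"],
        # fall back to the default region when OS_AUTH_REGION is undefined
        "region": env.get("OS_AUTH_REGION") or DEFAULT_REGION,
    }


def authenticate(creds):
    """Authenticate with the pyrax calls shown on the Authentication slide."""
    import pyrax  # lazy import: only needed when actually talking to the Cloud

    pyrax.set_setting("identity_type", "rackspace")
    pyrax.set_credentials(creds["username"], creds["api_key"],
                          region=creds["region"])
    return pyrax.identity.authenticated
```

Failing early on a missing user or key keeps the error close to its cause, instead of surfacing later as an authentication failure.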
  • Pyrax supports many Cloud components, that you can choose from to write software.

  • For our purposes we need to use just the following three Cloud components: Monitoring, Servers, Auto Scale

    To fully understand how scaling infrastructure works in the Cloud, let’s discuss what Autoscaling means.

    This leads us to the next section of this presentation.
  • Discuss what Auto Scale is, how to scale, when and why. Then go deeper into scaling techniques.
  • Traditionally if you wanted a more powerful server, you would buy more RAM and add CPUs.
    That approach is called vertical scaling.
  • Say that 40 servers can serve the highest peak of traffic your infrastructure is going to have.

    You need to provision 40 servers by that date, if you know when it is going to happen

    Or just buy 40 servers from the very beginning
  • Gray area is wasted money.
    You have many more servers just to be prepared for the high-traffic moment.

    Chances are high that you will have a hard time explaining to someone on the finance team «why you should buy and maintain 40 servers when 10 or 20 are fine most of the time».

    CAPEX and OPEX are extremely high. You own every piece of hardware, and have to maintain it
  • Now, say that you are somehow able to make those two lines marry up.

    This way you provision just a bit more capacity than you need, just in case, but:

    CAPEX is moved toward OPEX – you do not really own anything, you just use servers when you need them
    OPEX is minimised too – should something go wrong: destroy the faulty server, spin a new one up

    Autoscaling is the ability to automatically or semi-automatically scale a group of servers up and down, based on computing or traffic demand, by provisioning new services
  • Do any of you name your servers?
  • ON & OFF
    FAST GROWTH
    VARIABLE
    CONSISTENT
  • ON & OFF
    FAST GROWTH
    VARIABLE
    CONSISTENT
  • Boolean load
  • There are three different Autoscaling methodologies to choose from.
    They can also be combined and mixed together.
  • The typical scenario for a web application is…
  • At 9 o’clock…
  • …on the 1st of November…
  • …spinning a Cloud Server up is scheduled!
  • For Reactive Autoscaling let’s say that there are two servers working at 60%
  • Load increases and overall load rises to 80%
  • Autoscaling adds a new server to the cluster, and overall load decreases
  • Then overall load decreases to 30% (e.g. fewer requests)
  • Autoscaling spins one server down
  • Let’s discuss the last scaling type: Predictive Autoscaling
  • It is somehow possible to forecast traffic
  • Servers are spun up and down according to forecast
  • Let’s RECAP

    Schedule based scaling: set time to scale up…
  • …then set time to scale down
  • On the other hand, event-based scaling:

    Set thresholds, when hit, scale up or down policies are triggered
  • The idea behind Cooldown is to set the right pace, much like the pace car in a race.

    Let’s say your server requires 3 minutes to be fully provisioned, configured, deployed.
    Within those three minutes there is no reason to scale up again.

    Wait for the server being built to be live, then re-enable scaling up
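
The cooldown idea can be sketched as a small guard object. This is only an illustration, not the way Auto Scale implements it: the `Cooldown` class and its method names are hypothetical, and the clock is injectable so the behaviour can be checked without waiting three minutes.

```python
import time


class Cooldown:
    """Refuse further scaling actions until `seconds` have elapsed since the last one."""

    def __init__(self, seconds, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock      # injectable time source (seconds as a float)
        self._last = None       # time of the last scaling action, if any

    def ready(self):
        """Return True when a new scaling action is allowed."""
        return self._last is None or self.clock() - self._last >= self.seconds

    def mark(self):
        """Record that a scaling action has just been triggered."""
        self._last = self.clock()
```

With a 180-second cooldown, a second scale-up requested one minute after the first is simply skipped, and scaling is re-enabled once the new server has had time to go live.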
  • Rings a bell?

    How could we apply scaling to existing infrastructures?
    Let’s view some scenarios.

    Cluster of application or front-end servers
  • Adopting stateless servers
  • Scaling Boss-Worker clusters
  • Enough theory, so far!

    Let’s start discussing what tools you might want to use to Auto Scale in the Cloud.
  • How many of you use a Configuration Management System?
    How many of you use Ansible?
    How many of you who use Ansible also use, or have used, Puppet/Chef/SaltStack?

    Let’s see what Configuration Management System does, and what desired state means.
  • How do you install Ansible on your laptop, or on a management server?
  • In fact Ansible is agent-less. OK, OK, SSH is an agent ;)

    Ansible streamlines managing remote servers, as there is no need for a pre-installed agent. So no golden image (which is not DevOpsy!), or start-up script.
  • How would you provision a server manually?

    Build server up
    Attach block devices
    Create filesystem
    Install packages
    Configure it (e.g. users, daemons, firewall policies, etc)

    Now think that you can achieve the very same result with Ansible.
    You just need to decide what you are aiming for.
  • What a closed control loop looks like
  • This diagram shows components of a closed-loop control law.

    It is called closed-loop because there is feedback, which is taken into account
  • Systems load is monitored, and if it hits certain thresholds, then Auto Scale policies are triggered.

    In the OpenStack universe Otter replaces Auto Scale, and a monitoring system of the customer’s choice replaces Cloud Monitoring
  • Auto Scale puts messages on the Cloud Servers message queue, and servers are spun up or down accordingly
  • Ideally you would like a piece of software that is able to do the following:
    read the configuration
    get the list of current servers in the scaling group
    fetch data and stats from the monitoring system
    compute the average load of all systems
    trigger scaling policies

    Infrastructure scales according to the reference, being the configuration file.
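
The steps listed above can be sketched as one iteration of a control loop. This is an illustration under stated assumptions, not Garçon's actual code: `control_step`, the configuration keys, and the three injected callables (`list_servers`, `fetch_load`, `trigger`) are hypothetical stand-ins for the Cloud Servers, Cloud Monitoring and Auto Scale calls.

```python
def average(values):
    """Arithmetic mean of an iterable, 0.0 when it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def control_step(config, list_servers, fetch_load, trigger):
    """Run one pass of the loop and return the action taken.

    config       -- dict with 'scaling_group_id', 'threshold_high', 'threshold_low'
    list_servers -- callable(group_id) -> list of server ids
    fetch_load   -- callable(server_id) -> load figure for that server
    trigger      -- callable(action) that fires the matching scaling policy
    """
    servers = list_servers(config["scaling_group_id"])
    if not servers:
        return "noop"  # nothing to measure, so no feedback to act on

    load = average(fetch_load(server_id) for server_id in servers)

    # the configured thresholds are the reference the feedback is compared to
    if load >= config["threshold_high"]:
        trigger("scale_up")
        return "scale_up"
    if load <= config["threshold_low"]:
        trigger("scale_down")
        return "scale_down"
    return "noop"
```

Passing the three collaborators in as callables keeps the decision logic testable without any Cloud account.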
  • Put all together with Garçon
  • Garçon is software that I wrote in Python to integrate Cloud Monitoring and Auto Scale
  • Garçon is the glue between Cloud Monitoring and Auto Scale
    It queries Cloud Monitoring, checks load of scaling groups against configured thresholds, and triggers Auto Scale policies.
  • Garçon is composed of two daemons (they can also be run by cron), cm2asd and cfgmgmtd.
    The former fetches stats from Cloud Monitoring and triggers Auto Scale policies. The latter runs the configuration management system on fresh new servers.
  • Just to recap: run cfgmgmtd, which runs Ansible, configures new/pristine Cloud Server → makes them ready to Go Live!
  • Ideally you would like a piece of software that is able to do the following:
    read the configuration
    get the list of current servers in the scaling group
    fetch data and stats from the monitoring system
    compute the average load of all systems
    trigger scaling policies

    Infrastructure scales according to the reference, being the configuration file.
  • Fetch current list of servers in scaling group.

    Only ACTIVE servers within the scaling group participate in computing overall load (e.g.: average CPU load, average memory usage, length of message queue)
  • AIM -- Fetch monitoring data for ACTIVE servers only
    == CLICK ==
    Get server status
    == CLICK ==
    Check if key/value pair exists in metadata, and if value is configured.
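
The two checks above can be sketched as a single filter. The slide removes entries in place while iterating backwards; the sketch below builds a new list instead, which is a swapped-in style rather than the author's code. Servers are represented as plain dicts and metadata as a dict keyed by server id, both of which are assumptions made for illustration; the `aspoc.server_status` key is the one shown on the slides.

```python
def eligible_servers(servers, metadata_by_id):
    """Return ids of servers that are ACTIVE and tagged as configured.

    servers        -- list of dicts with 'id' and 'status' keys (hypothetical shape)
    metadata_by_id -- dict mapping server id -> metadata dict (may lack the id)
    """
    return [
        server["id"]
        for server in servers
        if server["status"] == "ACTIVE"
        # a missing server or missing key counts as "not configured"
        and metadata_by_id.get(server["id"], {}).get("aspoc.server_status") == "configured"
    ]
```

Building a new list avoids the index bookkeeping that in-place removal needs.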
  • Choose Monitoring PROVIDERS
    == CLICK ==
    Ready: Rackspace Cloud Monitoring
    == CLICK ==
    WIP: New Relic, Message Queue
    Next: Nagios
  • Scale up

    HTTP return code is 202 to prevent information leakage
  • Scale down

    HTTP return code is 202 to prevent information leakage
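
A minimal sketch of triggering a scaling webhook and checking for the expected 202, using only the standard library instead of `requests` (a swapped-in choice, so the example has no third-party dependency). `http_post` and `trigger_webhook` are hypothetical names, and the poster is injectable so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def http_post(url, timeout=10):
    """POST an empty body to `url` and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; report the code instead of propagating
        return error.code


def trigger_webhook(url, post=http_post, log=print):
    """Fire a scale-up or scale-down webhook; return True on HTTP 202."""
    status = post(url)
    if status != 202:
        log("webhook (%s) returned HTTP %d" % (url, status))
        return False
    return True
```

Because the reply is 202 either way, a successful call only means the request was accepted, so the effect still has to be observed through the scaling group itself.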
  • Scale down

    What’s servers’ status? Let’s cycle to find out…
    == CLICK ==
    Just ensure that every single server exists. REMEMBER we are in the Cloud

  • If metadata tag does not indicate an already managed server
    OR there is no metadata
    THEN run Ansible against that server.
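
The rule above fits in one small predicate. The sketch assumes metadata is a dict (or `None` when absent) and reuses the `aspoc.server_status` key and the `configured` / `configuring` values from the slides; the function name `needs_configuration` is hypothetical.

```python
def needs_configuration(metadata):
    """Return True when Ansible should be run against the server."""
    metadata = metadata or {}  # treat missing metadata like an empty dict
    try:
        status = metadata["aspoc.server_status"]
    except KeyError:
        return True  # no tag at all: the server has never been managed
    # already configured, or being configured by another thread: leave it alone
    return status not in ("configured", "configuring")
```

Treating `configuring` as "hands off" is what stops two runs from configuring the same server at once.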
  • Now we are going to discuss how to configure a server

    Reset server’s password, to let Ansible SSH into it.

    Do you remember we said «servers are livestock»?
    They are supposed to be managed programmatically. Nobody should ever SSH into a Cloud Server, especially if it is part of a scaling group
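
The slides call `generate_password(10, punctuation=False)` without showing its body. One possible implementation with the same call shape, using the standard `secrets` module, could look like this (an assumption for illustration, not the author's code):

```python
import secrets
import string


def generate_password(length, punctuation=True):
    """Return a random password of `length` characters.

    With punctuation=False the result contains only ASCII letters and digits.
    """
    alphabet = string.ascii_letters + string.digits
    if punctuation:
        alphabet += string.punctuation
    # secrets.choice draws from a cryptographically strong random source
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Since the password is generated and consumed by software only, it never needs to be memorable, shown to a person, or reused.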
  • Let the Configuration Management System work
    == CLICK ==
    Set playbook related variables
    == CLICK ==
    Prepare command statement with options, and run it
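
The slide uses a `run_cmd` helper whose body is not shown. A rough stand-in built on `subprocess.run` might look like the sketch below; the function bodies, the `-1` timeout return value and the `os.devnull` default are assumptions, while the `ansible-playbook` options (`-c paramiko`, `-i <inventory>`) are the ones from the slide.

```python
import os
import subprocess


def build_playbook_cmd(playbook_filename, inventory_file):
    """Assemble the ansible-playbook command line shown on the slide."""
    return ["ansible-playbook", playbook_filename,
            "-c", "paramiko",
            "-i", inventory_file]


def run_cmd(cmd, logfilename=os.devnull, timeout=None):
    """Run `cmd`, send its output to `logfilename`, and return the exit code.

    Returns -1 (a hypothetical convention) when the command exceeds `timeout` seconds.
    """
    with open(logfilename, "w") as logfile:
        try:
            completed = subprocess.run(cmd, stdout=logfile,
                                       stderr=subprocess.STDOUT,
                                       timeout=timeout)
        except subprocess.TimeoutExpired:
            return -1  # subprocess.run kills the child before raising
        return completed.returncode
```

A timeout matters here because a playbook that hangs on an unreachable server would otherwise block that server's configuration indefinitely.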
  • Create and attach a new check to new servers, so that the Monitoring System will be aware of the new server to monitor.
    == CLICK ==
    State that you are finished managing the server, i.e. TAG the server as CONFIGURED.
    Configuration Management Systems are idempotent, meaning you can run the same playbook/recipe/manifest against an already configured server over and over again.
    For performance purposes, just skip it.
  • Let’s RECAP what we discussed
