SlideShare a Scribd company logo
1 of 17
Capacity Management 
and Provisioning 
(Cloud's full, can't build here) 
Matt Van Winkle, Manager Cloud Engineering @mvanwink 
Andy Hill, Systems Engineer @andyhky 
Joel Preas, Systems Engineer @joelintheory
Public Cloud Capacity at Rackspace 
• Rackspace Public Cloud has deployed 100+ cells in ~2 
years 
• New cells used to take engineer assembly and 3-5w 
after bare OS install 
• 1 year later done by on-shift operators ~1w (as low as 
1d) 
• Usually constrained by networking
Control Plane Sizing 
• Data plane operations impacting both cell and top level 
control plane 
– Image downloads/uploads 
• How large should Nova DB be? 
– Breaking point of ‘standard’ cell control plane 
buildout - particularly database
Cell Sizing Considerations 
• Efficient use of Private IP address space 
– Used for connections to services like Swift and 
dedicated environment 
• Broadcast domains 
• Attempt to have minimal control plane for 
overhead/complexity
Hypervisor Sizing Considerations 
• Enough spare drive space for COW images 
– XS VHD size can easily be 2x space given to guest during normal 
operation! 
– Errors in cleaning up “snapshots” exacerbated by tight disk overhead 
constraints 
• Drive space for pre-cached images 
– cache_images=some # nova 
– use_cow_images=True # nova 
– cache_in_nova=True # glance
Other Sizing Notes 
• Need reserve space for emergencies (host evac) 
• Reserve space is cell-bound, due to instances being 
unable to move between cells 
– https://review.openstack.org/#/c/125607/ 
– cells.host_reserve_percent 
• VM overhead 
– https://wiki.openstack.org/wiki/XenServer/Overhead 
– https://review.openstack.org/#/c/60087/
Problems 
• Load Balancers 
• Glance and Swift 
• Fraud / Non Payment 
• Routes 
• Road Testing
Load Balancers 
• Alternate Routes needed for high BW operations 
– Generally Glance 
• Load Balancer can become bottleneck 
• Database queries returning lots of rows (cell sizing)
Swift and Glance Bandwidth 
Problems: 
• Creates single bottleneck 
• Imaging speeds monitored, exceeding thresholds 
triggers investigation / scale out 
• Cache not shared between glance-api nodes
Swift and Glance Bandwidth 
Monitoring / Solutions: 
• Need to get downloads out of path of control plane (compute direct to 
image store) 
• Cache base images 
– Pre-seed when possible 
– Can cache images to HV ahead of time for fast-cloning 
https://wiki.openstack.org/wiki/FastCloningForXenServer 
• Glance and Swift having shared request IDs would be nice 
• Shared cache might elevate hit-rate, save bandwidth 
What about when scaling out doesn’t work? Rearchitecture.
Fraud and Non-Payment 
Fraud 
• Mark instance as 
suspended 
• Still takes capacity 
• What do? 
• Account Actioneer 
Non-Payment 
• Similar to fraud but worse for capacity! 
• Try to give customer as much time as 
possible to return to the fold 
• Same overall strategy as fraud but 
instances kept longer
Road Testing nodes before enabling 
• New Cell 
– Bypass URLs (cell-specific API nodes) 
• Different nova.conf not using cells 
– compute_api_class=nova.compute.api.API # before 
• Cell tenant restrictions 
• Existing Cell/Rekick - Not as easy :( 
– How to ensure customer builds don’t land on box 
that isn’t road tested?
Managing the Capacity Management 
● Supply Chain/Resource Pipeline 
● Impact from Product Development 
● Gaps/Challenges from upstream
Capacity Pipeline 
• Large Customer Requests 
• Triggers 
– % Used 
– # Largest Slots per flavor 
• IPv4 Addresses 
– Cells and scheduler unaware :( 
– Auditor + Resolver 
• Control Plane (runs on OpenStack too)
Product Implications 
• Keep up with code deploys (hotpatches) 
• Adjusting provisioning playbooks to: 
– new flavor types 
– new configurations/applications (quantum- 
>neutron, nova-conductor) 
– control plane changes (10g glance) 
– new hardware manufacturers (OCP) 
• Non production environments
Upstream Challenges 
• Disabled flag for cells 
– Blueprint: http://bit.do/CellDisableBP 
– Bug: http://bit.do/CellDisableBug 
• Build to “disabled” host 
– Testing after a re-provision 
– Testing for adding new capacity to existing cell 
• Scheduling based on IP capacity 
– New scheduler service? 
– Currently handled by outside service “Resolver”, similar to Entropy 
• General “Cells as first class citizen” effort led by alaski
Questions? 
THANK YOU 
RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM 
© RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

More Related Content

What's hot

Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程HBaseCon
 
Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18BIWUG
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 DistilledGrig Gheorghiu
 
Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesPGConf APAC
 

What's hot (10)

Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
Planning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera ClusterPlanning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera Cluster
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
 
Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18Pascal benois performance_troubleshooting-spsbe18
Pascal benois performance_troubleshooting-spsbe18
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 Distilled
 
Lightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst PracticesLightening Talk - PostgreSQL Worst Practices
Lightening Talk - PostgreSQL Worst Practices
 
Performance out
Performance outPerformance out
Performance out
 
Weblogic - clustering failover, and load balancing
Weblogic - clustering failover, and load balancingWeblogic - clustering failover, and load balancing
Weblogic - clustering failover, and load balancing
 

Viewers also liked

Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...CA Technologies
 
Virtualization and how it leads to cloud
Virtualization and how it leads to cloudVirtualization and how it leads to cloud
Virtualization and how it leads to cloudHuzefa Husain
 
Capacity Managementand the Cloud
Capacity Managementand the CloudCapacity Managementand the Cloud
Capacity Managementand the Clouddannyq
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02Francisco Gonçalves
 
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureDeterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureSean Cohen
 
Capacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing WorldCapacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing WorldDavid Linthicum
 
Build Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku ConnectBuild Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku ConnectJeff Douglas
 
Capturing Measurable Non Functional Requirements
Capturing Measurable Non Functional RequirementsCapturing Measurable Non Functional Requirements
Capturing Measurable Non Functional RequirementsShehzad Lakdawala
 
Traditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity ModelsTraditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity ModelsJitscale
 
Capacity Planning for Cloud Computing
Capacity Planning for Cloud ComputingCapacity Planning for Cloud Computing
Capacity Planning for Cloud ComputingAdrian Cockcroft
 
Who's Who in Container Land
Who's Who in Container LandWho's Who in Container Land
Who's Who in Container LandMike Kavis
 
Handling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile ProjectHandling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile ProjectKen Howard
 
Case study: integrating azure with google app engine
Case study: integrating azure with google app engine Case study: integrating azure with google app engine
Case study: integrating azure with google app engine Miguel Scotter
 
Adressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practicesAdressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practicesMario Cardinal
 
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...Dmitry Lazarenko
 
Non functional requirements framework
Non functional requirements frameworkNon functional requirements framework
Non functional requirements frameworkwweinmeyer79
 
Cross Platform Mobile Application Architecture
Cross Platform Mobile Application ArchitectureCross Platform Mobile Application Architecture
Cross Platform Mobile Application ArchitectureDerrick Bowen
 
Design for non functional requirements
Design for non functional requirementsDesign for non functional requirements
Design for non functional requirementsHabeeb Mahaboob
 
Non functional requirements. do we really care…?
Non functional requirements. do we really care…?Non functional requirements. do we really care…?
Non functional requirements. do we really care…?OSSCube
 

Viewers also liked (20)

Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
Case Study: HCL Technologies On Capacity Planning for Cloud and Virtualized E...
 
Virtualization and how it leads to cloud
Virtualization and how it leads to cloudVirtualization and how it leads to cloud
Virtualization and how it leads to cloud
 
Capacity model
Capacity modelCapacity model
Capacity model
 
Capacity Managementand the Cloud
Capacity Managementand the CloudCapacity Managementand the Cloud
Capacity Managementand the Cloud
 
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp0220140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
20140128 webinar-get-more-out-of-mysql-with-tokudb-140319063324-phpapp02
 
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureDeterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
 
Capacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing WorldCapacity Management in a Cloud Computing World
Capacity Management in a Cloud Computing World
 
Build Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku ConnectBuild Consumer-Facing Apps with Heroku Connect
Build Consumer-Facing Apps with Heroku Connect
 
Capturing Measurable Non Functional Requirements
Capturing Measurable Non Functional RequirementsCapturing Measurable Non Functional Requirements
Capturing Measurable Non Functional Requirements
 
Traditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity ModelsTraditional Infrastructure Capacity Models vs. Cloud Capacity Models
Traditional Infrastructure Capacity Models vs. Cloud Capacity Models
 
Capacity Planning for Cloud Computing
Capacity Planning for Cloud ComputingCapacity Planning for Cloud Computing
Capacity Planning for Cloud Computing
 
Who's Who in Container Land
Who's Who in Container LandWho's Who in Container Land
Who's Who in Container Land
 
Handling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile ProjectHandling Non Functional Requirements on an Agile Project
Handling Non Functional Requirements on an Agile Project
 
Case study: integrating azure with google app engine
Case study: integrating azure with google app engine Case study: integrating azure with google app engine
Case study: integrating azure with google app engine
 
Adressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practicesAdressing nonfunctional requirements with agile practices
Adressing nonfunctional requirements with agile practices
 
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
Private PaaS & Container-as-a-Service for ISVs and Enterprise - Use Cases and...
 
Non functional requirements framework
Non functional requirements frameworkNon functional requirements framework
Non functional requirements framework
 
Cross Platform Mobile Application Architecture
Cross Platform Mobile Application ArchitectureCross Platform Mobile Application Architecture
Cross Platform Mobile Application Architecture
 
Design for non functional requirements
Design for non functional requirementsDesign for non functional requirements
Design for non functional requirements
 
Non functional requirements. do we really care…?
Non functional requirements. do we really care…?Non functional requirements. do we really care…?
Non functional requirements. do we really care…?
 

Similar to Capacity Management/Provisioning (Cloud's full, Can't build here)

Managing Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous FleetManaging Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous Fleetandyhky
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpandeIndicThreads
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentOpenStack Foundation
 
Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1Yankai Liu
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Community
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Ilya Ganelin
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013Jun Park
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentOpenStack Foundation
 
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...Red_Hat_Storage
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcturesabnees
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging振东 刘
 
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...NetApp
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
Virtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowVirtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowAndrew Miller
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5UniFabric
 
Lc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangyaLc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangyaSahdev Zala
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)Tibo Beijen
 

Similar to Capacity Management/Provisioning (Cloud's full, Can't build here) (20)

Managing Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous FleetManaging Open vSwitch Across a Large Heterogenous Fleet
Managing Open vSwitch Across a Large Heterogenous Fleet
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpande
 
Using OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting EnvironmentUsing OpenStack In a Traditional Hosting Environment
Using OpenStack In a Traditional Hosting Environment
 
Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1Openstack upgrade without_down_time_20141103r1
Openstack upgrade without_down_time_20141103r1
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Neutron scaling
Neutron scalingNeutron scaling
Neutron scaling
 
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan XuCeph Goes on Online at Qihoo 360 - Xuehan Xu
Ceph Goes on Online at Qihoo 360 - Xuehan Xu
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)
 
Blue host openstacksummit_2013
Blue host openstacksummit_2013Blue host openstacksummit_2013
Blue host openstacksummit_2013
 
Blue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environmentBlue host using openstack in a traditional hosting environment
Blue host using openstack in a traditional hosting environment
 
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
Red Hat Storage Day Seattle: Stabilizing Petabyte Ceph Cluster in OpenStack C...
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
TechTarget Event - Storage Architectures for the Modern Data Center - Howard ...
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
Virtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowVirtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - Varrow
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
 
Lc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangyaLc3 beijing-june262018-sahdev zala-guangya
Lc3 beijing-june262018-sahdev zala-guangya
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 

Recently uploaded

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Capacity Management/Provisioning (Cloud's full, Can't build here)

  • 1. Capacity Management and Provisioning (Cloud's full, can't build here) Matt Van Winkle, Manager Cloud Engineering @mvanwink Andy Hill, Systems Engineer @andyhky Joel Preas, Systems Engineer @joelintheory
  • 2. Public Cloud Capacity at Rackspace • Rackspace Public Cloud has deployed 100+ cells in ~2 years • New cells used to take engineer assembly and 3-5w after bare OS install • 1 year later done by on-shift operators ~1w (as low as 1d) • Usually constrained by networking
  • 3. Control Plane Sizing • Data plane operations impacting both cell and top level control plane – Image downloads/uploads • How large should Nova DB be? – Breaking point of ‘standard’ cell control plane buildout - particularly database
  • 4. Cell Sizing Considerations • Efficient use of Private IP address space – Used for connections to services like Swift and dedicated environment • Broadcast domains • Attempt to have minimal control plane for overhead/complexity
  • 5. Hypervisor Sizing Considerations • Enough spare drive space for COW images – XS VHD size can easily be 2x space given to guest during normal operation! – Errors in cleaning up “snapshots” exacerbated by tight disk overhead constraints • Drive space for pre-cached images – cache_images=some # nova – use_cow_images=True # nova – cache_in_nova=True # glance
  • 6. Other Sizing Notes • Need reserve space for emergencies (host evac) • Reserve space is cell-bound, due to instances being unable to move between cells – https://review.openstack.org/#/c/125607/ – cells.host_reserve_percent • VM overhead – https://wiki.openstack.org/wiki/XenServer/Overhead – https://review.openstack.org/#/c/60087/
  • 7. Problems • Load Balancers • Glance and Swift • Fraud / Non Payment • Routes • Road Testing
  • 8. Load Balancers • Alternate Routes needed for high BW operations – Generally Glance • Load Balancer can become bottleneck • Database queries returning lots of rows (cell sizing)
  • 9. Swift and Glance Bandwidth Problems: • Creates single bottleneck • Imaging speeds monitored, exceeding thresholds triggers investigation / scale out • Cache not shared between glance-api nodes
  • 10. Swift and Glance Bandwidth Monitoring / Solutions: • Need to get downloads out of path of control plane (compute direct to image store) • Cache base images – Pre-seed when possible – Can cache images to HV ahead of time for fast-cloning https://wiki.openstack.org/wiki/FastCloningForXenServer • Glance and Swift having shared request IDs would be nice • Shared cache might elevate hit-rate, save bandwidth What about when scaling out doesn’t work? Rearchitecture.
  • 11. Fraud and Non-Payment Fraud • Mark instance as suspended • Still takes capacity • What do? • Account Actioneer Non-Payment • Similar to fraud but worse for capacity! • Try to give customer as much time as possible to return to the fold • Same overall strategy as fraud but instances kept longer
  • 12. Road Testing nodes before enabling • New Cell – Bypass URLs (cell-specific API nodes) • Different nova.conf not using cells – compute_api_class=nova.compute.api.API # before • Cell tenant restrictions • Existing Cell/Rekick - Not as easy :( – How to ensure customer builds don’t land on box that isn’t road tested?
  • 13. Managing the Capacity Management ● Supply Chain/Resource Pipeline ● Impact from Product Development ● Gaps/Challenges from upstream
  • 14. Capacity Pipeline • Large Customer Requests • Triggers – % Used – # Largest Slots per flavor • IPv4 Addresses – Cells and scheduler unaware :( – Auditor + Resolver • Control Plane (runs on OpenStack too)
  • 15. Product Implications • Keep up with code deploys (hotpatches) • Adjusting provisioning playbooks to: – new flavor types – new configurations/applications (quantum- >neutron, nova-conductor) – control plane changes (10g glance) – new hardware manufacturers (OCP) • Non production environments
  • 16. Upstream Challenges • Disabled flag for cells – Blueprint: http://bit.do/CellDisableBP – Bug: http://bit.do/CellDisableBug • Build to “disabled” host – Testing after a re-provision – Testing for adding new capacity to existing cell • Scheduling based on IP capacity – New scheduler service? – Currently handled by outside service “Resolver”, similar to Entropy • General “Cells as first class citizen” effort led by alaski
  • 17. Questions? THANK YOU RACKSPACE® | 1 FANATICAL PLACE, CITY OF WINDCREST | SAN ANTONIO, TX 78218 US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM © RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM