Intelligent Datacenter placement
1. EEDC
34330
Execution Environments for Distributed Computing
Intelligent placement of datacenters for Internet Services
Master in Computer Architecture, Networks and Systems - CANS
Homework number: 6
Group number: EEDC-32
Francesc Lordan francesc.lordan@gmail.com
2. Introduction
Popular Internet companies offer services to millions of users every day.
These services are hosted in geographically distributed datacenters.
There is no public information about how they select the locations.
9. Introduction
St. Louis
PUE: 1.32
Land: 0.264 $/SF
Energy: 0.047 $/kWh
Water: 0.21 cents/gal
CO2: 806 g/kWh
10. Framework for placement - Parameters
Cost
Capital Expenses (CAPEX): investments made upfront and
depreciated over the lifetime of the datacenter
– CAP_ind: independent of the number of servers.
• Bringing the electricity and external networking.
– CAP_max: maximum number of servers that can be hosted
• Land acquisition
• Datacenter construction
• Purchasing and installing power delivery infrastructure
• Cooling infrastructure
• Backup infrastructure
– CAP_act: purchasing the servers and internal networking gear
11. Framework for placement - Parameters
Cost
Operational Expenses (OPEX): costs incurred during the operation of
the datacenters
– OP_act: maintenance and administration of the equipment and external
networking bandwidth.
• Dominated by staff compensation.
– OP_utl: electricity and water costs involved in running the servers
Lower taxes and incentives
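The CAPEX and OPEX terms above can be combined into a per-location cost. The following is a minimal sketch, assuming that every term is linear in the number of servers and already amortized to dollars per month; the class name, field names and the linear form are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass


@dataclass
class LocationCosts:
    """Hypothetical per-location cost parameters, all in $ per month."""
    cap_ind: float  # CAPEX independent of the number of servers
    cap_max: float  # CAPEX per server of capacity (land, building, power, cooling)
    cap_act: float  # CAPEX per purchased server (servers, internal networking)
    op_act: float   # OPEX per active server (staff, maintenance, bandwidth)
    op_utl: float   # OPEX per active server (electricity and water)


def monthly_cost(costs: LocationCosts, max_servers: int, active_servers: int) -> float:
    """Monthly cost of one datacenter, assuming all terms are linear (assumption)."""
    if active_servers > max_servers:
        raise ValueError("active servers cannot exceed the datacenter capacity")
    if max_servers == 0:
        return 0.0  # no datacenter is built at this location
    capex = costs.cap_ind + costs.cap_max * max_servers + costs.cap_act * active_servers
    opex = (costs.op_act + costs.op_utl) * active_servers
    return capex + opex
```

For example, `monthly_cost(LocationCosts(1000.0, 10.0, 50.0, 20.0, 5.0), 100, 80)` adds the fixed term, the capacity term for 100 server slots, and the per-server terms for the 80 servers actually installed.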
12. Framework for placement - Parameters
Response Time: Latency between a population center and a location.
– Latency(c, d): latency between a location d and a center c.
– P_cd: number of servers at a location d that serve requests from c.
– Servers(c): number of servers required by the center c.
Consistency Delay: time required for state changes to reach all mirrors
– Latency(d1, d2): one-way latency between the locations d1 and d2.
Availability: depends on the network availability of all the datacenters
CO2 emissions: determined by the type of electricity consumed
– Emissions(d): carbon emissions (g/kWh) at location d.
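The parameters above can be turned into simple checks on a candidate placement. This is a rough sketch, assuming that latencies are stored in plain dictionaries, that one-way latency between two locations is symmetric, and that the worst case over all pairs is the quantity to constrain; all function names and data layouts here are hypothetical.

```python
from itertools import combinations


def max_response_latency(latency, assignment):
    """Worst latency over the (center, location) pairs that actually carry load.

    latency[c][d]    -- Latency(c, d) in ms
    assignment[c][d] -- P_cd, servers at location d serving center c
    """
    return max(
        latency[c][d]
        for c, per_location in assignment.items()
        for d, servers in per_location.items()
        if servers > 0
    )


def consistency_delay(dc_latency, locations):
    """Largest one-way latency between any two chosen locations (0 for one)."""
    return max(
        (dc_latency[frozenset((d1, d2))] for d1, d2 in combinations(locations, 2)),
        default=0.0,
    )


def total_emissions(emissions, energy_kwh):
    """Grams of CO2: Emissions(d) in g/kWh times the energy consumed at d."""
    return sum(emissions[d] * kwh for d, kwh in energy_kwh.items())
```

Keying `dc_latency` by `frozenset` stores each unordered pair of locations once, which matches the symmetry assumption.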
13. Framework for placement – Formulation
Inputs:
– Maximum number of servers
– Expected average utilization for the servers
– Number of users that each server can accommodate
– Amount of redundancy
– Latencies and availability constraints
– CAPEX and OPEX for each location
– Latencies between any population center and each location
– Latencies between any two locations
14. Framework for placement – Formulation
Outputs:
– Optimal cost
– Maximum number of servers at each location
– Number of servers that service a population center at a location
15. Framework for placement – Solutions
Simple linear programming (LP0)
– Simplifies the formulation to decide whether a datacenter must be
placed at a location and which population centers it serves. It assigns
the maximum number of servers proportionally and computes the
network costs with the original formulation.
Pre-set linear programming (LP1)
– Presets whether each location contains a datacenter and its size,
which removes the variable that records which centers it serves.
Brute force (Brute)
– Generates all the possible placements and evaluates each of them
with the LP1 approach.
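The brute-force idea can be sketched as an enumeration of every subset of candidate locations. The `evaluate` function below is a toy stand-in for LP1 (an assumption, not the real linear program): it charges a fixed cost per location and rejects a placement when some population center has no datacenter within the latency bound. All names and the cost model are hypothetical.

```python
from itertools import combinations
import math


def evaluate(placement, fixed_cost, latency, max_latency):
    """Toy stand-in for LP1: cost of a fixed placement, or inf if infeasible."""
    for per_location in latency.values():
        if min(per_location[d] for d in placement) > max_latency:
            return math.inf  # some center cannot be served in time
    return sum(fixed_cost[d] for d in placement)


def brute_force(fixed_cost, latency, max_latency):
    """Try every non-empty subset of locations and keep the cheapest feasible one."""
    locations = sorted(fixed_cost)
    best, best_cost = None, math.inf
    for size in range(1, len(locations) + 1):
        for placement in combinations(locations, size):
            cost = evaluate(placement, fixed_cost, latency, max_latency)
            if cost < best_cost:
                best, best_cost = placement, cost
    return best, best_cost
```

With L candidate locations this evaluates 2^L - 1 placements, so exhaustive search is only practical for a small number of candidate locations.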
16. Framework for placement – Solutions
Heuristic based on LP (Heuristics)
– Generates 10 possible datacenter networks for each number of
datacenters using LP0, applies the LP1 algorithm to them, sorts the
results in increasing order of cost, and finally runs the brute-force
method on a small set of solutions to obtain the most efficient one.
Simulated Annealing plus LP1 (SA+LP1)
– SA starts with a configuration that fulfills the constraints and
evaluates the neighbors obtained using LP1. The solution is selected
when there is no cost improvement within an iteration interval.
Optimized SA+LP1 (OSA+LP1)
– Adjusts the results of LP1: when no servers are assigned to a
datacenter, that datacenter is removed.
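The simulated annealing loop described above might look roughly like the following. This is a sketch under stated assumptions: `evaluate` is a toy replacement for LP1, a neighbor is obtained by adding or removing one location, and the `patience`, `temperature` and `cooling` parameters are hypothetical tuning knobs rather than values from the slides.

```python
import math
import random


def evaluate(placement, fixed_cost, latency, max_latency):
    """Toy stand-in for LP1: cost of a placement, or inf if it is infeasible."""
    if not placement:
        return math.inf
    for per_location in latency.values():
        if min(per_location[d] for d in placement) > max_latency:
            return math.inf
    return sum(fixed_cost[d] for d in placement)


def simulated_annealing(fixed_cost, latency, max_latency, seed=0,
                        patience=200, temperature=100.0, cooling=0.98):
    rng = random.Random(seed)
    locations = sorted(fixed_cost)
    current = frozenset(locations)  # start from "all locations" (assumed feasible)
    current_cost = evaluate(current, fixed_cost, latency, max_latency)
    if math.isinf(current_cost):
        raise ValueError("no feasible starting configuration")
    best, best_cost = current, current_cost
    stale = 0
    while stale < patience:  # stop after `patience` iterations without improvement
        # Neighbor: flip the membership of one randomly chosen location.
        neighbor = current ^ {rng.choice(locations)}
        cost = evaluate(neighbor, fixed_cost, latency, max_latency)
        delta = cost - current_cost
        # Always accept improvements; accept worse feasible moves with a
        # probability that shrinks as the temperature cools. Infeasible
        # neighbors (delta == inf) are never accepted because exp(-inf) == 0.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = neighbor, cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
            stale = 0
        else:
            stale += 1
        temperature = max(temperature * cooling, 1e-9)
    return best, best_cost
```

Unlike the brute-force method, this search can stop at a local minimum, so the returned placement is feasible but not guaranteed to be optimal.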
17. Placement tool
User only specifies:
– Area of interest
– Granularity of the potential datacenter locations
– Location of existing datacenters
– Max number of servers
– Ratio of users per server
– Max latency between
– Max delay
– Min availability
The toolkit obtains the missing data to compute the best
datacenter network in order to fulfill the user constraints.
20. Exploring datacenter placement tradeoffs
Latency
– Latencies > 70 ms have the same cost
– Latency = 50 ms is the best tradeoff between latency and cost
– Latencies < 35 ms double the cost relative to 50 ms
Availability
– Lower-tier datacenters require more datacenters
– It's cheaper to achieve a given availability level with more low-tier
datacenters than with fewer high-tier datacenters.
– Tier II datacenters are the best option
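The relation between per-datacenter availability and the number of datacenters can be illustrated with a short calculation. This sketch assumes that datacenters fail independently and that the service is available while at least one datacenter is up; the function names are hypothetical, and the cost side of the tradeoff is not modelled here.

```python
def network_availability(availabilities):
    """Probability that at least one datacenter is up, assuming independent failures."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def datacenters_needed(single_availability, target):
    """Smallest number of identical datacenters whose combined availability meets target."""
    if not 0.0 < single_availability < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("availabilities must lie strictly between 0 and 1")
    count = 1
    while network_availability([single_availability] * count) < target:
        count += 1
    return count
```

Under these assumptions, a less reliable datacenter type needs more replicas to reach the same target, which is the effect the first bullet describes.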
21. Exploring datacenter placement tradeoffs
Consistency delay
– Low consistency delays and low latency are conflicting goals
– Low consistency delays imply fewer datacenters and lower costs
Green Datacenters
– When latencies can be relatively high, a green datacenter is less
expensive than $100K a month.
Chiller-less datacenters
– Water chillers increase energy consumption by 20% and
building costs by 30%. They are necessary for locations with an
average temperature over 20 °C.
– Avoiding chillers is feasible when latencies are over 70 ms. It
reduces costs by 8%.
The area of interest is fitted into an n x n grid (n depends on the granularity). The tiles that lie inside the area of interest and where a datacenter can be built form the possible locations. The tool also takes the main population centers within this area.
Using geolocation services, we can instantiate parameters such as the distance between two datacenters or the population of a center. The set of users is assumed to be a fraction of these populations.
Other location-dependent data that can be obtained from Internet services:
– The topology of the ISP backbones, to obtain the latencies between two points or the closest network for a datacenter.
– The power plants, transmission lines and CO2 emissions, to obtain the closestPower and the emissions of a datacenter.
– Electricity, land, water and temperature data.
If some data is missing, the tool takes it from the neighboring locations.
Taking all this information into account, the toolkit can compute or assume the rest of the parameters it requires: the PUE of a datacenter in that location; the cost to connect to the power supply or to an ISP backbone; the building, land and water costs; server and internal networking purchasing and operational expenses; and staff compensation.
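The grid construction and the neighbor fallback described in these notes can be sketched as follows. This is an illustration under assumptions: the area of interest is approximated by a latitude/longitude bounding box, distance is the great-circle distance on a sphere of radius 6371 km, and a missing value is filled with the mean of the known adjacent tiles. The notes do not say how neighboring data is combined, so the averaging and all function names are hypothetical.

```python
import math


def grid_centers(lat_min, lat_max, lon_min, lon_max, n):
    """Centers of the n x n tiles covering a latitude/longitude bounding box."""
    lat_step = (lat_max - lat_min) / n
    lon_step = (lon_max - lon_min) / n
    return [
        (lat_min + (i + 0.5) * lat_step, lon_min + (j + 0.5) * lon_step)
        for i in range(n)
        for j in range(n)
    ]


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def fill_from_neighbors(values, n):
    """Replace missing (None) tile values with the mean of the known 4-neighbors."""
    filled = list(values)
    for idx, value in enumerate(values):
        if value is None:
            i, j = divmod(idx, n)
            near = [values[r * n + c]
                    for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= r < n and 0 <= c < n and values[r * n + c] is not None]
            filled[idx] = sum(near) / len(near) if near else None
    return filled
```

Tiles are stored in row-major order, so tile (i, j) sits at index `i * n + j` in both the list of centers and the list of per-tile values such as land or electricity prices.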