Topic: Virtual machine placement with
optimised cost
Summer Intern Report
Submitted By:
Shantanu Bharadwaj
Dept. of Comp. Science & Engg.
IIT Guwahati
Under the guidance of:
Dr. T. Venkatesh
Dept. of Comp. Science & Engg.
IIT Guwahati
Abstract
Almost all modern online services run on geo-distributed data centers, and
fault tolerance is one of the primary requirements that decides the
revenue of the service provider. A growing number of Internet services, such as
web services, business transactions, and cloud computing services, are
being deployed over geo-distributed data centers. Geo-distribution is
important for latency, availability, and increasingly also for efficiency. Due
to rapid growth in the volume of demand served, large numbers of geo-
distributed data centers today can benefit from the same multi-megawatt
economies of scale that were initially limited to a few centralized ones. As
a result, modern cloud infrastructures are already highly geo-
distributed. Recent experiences have shown that the failure of a data
center (at a site) is inevitable. In order to mask the failure, spare compute
capacity needs to be provisioned across the distributed data center, which
leads to additional cost. While the existing literature addresses the
capacity provisioning problem only to minimize the number of servers,
this report describes that the operating cost needs to be considered as
well. Since the operating cost and client demand vary both across space
and time, we consider cost-aware capacity provisioning to account for
their impact on the operating cost of data centers. We propose an
optimization framework to minimize the Total Cost of Ownership (TCO) of
the cloud provider while designing fault-tolerant geo-distributed data
centers.
The second part of this report deals with the problem of VM placement.
When a virtual machine is deployed on a host, the process of selecting the
most suitable host for the virtual machine is known as virtual machine
placement, or simply placement. During placement, hosts are rated based
on the virtual machine’s hardware and resource requirements and the
anticipated usage of resources. The administrator selects a host for the
virtual machine based on the host ratings. The operating cost of the VM
placement has two important parameters: electricity cost and
communication cost. In a cloud environment, execution requires proper
resource management and scheduling because of the high ratio of
processes to resources. Resource scheduling is a complicated task in a
cloud computing environment because there are many alternative
computers with varying capacities. The goal of this project is to propose a
model for job-oriented resource scheduling algorithm in a cloud computing
environment. This report proposes a cost-aware heuristic approach for
optimal VM placement among a given number of physical machines in a
data center using resource scheduling techniques. The idea can be
extended to a group of data centers. The results show that the operating
cost has great potential for improvement via optimal VM placement.
Introduction
A data center is a facility that houses computer systems and their associated
components, such as telecommunications and storage systems. It is a
centralized repository, either physical or virtual, for the storage,
management, and dissemination of data and information. At a basic level,
data center components are server, network, and storage hardware; other
components include power, cooling, fire suppression, security systems, and
network connectivity.
A geo-distributed data center is a collection of small, geographically
distributed, fully automated data centers. Geo-distributed data centers
are popular for the following reasons: first, latency to clients is reduced
because their requests are served by closer data centers; second, they
are more effective at protecting data from catastrophes. In addition to
these advantages over a single data center, geo-distributed data centers
are gaining popularity because one data center alone is too small.
In a general model of a geo-distributed data center, two types of
participants interact:
Clients - who wish to execute operations or run protocols.
Servers - which help implement operations, such as storing data.
Business critical applications running in geo-distributed data centers
(henceforth simply referred to as data centers) demand high availability
because of the huge loss of revenue, cost of idle employees, and loss of
productivity associated with downtime. In addition, outages lead to
reduced customer satisfaction, damaged brand perception, and regulatory
problems. Instances of a data center failure at a site have been reported
by many cloud service providers, such as Amazon, Facebook, and Google.
Data center unavailability can arise from causes ranging from software
bugs, router misconfiguration in the Internet, and human errors due to poor
supporting documentation and training, to man-made or natural
disasters. Given this industry experience, it is evident that
failure of a data center is inevitable. Designing a fault-tolerant geo-
distributed data center usually involves spare capacity provisioning
(allocation of additional servers to mask the failure) across different data
center sites, satisfying a set of constraints based on electricity prices,
infrastructure cost, operating cost, demand at each location, and delay
faced by customers. Henceforth, in this report, failure of a single data
center is the only kind of failure we consider.
Cloud computing builds on various recent advances in virtualization,
Grid computing, Web computing, Utility computing, and
related technologies. Cloud computing provides both platforms and
applications on demand through the Internet or Intranet. Cloud
computing is a kind of Internet-based computing that provides shared
processing resources and data to computers and other devices on
demand. It is a model for enabling ubiquitous, on-demand access to a
shared pool of configurable computing resources (e.g., networks, servers,
storage, applications and services), which can be rapidly provisioned and
released with minimal management effort.
Resource scheduling plays an important role in Cloud data centers. One of
the challenging scheduling problems in Cloud data centers is the
consideration of the allocation of VMs. A data center is composed of a set
of hosts (PMs), which are responsible for managing VMs. A host is a
component that represents a physical computing node in a Cloud. It is
assigned a preconfigured processing capability (e.g., that expressed in
Million Instructions Per Second or GHz), memory, storage, and a
scheduling policy for allocating VMs. A number of hosts can also be
interconnected to form a cluster or a data center. In this report, we
introduce a framework for cost-efficient resource scheduling of real-time
VMs, considering only the computing resources.
Cost-aware Capacity Provisioning
Spare capacity provisioning across geo-distributed data center to mask
failure of a single data center, can be illustrated by a simple example.
Consider a distributed data center with 5 sites with a compute capacity of
20 units at each site. To mask the failure of any one data center at a time,
we require a spare capacity of 20/4 = 5 units at each of the remaining
data centers. Therefore, the total spare capacity required is 5 × 5 = 25
units, so the additional cost in building a fault-tolerant data center that
can mask a single failure is 25%. The naive approach uniformly distributes the spare
capacity. However, all data centers do not have the same number of
servers and different locations are characterized by variation in the
electricity cost, bandwidth cost, carbon tax and varying user demand over
time. Therefore, the main challenge in designing a fault-tolerant distributed
data center is to provision spare capacity so that the operating cost is
minimized along with the capital cost (cost of spare servers), while
satisfying the client latency requirement even during a period of failure.
The current literature proposes an optimization framework with the
objective of simply minimizing the number of servers to meet the delay
and availability constraints, but the operating cost across different
geographical locations also needs to be minimized.
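The uniform-provisioning arithmetic above can be sketched as a small helper (the function name is illustrative; it assumes n identical sites, each of which must be able to absorb an equal share of any single failed site's load):

```python
def uniform_spare_overhead(num_sites: int, capacity_per_site: float) -> float:
    """Fraction of extra capacity needed so that the surviving sites
    can absorb the full load of any single failed site."""
    # Each of the remaining (n - 1) sites takes an equal share of the
    # failed site's capacity.
    spare_per_site = capacity_per_site / (num_sites - 1)
    total_spare = num_sites * spare_per_site
    total_capacity = num_sites * capacity_per_site
    return total_spare / total_capacity

# 5 sites of 20 units each -> 20/4 = 5 spare units per site,
# 25 spare units in total, i.e. a 25% overhead
print(uniform_spare_overhead(5, 20))  # 0.25
```

As the formula 1/(n - 1) suggests, the relative overhead of uniform provisioning shrinks as the number of sites grows, which is part of why cost-aware (non-uniform) provisioning matters most when sites differ in price.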
Considering the cost of a server to be $2000 and its lifetime to be 4 years,
we calculate the energy-to-acquisition cost (EAC), defined as the ratio of
the cost of running a server for its lifetime to its acquisition cost:
Power cost = 4 years * (8760 hours/year) * (electricity cost) * (server
power) * PUE
EAC = (power cost / server cost) * 100
PUE or Power Usage Effectiveness is the ratio of total amount of energy
used by a computer data center facility to the energy delivered to
computing equipment. It is a measure of how efficiently a computer data
center uses energy; specifically, how much energy is used by the
computing equipment (in contrast to cooling and other overhead).
PUE = Total Facility Energy / IT Equipment Energy
A higher EAC indicates that power and cooling cost exceeds the server
acquisition cost. Therefore, the lower the EAC, the more feasible the system.
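The EAC and PUE definitions above translate directly into code. A minimal sketch follows; the numeric inputs in the example (electricity price, server power draw, PUE value) are illustrative assumptions, not figures from the report:

```python
def energy_to_acquisition_cost(server_cost: float,
                               lifetime_years: float,
                               electricity_price: float,  # $/kWh
                               server_power_kw: float,
                               pue: float) -> float:
    """EAC (%): lifetime energy cost relative to server acquisition cost."""
    hours = lifetime_years * 8760  # hours per year of continuous operation
    power_cost = hours * electricity_price * server_power_kw * pue
    return power_cost / server_cost * 100

# Illustrative numbers: a $2000 server over 4 years, $0.07/kWh,
# 200 W average draw, and a facility PUE of 1.5
print(energy_to_acquisition_cost(2000, 4, 0.07, 0.2, 1.5))  # ~36.8
```

With these assumed inputs the lifetime energy bill is roughly a third of the server price, which shows why electricity prices at different locations materially affect where spare servers should be placed.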
This report formulates a mixed integer linear program (MILP) framework
for cost-aware capacity provisioning in fault tolerant geo-distributed data
centers to mask single data center failures. Along with cost of additional
servers, we also consider the variation in electricity prices across space
and time in determining the optimal capacity that
minimizes the operating cost.
Optimization Model
Assumptions:
 A mechanism for failure detection and request re-routing is already
present.
 Failure of only a single data center (a site) is considered at a time.
Notations used:
Delay: Let Dmax be the maximum latency allowed for a client based on the
service level agreements with the cloud provider. Let Dsu be the
propagation delay between user location u and data center location s. The
data center must be designed such that, even after the failure of a site,
the latency continues to be lower than Dmax.
Cost: Let S and U denote the set of data centers and client locations,
respectively. The cost of a server (acquisition cost) is denoted by α. Let σs
denote the cost of access bandwidth.
Server Provisioning: Let ms denote the number of servers required in a
data center at s. We define Mmin and Mmax to be the minimum and
maximum number of servers that can be provisioned at any data center.
Power Consumption: Let Pidle be the average power drawn by a server in
idle condition and Ppeak the power consumed when a server is running at
peak utilization. With us,h denoting the average server utilization at site s
in hour h, the total power consumed at a data center location s belonging
to S, at hour h belonging to H, is

Ps,h = Es * ms * [Pidle + (Ppeak - Pidle) * us,h]

where Es is the PUE of data center s.
The TCO, which includes the server acquisition cost and the operating
cost, is minimized subject to constraints on capacity, demand, and delay.
The objective function is the sum of the total cost incurred by all the
individual data centers over a day. The goal is to minimize this objective,
that is, the total cost of ownership (TCO).
/*more stuff about code to be inserted*/
VM placement in distributed data centers:
In order to allocate computing resources efficiently, scheduling becomes a
very complicated task in a cloud computing environment, where many
alternative computers with varying capacities are available. An efficient task
scheduling mechanism can meet users’ requirements and improve the
resource utilization. Cloud service providers often receive many
computing requests with different requirements and preferences from
users simultaneously. Some tasks need to be fulfilled at a lower cost with
fewer computing resources, while other tasks require higher computing
ability and take more bandwidth and computing resources.
In this report, only computing resources are considered. As described
earlier, a data center is composed of a set of hosts (physical machines),
which are responsible for managing VMs during their life cycles, and a
number of hosts can be interconnected to form a cluster or a data center.
Data centers (possibly distributed across multiple geographical locations)
are the places that accommodate computing equipment and are
responsible for providing energy and air-conditioning maintenance for the
computing devices. A data center could be a single building or it could
be located within several buildings. Cloud computing data centers face
new challenges in dynamically distributing and managing virtual and
shared resources in this new application environment. Efficient scheduling
strategies and algorithms must be designed to adapt to different business
requirements and to satisfy different business goals.
Key technologies of resource scheduling include:
 Scheduling strategies: It is the top level of resource scheduling
management, which needs to be defined by data center owners and
managers. It mainly determines the resource scheduling goals and
ensures they are all satisfied.
 Optimization goals: The scheduling center needs to identify different
objective functions to determine the pros and cons of different types
of scheduling. Common objective functions include minimum cost,
maximum profit, and maximum resource utilization.
 Scheduling algorithms: Good scheduling algorithms need to produce
optimal results according to objective functions.
GreenCloud architecture:
Proposed GreenCloud architecture
This figure describes a layered architecture for GreenCloud. At the top
layer there is a web portal through which users select resources and send
requests: essentially, it presents a uniform view of the few types of VMs
that are preconfigured for users to choose from. Once user requests are
initiated, they
go to the next level—CloudSched—which is responsible for choosing
appropriate data centers and PMs based on user requests. This layer can
manage a large number of Cloud data centers, consisting of thousands of
PMs. At this layer, different scheduling algorithms can be applied in
different data centers based on customer characteristics. At the lowest
layer, there are Cloud resources that include PMs and VMs, both consisting
of a certain amount of CPU, memory, storage, and bandwidth. At the
Cloud resource layer, virtual management is mainly responsible for
keeping track of all VMs in the system, including their status, required
capacities, hosts, arrival times, and departure times.
This report proposes a queuing model in which a client requests virtual
machines for a predefined duration. Network resources are not
considered: jobs are assumed not to communicate with each other or to
transmit or receive data, and no preference is expressed as to where the
VMs are to be scheduled. One algorithm is proposed to distribute VMs
optimally so as to minimize the distance between user VMs in a data
center grid; the only network constraint used is the Euclidean distance
between data centers, and no specific connection requests or user
differentiation are used. Another algorithm is proposed to schedule VMs
within one data center to minimize communication cost; no network
topology is used, and only the monetary cost of transmitting data is
considered for VM requests.
Real-time VM request model:
The Cloud computing environment is a suitable solution for real-time VM
service because it leverages virtualization. When users request execution
of their real-time VMs in a Cloud data center, appropriate VMs are
allocated.
A real-time VM request can be represented as an interval vector:
VMRequestID(VM typeID, start time, finish time, requested capacity).
For example, vm1(1, 0, 6, 0.25) shows that for VM request ID vm1, the VM
requested is of Type 1 (corresponding to integer 1), with a start time of 0,
a finish time of 6, and 25% of the total capacity of a Type 1 PM.
Request formats can vary according to the definitions by data center
owners and managers.
In this report, the request format is as follows:
VMRequestID(VM typeID, start time, finish time, requested CPU capacity,
requested storage capacity).
For example, vm1(1, 0, 6, 2, 1) shows that for VM request ID vm1, the VM
requested is of Type 1 (corresponding to integer 1), with a start time of 0
and a finish time of 6, and the request needs 2 units of CPU and 1 unit of
memory.
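The request format above maps naturally onto a small record type. The following is a minimal sketch (the class and field names are illustrative, not from the report):

```python
from dataclasses import dataclass

@dataclass
class VMRequest:
    """One entry in the report's request format:
    VMRequestID(VM typeID, start time, finish time, CPU, storage)."""
    request_id: str
    vm_type: int
    start_time: int
    finish_time: int
    cpu_units: int
    storage_units: int

    @property
    def processing_time(self) -> int:
        # Processing time is the interval length, used later for sorting.
        return self.finish_time - self.start_time

# vm1(1, 0, 6, 2, 1): a Type 1 VM running from hour 0 to hour 6,
# needing 2 units of CPU and 1 unit of memory
vm1 = VMRequest("vm1", 1, 0, 6, 2, 1)
print(vm1.processing_time)  # 6
```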
Assumptions in the proposed model:
 All tasks are independent. There are no precedence constraints
other than those implied by the start and finish times.
 Each PM is always available (i.e., each machine is continuously
available in [0, ∞)).
 Each PM has an operating cost and communication cost associated
with it.
 Each VM request has an electricity cost and communication
overhead associated with it.
 Each PM is linked with every other PM in the system.
 Each communication link is unidirectional.
 The capacities of VMs and PMs are strongly divisible: if (P, V) denote
the lists of capacities of PMs and VMs respectively, every item in
list P exactly divides every item in list V. That is, the capacities
demanded by VM requests are multiples of the capacities of the PMs.
Proposed Algorithm:
The heuristic developed is based on the first-fit decreasing algorithm along
with some cost optimisation techniques. The VM requests are sorted in
decreasing order of their processing times. Each physical machine has a
different operating cost at each hour, each communication link has a
communication overhead associated with it, and each VM request has an
electricity and a communication cost. The algorithm compares the
requested capacity with the capacity assigned to physical machines, finds
the physical machine with the lowest cost, and assigns it to the VM
request. If the capacity is not met, the communication costs for sending
the request to other physical machines are considered, and the physical
machine with the minimum cost is found and assigned to the remainder of
the VM request.
The pseudo-code for the algorithm is as follows:
Input:
Number of VM requests
VM requests (indicated by their VM ID, start time, finish time, CPU
capacity, storage capacity)
Number of PMs
PM (PM ID, CPU capacity, storage capacity)
Number of hours the whole system will run
Operating cost of each PM at every hour
Communication cost for each link
Output:
PM ID of physical machine assigned to a particular VM request along with
the cost incurred.
Pseudo Code:
n <- number of VM requests
m <- number of PMs
h <- number of hours
e_ij <- electricity cost of physical machine i at hour j
d_ij <- communication overhead of the link between PM i and PM j
Re_i <- operating cost of VM request i
Rb_i <- communication cost of VM request i
e_i <- average electricity cost of PM i
b_i <- average communication cost of PM i
v_ij <- cost of allocating PM j to request i
v_min <- minimum cost
machine_i <- physical machine selected for VM request i

for i = 1 to n do
    processing time_i = finish time_i - start time_i
sort (processing times in decreasing order)

for i = 1 to m do
    e_i = find_average (e_ij over j = 1 to h)

for i = 1 to m do
    for j = 1 to m do
        if (i = j)
            d_ij = 0
    b_i = find_average (d_ij over j != i)

for i = 1 to n do
    for j = 1 to m do
        v_ij = (e_j * Re_i) / (e_j + b_j)

for i = 1 to n do
    v_min = find_minimum (v_ij over j = 1 to m)
    machine_i = the PM j achieving v_min
    if (capacity_request_i <= capacity_machine_i)
        allocate machine_i to request i
    else
        capacity_remaining = capacity_request_i - capacity_machine_i
        for j = 1 to m do
            v_ij = ((e_j * Re_i) / (e_j + b_j)) + ((b_j * Rb_i) / (e_j + b_j))
                 + ((e_j+1 * Re_i) / (e_j+1 + b_j+1))
This process is repeated until the requested capacity of the VM request is
met.
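The pseudo-code can be turned into a short runnable sketch. This is a simplified reading of the heuristic under the report's assumptions (one capacity dimension for brevity, the per-PM averages computed up front, and the spill-over communication terms folded into the simple weighting); the function and field names are illustrative:

```python
def schedule(requests, pm_capacities, elec_cost, comm_cost):
    """
    requests:       list of dicts with 'id', 'start', 'finish', 'capacity', 'Re'
    pm_capacities:  per-PM capacity (single resource dimension for brevity)
    elec_cost[i][h]: electricity cost of PM i at hour h
    comm_cost[i][j]: communication overhead of the link PM i -> PM j
    Returns {request id: [(pm index, cost), ...]}.
    """
    m = len(pm_capacities)
    # Average electricity cost e_i of each PM over all hours
    e = [sum(row) / len(row) for row in elec_cost]
    # Average communication cost b_i of each PM over its outgoing links
    b = [sum(comm_cost[i][j] for j in range(m) if j != i) / max(m - 1, 1)
         for i in range(m)]
    # First-fit decreasing: longest processing time first
    ordered = sorted(requests,
                     key=lambda r: r['finish'] - r['start'], reverse=True)
    allocation = {}
    for req in ordered:
        remaining = req['capacity']
        placed, used = [], set()
        # Keep taking the cheapest unused PM until the capacity is met
        while remaining > 0 and len(used) < m:
            costs = {j: (e[j] * req['Re']) / (e[j] + b[j])
                     for j in range(m) if j not in used}
            j = min(costs, key=costs.get)
            placed.append((j, costs[j]))
            used.add(j)
            remaining -= pm_capacities[j]
        allocation[req['id']] = placed
    return allocation

# One request of 2 CPU units; PM 2 is cheaper on average, so it is chosen.
reqs = [{'id': 'vm1', 'start': 0, 'finish': 6, 'capacity': 2, 'Re': 50}]
print(schedule(reqs, [2, 2], [[6], [5]], [[0, 1], [1, 0]]))
```

When a request exceeds a single PM's capacity, the loop naturally splits it across the next-cheapest machines, mirroring the "capacity_remaining" branch of the pseudo-code.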
Results:
A small example illustrates the working of the algorithm. In this example,
the data center has 3 PMs with given capacities, i.e., 2 units of CPU and 1
unit of storage each. 3 VM requests are considered, with varying start
times, end times, and capacities. The goal is to allocate PMs to them such
that the cost is minimised. The costs considered are the average operating
cost of the PMs, the average communication cost of the PMs, the
electricity cost of the VM requests, and the communication overhead of
the VM requests.
In this algorithm, if there are three nodes and three VMs are to be
scheduled, each node would be allocated one VM, provided all the nodes
have enough available resources to run the VMs. The main advantage of
this algorithm is that it utilizes all the resources in a balanced manner.
Comparison with Traditional method:
The simplest approach to this problem does no sorting of the VM request
IDs. It is the traditional approach, based on round-robin scheduling: the
requests are served in FCFS manner, and the PM with the lowest average
operating cost is assigned to each request. If the capacity is not met, the
PM with the next lowest average operating cost is assigned, and the
process continues. No communication overhead is introduced.
Taking the same values from the example above, this is the order and cost
of PMs being assigned to VM requests:

Request ID | Start time | End time | Capacity (CPU) | Capacity (Storage) | Re (electricity cost)
1          | 0          | 2        | 4              | 2                  | 100
2          | 0          | 6        | 2              | 1                  | 50
3          | 0          | 4        | 2              | 1                  | 50

Average operating cost: PM1 = 6, PM2 = 5, PM3 = 7.
Naturally, PM 2 will be the first choice of every request as its average
operating cost is the least.
Cost of running VM1 on PM2 is 500
Capacity not met
PM1 has the next lowest average operating cost
Cost of running VM1 on PM1 is 600
Cost of running VM2 on PM2 is 250
Cost of running VM3 on PM2 is 250
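The traditional FCFS assignment above can be reproduced with a short script. The capacities and costs are taken from the example (PMs hold 2 CPU units; average operating costs 6, 5, and 7 for PM1-PM3); the helper name is illustrative:

```python
def fcfs_assign(requests, pm_capacity, avg_costs):
    """Serve requests in arrival order; repeatedly pick the cheapest PM
    until the requested CPU capacity is met (no communication overhead)."""
    results = {}
    for rid, cpu_needed, re_cost in requests:
        # PM indices (0 -> PM1, 1 -> PM2, ...) sorted by average operating cost
        order = sorted(range(len(avg_costs)), key=lambda j: avg_costs[j])
        costs, remaining = [], cpu_needed
        for j in order:
            if remaining <= 0:
                break
            costs.append((j, re_cost * avg_costs[j]))  # cost = Re * avg cost
            remaining -= pm_capacity
        results[rid] = costs
    return results

# Requests: (id, CPU units, electricity cost Re)
out = fcfs_assign([(1, 4, 100), (2, 2, 50), (3, 2, 50)],
                  pm_capacity=2, avg_costs=[6, 5, 7])
# VM1: PM2 (500) then PM1 (600); VM2 and VM3: PM2 (250 each)
print(out)
```

The script reproduces the figures quoted in the text: request 1 costs 500 + 600 because it spills from PM2 onto PM1, while requests 2 and 3 fit entirely on PM2.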
[Figure: bar chart comparing the total cost (0 to 1200) of Requests 1-3
under our heuristic and under round robin.]
Advantages of Resource Scheduling Algorithm:
 Easy access to resources and better resource utilization.
 In this report, the implementation of the optimized algorithm is
compared with the traditional task scheduling algorithm. The main
goal of the optimized algorithm is to reduce the cost compared to
the traditional ones.
 This algorithm improves on the traditional cost-based scheduling
algorithm by making an appropriate mapping of tasks to resources.
 This algorithm computes the priority of tasks on the basis of
different attributes of the tasks and then sorts the tasks onto a
service which can complete them.
Conclusion:
Thus, this report argued the need for cost-aware capacity provisioning for
geo-distributed data centers that can tolerate the failure of a single data
center. We proposed an MILP optimization model that reduces the total
cost of ownership (TCO), which includes capital and operating cost
factors, while provisioning the servers across different locations with
varying running cost factors.
This report also showed that scheduling is one of the most important
tasks in a cloud computing environment, and that priority is an important
issue in job scheduling in cloud environments. The heuristic developed
using resource scheduling techniques is thus helpful in minimising the
cost incurred during VM placement.
Report File On Virtual Private Network(VPN)
 
Data Center Build Project
Data Center Build ProjectData Center Build Project
Data Center Build Project
 
Future of Sex Report
Future of Sex ReportFuture of Sex Report
Future of Sex Report
 
Virtualization in cloud computing
Virtualization in cloud computingVirtualization in cloud computing
Virtualization in cloud computing
 
Report on cloud computing by prashant gupta
Report on cloud computing by prashant guptaReport on cloud computing by prashant gupta
Report on cloud computing by prashant gupta
 
Cloud Computing Documentation Report
Cloud Computing Documentation ReportCloud Computing Documentation Report
Cloud Computing Documentation Report
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple ppt
 
Cloud computing project report
Cloud computing project reportCloud computing project report
Cloud computing project report
 

Semelhante a Summer Intern Report

GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
ijcsit
 
Performance Enhancement of Cloud Computing using Clustering
Performance Enhancement of Cloud Computing using ClusteringPerformance Enhancement of Cloud Computing using Clustering
Performance Enhancement of Cloud Computing using Clustering
Editor IJMTER
 
A viewof cloud computing
A viewof cloud computingA viewof cloud computing
A viewof cloud computing
purplesea
 

Semelhante a Summer Intern Report (20)

Survey: An Optimized Energy Consumption of Resources in Cloud Data Centers
Survey: An Optimized Energy Consumption of Resources in Cloud Data CentersSurvey: An Optimized Energy Consumption of Resources in Cloud Data Centers
Survey: An Optimized Energy Consumption of Resources in Cloud Data Centers
 
Cost aware cooperative resource provisioning
Cost aware cooperative resource provisioningCost aware cooperative resource provisioning
Cost aware cooperative resource provisioning
 
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
 
Performance Enhancement of Cloud Computing using Clustering
Performance Enhancement of Cloud Computing using ClusteringPerformance Enhancement of Cloud Computing using Clustering
Performance Enhancement of Cloud Computing using Clustering
 
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
 
ENERGY EFFICIENCY IN CLOUD COMPUTING
ENERGY EFFICIENCY IN CLOUD COMPUTINGENERGY EFFICIENCY IN CLOUD COMPUTING
ENERGY EFFICIENCY IN CLOUD COMPUTING
 
A hybrid algorithm to reduce energy consumption management in cloud data centers
A hybrid algorithm to reduce energy consumption management in cloud data centersA hybrid algorithm to reduce energy consumption management in cloud data centers
A hybrid algorithm to reduce energy consumption management in cloud data centers
 
E42053035
E42053035E42053035
E42053035
 
Affinity based virtual machine migration (AVM) approach for effective placeme...
Affinity based virtual machine migration (AVM) approach for effective placeme...Affinity based virtual machine migration (AVM) approach for effective placeme...
Affinity based virtual machine migration (AVM) approach for effective placeme...
 
AViewofCloudComputing.ppt
AViewofCloudComputing.pptAViewofCloudComputing.ppt
AViewofCloudComputing.ppt
 
AViewofCloudComputing.ppt
AViewofCloudComputing.pptAViewofCloudComputing.ppt
AViewofCloudComputing.ppt
 
A View of Cloud Computing.ppt
A View of Cloud Computing.pptA View of Cloud Computing.ppt
A View of Cloud Computing.ppt
 
A viewof cloud computing
A viewof cloud computingA viewof cloud computing
A viewof cloud computing
 
Optimizing the placement of cloud data center in virtualized environment
Optimizing the placement of cloud data center in virtualized  environmentOptimizing the placement of cloud data center in virtualized  environment
Optimizing the placement of cloud data center in virtualized environment
 
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUDG-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
 
Energy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computingEnergy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computing
 
A SURVEY ON DYNAMIC ENERGY MANAGEMENT AT VIRTUALIZATION LEVEL IN CLOUD DATA C...
A SURVEY ON DYNAMIC ENERGY MANAGEMENT AT VIRTUALIZATION LEVEL IN CLOUD DATA C...A SURVEY ON DYNAMIC ENERGY MANAGEMENT AT VIRTUALIZATION LEVEL IN CLOUD DATA C...
A SURVEY ON DYNAMIC ENERGY MANAGEMENT AT VIRTUALIZATION LEVEL IN CLOUD DATA C...
 
A survey on dynamic energy management at virtualization level in cloud data c...
A survey on dynamic energy management at virtualization level in cloud data c...A survey on dynamic energy management at virtualization level in cloud data c...
A survey on dynamic energy management at virtualization level in cloud data c...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Energy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud EnvironmentEnergy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud Environment
 

Summer Intern Report

  • 1. Topic: Virtual machine placement with optimised cost
Summer Intern Report
Submitted by: Shantanu Bharadwaj, Dept. of Comp. Science & Engg., IIT Guwahati
Under the guidance of: Dr. T. Venkatesh, Dept. of Comp. Science & Engg., IIT Guwahati
  • 2. Abstract
Almost all modern online services run on geo-distributed data centers, and fault tolerance is one of the primary requirements that decides the revenue of the service provider. A growing number of Internet services, such as web services, business transactions, and cloud computing services, are being deployed over geo-distributed data centers. Geo-distribution is important for latency, availability, and increasingly also for efficiency. Due to rapid growth in the volume of demand served, large numbers of geo-distributed data centers today can benefit from the same multi-megawatt economies of scale that were initially limited to a few centralized ones. As a result, modern cloud infrastructures are already highly geo-distributed. Recent experience has shown that the failure of a data center (at a site) is inevitable. In order to mask such failures, spare compute capacity needs to be provisioned across the distributed data center, which leads to additional cost. While the existing literature addresses the capacity provisioning problem only to minimize the number of servers, this report argues that the operating cost needs to be considered as well. Since the operating cost and client demand vary both across space and time, we consider cost-aware capacity provisioning to account for their impact on the operating cost of data centers. We propose an optimization framework to minimize the Total Cost of Ownership (TCO) of the cloud provider while designing fault-tolerant geo-distributed data centers.
The second part of this report deals with the problem of VM placement. When a virtual machine is deployed on a host, the process of selecting the most suitable host for the virtual machine is known as virtual machine placement, or simply placement. During placement, hosts are rated based on the virtual machine's hardware and resource requirements and the anticipated usage of resources. The administrator selects a host for the virtual machine based on the host ratings. The operating cost of VM placement has two important parameters: electricity cost and communication cost. In a cloud environment, execution requires proper resource management and scheduling because of the high ratio of processes to resources. Resource scheduling is a complicated task in a cloud computing environment because there are many alternative computers with varying capacities. The goal of this project is to propose a model for a job-oriented resource scheduling algorithm in a cloud computing environment. This report proposes a cost-aware heuristic approach for optimal VM placement among a given number of physical machines in a data center using resource scheduling techniques. The idea can be extended to a group of data centers. The results show that the operating cost has great potential for improvement via optimal VM placement.
  • 3. Introduction
A data center is a facility that houses computer systems and their associated components, such as telecommunications and storage systems. It is a centralized repository, either physical or virtual, for the storage, management, and dissemination of data and information. At a basic level, a data center's components are server, network, and storage hardware; other components include power, cooling, fire suppression, security systems, and network connectivity.
A geo-distributed data center is a collection of small, geographically distributed, fully automated data centers. Geo-distributed data centers are popular for the following reasons: first, latency to clients is reduced because their requests are served by closer data centers; second, they are more effective in protecting data from catastrophes. In addition to these advantages over a single data center, geo-distributed data centers are gaining popularity because one data center is too small.
In a general model of a geo-distributed data center, two types of processes are handled: clients, who wish to execute operations or run protocols, and servers, which help implement operations such as storing data.
Business-critical applications running in geo-distributed data centers (henceforth simply referred to as data centers) demand high availability because of the huge loss of revenue, cost of idle employees, and loss of productivity associated with downtime. In addition, outages lead to reduced customer satisfaction, damaged brand perception, and regulatory
  • 4. problems. Instances of a data center failing at a site have been reported by many cloud service providers, including Amazon, Facebook, and Google. Data center unavailability can arise from causes ranging from software bugs and router misconfiguration in the Internet to human errors due to poor supporting documentation and training, and man-made or natural disasters. Given this industry experience, it is evident that failure of a data center is inevitable. Designing a fault-tolerant geo-distributed data center usually involves spare capacity provisioning (allocation of additional servers to mask the failure) across different data center sites, satisfying a set of constraints based on electricity prices, infrastructure cost, operating cost, demand at each location, and delay faced by customers. Henceforth, in this report, failure of a single data center is the only kind of failure we consider.
Cloud computing builds on various recent advancements in virtualization, Grid computing, Web computing, Utility computing, and related technologies. It provides both platforms and applications on demand through the Internet or an intranet. Cloud computing is a kind of Internet-based computing that provides shared processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort.
Resource scheduling plays an important role in cloud data centers. One of the challenging scheduling problems in cloud data centers is the allocation of VMs. A data center is composed of a set of hosts (PMs), which are responsible for managing VMs. A host is a component that represents a physical computing node in a cloud. It is assigned a preconfigured processing capability (e.g., expressed in Million Instructions Per Second or GHz), memory, storage, and a scheduling policy for allocating VMs. A number of hosts can also be interconnected to form a cluster or a data center. In this chapter, we introduce a framework for cost-efficient resource scheduling of real-time VMs, considering only the computing resources.
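The host description above maps naturally onto a small data structure. The following sketch is illustrative only: the class and field names (`Host`, `mips`, `memory_gb`, `storage_gb`) are our own, not taken from the report, and the allocation check simply compares requested against remaining capacity.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """A physical machine as described above: preconfigured capacities
    plus a list of the VMs currently allocated to it."""
    host_id: int
    mips: int          # processing capability, e.g. Million Instructions Per Second
    memory_gb: int
    storage_gb: int
    vms: list = field(default_factory=list)  # (mips, memory_gb, storage_gb) tuples

    def can_host(self, mips, memory_gb, storage_gb):
        # Sum what the already-placed VMs use in each dimension.
        used = [sum(v[i] for v in self.vms) for i in range(3)]
        return (used[0] + mips <= self.mips and
                used[1] + memory_gb <= self.memory_gb and
                used[2] + storage_gb <= self.storage_gb)

    def allocate(self, mips, memory_gb, storage_gb):
        if self.can_host(mips, memory_gb, storage_gb):
            self.vms.append((mips, memory_gb, storage_gb))
            return True
        return False

h = Host(1, mips=2000, memory_gb=8, storage_gb=100)
ok = h.allocate(1000, 4, 50)   # fits within all three capacities
bad = h.allocate(1500, 2, 10)  # rejected: would exceed the MIPS capacity
```

A scheduling policy, as mentioned in the text, would then decide which `Host` each request is offered to.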
  • 5. Cost-aware Capacity Provisioning
Spare capacity provisioning across a geo-distributed data center to mask the failure of a single data center can be illustrated by a simple example. Consider a distributed data center with 5 sites and a compute capacity of 20 units at each site. To mask the failure of any one data center at a time, we require a spare capacity of 20/4 = 5 units at each of the remaining data centers. Therefore, the total spare capacity required is 5 * 5 = 25 units, so the additional cost of building a fault-tolerant data center that can mask a single failure is 25%.
This naive approach distributes the spare capacity uniformly. However, not all data centers have the same number of servers, and different locations are characterized by variation in electricity cost, bandwidth cost, carbon tax, and user demand over time. Therefore, the main challenge in designing a fault-tolerant distributed data center is to provision spare capacity so that, along with capital cost (the cost of spare servers), the operating cost is minimized while client latency requirements are satisfied even during a period of failure. The current literature proposes optimization frameworks with the objective of simply minimizing the number of servers to meet the delay and availability constraints, but the operating cost across different geographical locations also needs to be minimised.
Considering the cost of a server to be $2000 and its lifetime to be 4 years, we calculate the energy-to-acquisition cost (EAC), defined as the ratio of the cost of running a server over its lifetime to its acquisition cost:
Power cost = 4 years * (8760 hours/year) * (electricity cost) * server power * PUE
EAC = (power cost / server cost) * 100
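The EAC formula above can be computed directly. The electricity price, server power draw, and PUE values below are illustrative assumptions, not figures from the report; only the $2000 server cost and 4-year lifetime come from the text.

```python
def energy_to_acquisition_cost(price_per_kwh, server_power_kw, pue,
                               server_cost=2000.0, lifetime_years=4):
    """EAC: lifetime power-and-cooling cost as a percentage of the
    server's acquisition cost, per the formula above."""
    lifetime_hours = lifetime_years * 8760
    power_cost = lifetime_hours * price_per_kwh * server_power_kw * pue
    return (power_cost / server_cost) * 100.0

# Illustrative (assumed) values: $0.07/kWh, a 0.3 kW server, PUE of 1.5
eac = energy_to_acquisition_cost(0.07, 0.3, 1.5)  # ~55.2%
```

At these assumed rates a server's lifetime energy bill is about 55% of its purchase price, which is why, as the text notes, a lower EAC makes the system more feasible.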
  • 6. PUE, or Power Usage Effectiveness, is the ratio of the total amount of energy used by a data center facility to the energy delivered to the computing equipment. It measures how efficiently a data center uses energy; specifically, how much energy is used by the computing equipment in contrast to cooling and other overhead.
PUE = Total Facility Energy / IT Equipment Energy
A higher EAC indicates that power and cooling cost more than server acquisition; therefore, the lower the EAC, the more feasible the system.
This report formulates a mixed integer linear program (MILP) for cost-aware capacity provisioning in fault-tolerant geo-distributed data centers to mask single data center failures. Along with the cost of additional servers, we also consider the variation in electricity prices across space and time in determining the optimal capacity that minimizes the operating cost.
Optimization Model
Assumptions:
• A mechanism for failure detection and request re-routing is already present.
• Failure of only a single data center (a site) is considered at a time.
Notations used:
  • 7. Delay: Let Dmax be the maximum latency allowed for a client based on the service level agreements with the cloud provider, and let Dsu be the propagation delay between user location u and data center location s. The data center must be designed such that, even after the failure of a site, the latency remains lower than Dmax.
Cost: Let S and U denote the sets of data center and client locations, respectively. The cost of a server (acquisition cost) is denoted by α, and σs denotes the cost of access bandwidth at site s.
Server Provisioning: Let ms denote the number of servers required in the data center at s, and let Mmin and Mmax be the minimum and maximum number of servers that can be provisioned at any data center.
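To make the structure of the provisioning problem concrete before the formal model, here is a toy brute-force sketch using the notation above (a server count m_s per site, acquisition cost α) together with a linear server power model. The report's actual formulation is an MILP; brute-force enumeration stands in for a solver here, and every numeric site parameter (PUE, electricity price, power draws, utilization) is invented for illustration.

```python
from itertools import product

def site_energy_cost(m, pue, price, p_idle=150.0, p_peak=250.0, util=0.6):
    """Hourly energy cost of m servers: each draws p_idle plus a
    utilization-scaled share of (p_peak - p_idle) watts, inflated by the
    site's PUE and priced at the local electricity rate (per kWh)."""
    watts = m * pue * (p_idle + (p_peak - p_idle) * util)
    return (watts / 1000.0) * price

def min_tco(demand, sites, alpha=2000.0, hours=24 * 365 * 4, max_servers=30):
    """Toy version of cost-aware provisioning: choose m_s per site so that
    after ANY single site fails the remaining capacity (one demand unit per
    server) still covers total demand; minimize acquisition + energy cost.
    Each site is a (pue, electricity price) pair."""
    best = None
    for m in product(range(max_servers + 1), repeat=len(sites)):
        # Fault-tolerance constraint: survive any one site's failure.
        if any(sum(m) - m[f] < demand for f in range(len(sites))):
            continue
        cost = alpha * sum(m) + hours * sum(
            site_energy_cost(mi, pue, price) for mi, (pue, price) in zip(m, sites))
        if best is None or cost < best[0]:
            best = (cost, m)
    return best

# Three hypothetical sites with differing PUE and electricity prices,
# and a total demand of 20 server-units of capacity.
cost, placement = min_tco(20, [(1.3, 0.05), (1.5, 0.07), (1.7, 0.09)])
```

With these numbers the fault-tolerance constraint forces 10 servers at every site (mirroring the uniform 25% spare-capacity example from slide 5); with asymmetric capacities or demands, the cheaper sites would absorb more of the spare capacity, which is exactly the effect the MILP captures.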
  • 8. Power Consumption: Let Pidle be the average power drawn by a server when idle and Ppeak the power consumed when a server runs at peak utilization. Under the standard linear server power model, the total power consumed at data center location s ∈ S at hour h ∈ H is then Ps,h = ms * Es * (Pidle + (Ppeak - Pidle) * us,h), where Es is the PUE of data center s and us,h is the average server utilization at s during hour h.
The TCO, which includes server acquisition cost and operating cost, is defined as the acquisition cost of all provisioned servers plus the cost of the energy they consume at each location and hour, subject to constraints on delay, per-site server counts, and fault tolerance. The objective function is the sum of the total cost incurred by all the individual data centers over a day; the goal is to minimize this objective, that is, the total cost of ownership (TCO). /*more stuff about code to be inserted*/
VM placement in distributed data centers:
In order to allocate computing resources efficiently, scheduling becomes a very complicated task in a cloud computing environment where many alternative computers with varying capacities are available. An efficient task scheduling mechanism can meet users' requirements and improve the
  • 9. resource utilization. Cloud service providers often receive many computing requests simultaneously, with different requirements and preferences from users. Some tasks need to be fulfilled at a lower cost with fewer computing resources, while other tasks require higher computing ability and take more bandwidth and computing resources. In this report, only computing resources are considered.
A data center is composed of a set of hosts (physical machines), which are responsible for managing VMs during their life cycles. A host is a component that represents a physical computing node in a cloud. It is assigned a preconfigured processing capability (e.g., expressed in Million Instructions Per Second or GHz), memory, storage, and a scheduling policy for allocating VMs. A number of hosts can also be interconnected to form a cluster or a data center. Data centers (possibly distributed across multiple geographical locations) are the places that accommodate computing equipment and are responsible for providing energy and air-conditioning maintenance for the computing devices. A data center could be a single construction, or it could be located within several buildings.
Dynamically managing virtual and shared resources in this new application environment confronts cloud computing data centers with new challenges: efficient scheduling strategies and algorithms must be designed to adapt to different business requirements and to satisfy different business goals. Key technologies of resource scheduling include:
• Scheduling strategies: the top level of resource scheduling management, defined by data center owners and managers. It mainly determines the resource scheduling goals and ensures they are all satisfied.
• Optimization goals: the scheduling center needs to identify different objective functions to determine the pros and cons of different types of scheduling. Typical objective functions include minimum cost, maximum profit, and maximum resource utilization.
• Scheduling algorithms: good scheduling algorithms need to produce optimal results according to the objective functions.
GreenCloud architecture:
  • 10. Proposed GreenCloud architecture
This figure describes a layered architecture for GreenCloud. At the top layer there is a web portal for the user to select resources and send requests: essentially, a uniform view of the few types of VMs that are preconfigured for users to choose from. Once user requests are initiated, they go to the next level, CloudSched, which is responsible for choosing appropriate data centers and PMs based on user requests. This layer can manage a large number of cloud data centers, consisting of thousands of PMs, and different scheduling algorithms can be applied in different data centers based on customer characteristics. At the lowest layer are the cloud resources, PMs and VMs, each consisting of a certain amount of CPU, memory, storage, and bandwidth. At the cloud resource layer, virtual management is mainly responsible for keeping track of all VMs in the system, including their status, required capacities, hosts, arrival times, and departure times.
This report proposes a queuing model in which a client requests virtual machines for a predefined duration. Network resources are not considered at all: jobs are assumed not to communicate with each other or to transmit or receive data, and no preference is expressed as to where the VMs are to be scheduled. An algorithm is proposed to optimally distribute VMs in order to minimize the distance between user VMs in a data center grid; the only network constraint used is the Euclidean distance between data centers, and no specific connection requests or user differentiation are used. An algorithm is
  • 11. proposed to schedule VMs within one data center to minimize communication cost. No network topology is used; rather, only the monetary cost of transmitting data is considered for VM requests.
Real-time VM request model:
The cloud computing environment is a suitable solution for real-time VM service because it leverages virtualization. When users request execution of their real-time VMs in a cloud data center, appropriate VMs are allocated. A real-time VM request can be represented as an interval vector: VMRequestID(VM typeID, start time, finish time, requested capacity). For example, vm1(1, 0, 6, 0.25) shows that VM request vm1 asks for a VM of Type 1 (corresponding to integer 1) with a start time of 0, a finish time of 6, and 25% of the total capacity of a Type 1 PM. Request formats can vary according to the definitions set by data center owners and managers. In this report, the request format is: VMRequestID(VM typeID, start time, finish time, requested CPU capacity, requested storage capacity). For example, vm1(1, 0, 6, 2, 1) shows that VM request vm1 asks for a VM of Type 1 with a start time of 0 and a finish time of 6, and that the request needs 2 units of CPU and 1 unit of memory.
Assumptions in the proposed model:
• All tasks are independent. There are no precedence constraints other than those implied by the start and finish times.
• Each PM is always available (i.e., each machine is continuously available in [0, ∞)).
• Each PM has an operating cost and a communication cost associated with it.
• Each VM request has an electricity cost and a communication overhead associated with it.
• Each PM is linked with every other PM in the system.
• Each communication link is unidirectional.
  • 12.
• The capacities of VMs and PMs are strongly divisible: if P and V denote the lists of capacities of PMs and VMs respectively, they are strongly divisible if every item in list P exactly divides every item in list V. That is, the capacities demanded by VM requests are multiples of the capacities of the PMs.
Proposed Algorithm:
The heuristic developed is based on the first-fit decreasing algorithm along with some cost optimisation techniques. The VM requests are sorted in decreasing order of their processing times. Each physical machine has a different operating cost at different hours, each communication link has a communication overhead associated with it, and each VM request has an electricity and a communication cost. The algorithm compares the requested capacity with the capacity of the physical machines, finds the physical machine with the lowest cost, and assigns it to the VM request. If the capacity requirement is not met, the communication costs for sending the request to other physical machines are considered, and the physical machine with the minimum cost is found and assigned to the VM request as well. The pseudo-code for the algorithm is as follows:
Input:
Number of VM requests
VM requests (indicated by their VM ID, start time, finish time, CPU capacity, storage capacity)
Number of PMs
PMs (PM ID, CPU capacity, storage capacity)
Number of hours the whole system will run
Operating cost of each PM at every hour
Communication cost for each link
Output:
  • 13. The PM ID of the physical machine assigned to each VM request, along with the cost incurred.
Pseudo Code:
n <- number of VM requests
m <- number of PMs
h <- number of hours
eij <- electricity cost of physical machine i at hour j
dij <- communication overhead of the link between PM i and PM j
Rei <- electricity cost of VM request i
Rbi <- communication cost of VM request i
ei <- average electricity cost of PM i
bi <- average communication cost of PM i
vij <- cost of allocating PM j to request i
v_min <- minimum cost
machinei <- physical machine selected for VM request i

for i = 1 to n do
    processing timei = finish timei - start timei
sort (processing time)
for i = 1 to m do
    for j = 1 to h do
        ei = find_average (eij)
for i = 1 to m do
    for j = 1 to m do
        if (i = j)
            dij = 0
        else
            bi = find_average (dij)
for i = 1 to n do
    for j = 1 to m do
        vij = (ej * Rei) / (ej + bj)
for i = 1 to n do
    for j = 1 to m do
        v_min = find_minimum (vij)
        machinei = j
        if (capacity_request <= capacity_machine)
  • 14. jth machine is allocated to request i
        else
            capacity_remaining = capacity_request - capacity_machine
            for i = 1 to n do
                for j = 1 to m do
                    vij = ((ej * Rei) / (ej + bj)) + ((bj * Rbi) / (ej + bj)) + ((ej+1 * Rei) / (ej+1 + bj+1))
This process is repeated until the requested capacity of the VM request is met.
Results:
A small example illustrates the working of the algorithm. In this example, the data center has 3 PMs with the given capacities, i.e., 2 units of CPU and 1 unit of storage each. 3 VM requests are considered, with varying start times, end times, and capacities. The goal is to allocate PMs to them such that the cost is minimised. The costs considered are the average operating cost of the PMs, the average communication cost of the PMs, the electricity cost of the VM requests, and the communication overhead of the VM requests. The output of the implemented algorithm is as follows:
  • 15. (Output screenshots appeared here in the original slides, listing the PM assigned to each VM request and the cost incurred.)
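The pseudo-code above can be rendered as a runnable Python sketch. This is a simplification under stated assumptions: requests and PMs are plain dicts of our own design, there is no hour-by-hour capacity tracking, and a request that exceeds one PM's capacity simply spills to the next-cheapest PM, paying the communication term as in the pseudo-code. Variable names (e, b, Re, Rb, v_ij) mirror the pseudo-code.

```python
def place_requests(requests, pms, elec_cost, comm_cost):
    """requests: list of dicts {id, start, finish, cpu, Re, Rb}
    pms: list of dicts {id, cpu}
    elec_cost[i][h]: operating cost of PM i at hour h
    comm_cost[i][j]: overhead of the link between PM i and PM j
    Returns: request id -> list of (pm id, cost) allocations."""
    m = len(pms)
    e = [sum(row) / len(row) for row in elec_cost]            # avg electricity cost per PM
    b = [sum(comm_cost[i][j] for j in range(m) if j != i) / max(m - 1, 1)
         for i in range(m)]                                    # avg link overhead per PM
    # First-fit decreasing: longest processing time first.
    order = sorted(requests, key=lambda r: r["finish"] - r["start"], reverse=True)
    alloc = {}
    for r in order:
        remaining = r["cpu"]
        placed = []
        # Rank PMs by the base cost v_ij = e_j * Re_i / (e_j + b_j).
        ranked = sorted(range(m), key=lambda j: e[j] * r["Re"] / (e[j] + b[j]))
        for k, j in enumerate(ranked):
            if remaining <= 0:
                break
            cost = e[j] * r["Re"] / (e[j] + b[j])
            if k > 0:  # spilled to another PM: add the communication term
                cost += b[j] * r["Rb"] / (e[j] + b[j])
            placed.append((pms[j]["id"], round(cost, 2)))
            remaining -= pms[j]["cpu"]
        alloc[r["id"]] = placed
    return alloc

# A tiny made-up instance: one request needing 4 CPU units, two 2-unit PMs.
pms = [{"id": 1, "cpu": 2}, {"id": 2, "cpu": 2}]
elec = [[6, 6], [4, 4]]        # hourly operating cost of each PM
comm = [[0, 2], [2, 0]]        # link overhead between PMs
reqs = [{"id": "vm1", "start": 0, "finish": 6, "cpu": 4, "Re": 100, "Rb": 10}]
print(place_requests(reqs, pms, elec, comm))
# {'vm1': [(2, 66.67), (1, 77.5)]}
```

The cheaper PM 2 is chosen first; since its 2 CPU units cannot satisfy the 4-unit request, the remainder spills to PM 1 with the communication overhead added, matching the spill step of the pseudo-code.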
  • 16. In this algorithm, if there are three nodes and three VMs to be scheduled, each node is allocated one VM, provided all the nodes have enough available resources to run the VMs. The main advantage of this algorithm is that it utilizes all the resources in a balanced order.
Comparison with the traditional method:
The simplest approach to this problem does no sorting of the VM request IDs. It is the traditional approach, based on round-robin scheduling: the requests are served in FCFS order, and the PM with the lowest average operating cost is assigned to each request. If the capacity is not met, the PM with the next lowest average operating cost is assigned, and the process goes on. No communication overhead is introduced. Using the same values as in the example above, the order and cost of PMs assigned to the VM requests are:

Request ID | Start time | End time | Capacity (CPU) | Capacity (Storage) | Re (electricity cost)
1          | 0          | 2        | 4              | 2                  | 100
2          | 0          | 6        | 2              | 1                  | 50
3          | 0          | 4        | 2              | 1                  | 50

Average operating costs: PM1 = 6, PM2 = 5, PM3 = 7.

Naturally, PM2 is the first choice of every request, as its average operating cost is the lowest.
Cost of running VM1 on PM2 is 500; capacity not met.
  • 17. PM1 has the next lowest average operating cost:
Cost of running VM1 on PM1 is 600.
Cost of running VM2 on PM2 is 250.
Cost of running VM3 on PM2 is 250.

(A bar chart here compared the cost of Requests 1-3, on a scale of 0 to 1200, under our heuristic and under round robin.)

Advantages of the resource scheduling algorithm:
• Easy access to resources and better resource utilization.
• In this report, the implementation of the optimized algorithm is compared with a traditional task scheduling algorithm. The main goal of the optimized algorithm is to reduce cost compared to the traditional ones.
• The algorithm improves on the traditional cost-based scheduling algorithm by making an appropriate mapping of tasks to resources.
• The algorithm computes the priority of tasks on the basis of different task attributes and then sorts the tasks onto a service that can complete them.
Conclusion:
Thus, this report argued the need for cost-aware capacity provisioning for geo-distributed data centers that can tolerate the failure of a single data
  • 18. center. We proposed an MILP optimization model that reduces the total cost of ownership (TCO), which includes capital and operating cost factors, while provisioning servers across different locations with varying running cost factors. This report also noted that scheduling is one of the most important tasks in a cloud computing environment and that priority is an important issue for job scheduling in cloud environments. The heuristic developed using resource scheduling techniques is thus helpful in minimising the cost incurred during VM placement.