Gridstore's Software-Defined-Storage Architecture
WHITE PAPER
The End of Big Storage
The explosion of virtualization gives rise to a new
software-defined storage architecture
INTRODUCTION

Server virtualization has forever changed the way we think about compute resources. The drawback, though, is that it breaks the traditional storage architecture, which was designed almost two decades before server virtualization changed the infrastructure architecture. The architectural mismatch between the old and the new leads to poorly performing storage, high storage costs and complex storage management. Enter Software-Defined Storage (SDS). SDS realigns storage with virtualization, offering high-performance storage, simple management and lower costs.
The success of server virtualization lies in the fact that nothing else in the infrastructure, whether applications, OSs, servers, networks or storage, knows that anything has changed. By inserting the hypervisor into the infrastructure stack, however, the relationship between virtual machines and the underlying storage changes radically, and traditional storage was never designed for this new architecture. Rather than being limited by physical constraints, CPU and memory can now be flexibly allocated to virtual machines (VMs) in the exact increments required. Resource allocations can also be changed very easily to accommodate workload evolution, drawing resources from, or returning them to, a pool shared across all VMs on a given host. This enables significantly more efficient use of existing compute resources, providing much greater operational flexibility while driving cost savings and the ease of centralized management.
Over half of existing x86 workloads have been virtualized, and most new applications coming
online today run on virtual infrastructures. What makes server virtualization so successful
is its transparent insertion into the infrastructure stack. Resources are virtualized in a way
that does not require any changes in operating systems, applications or physical hardware.
It effectively creates a software container — called a virtual machine (VM)—that enables
complete freedom of movement for resources in that container, from one physical host to
another as an atomic entity.
VMs are basically a collection of compute, storage and network resources. Server virtualization introduces a new, many-to-one architecture that governs how compute resources are allocated. It does nothing, however, to change how physical storage resources are allocated. Unfortunately, storage resources continue to use a traditional one-to-one architectural design that results in significant inefficiencies and challenges in virtual computing environments. To achieve the true promise of the software-defined data center, storage resources need to undergo the same architectural transformation that compute resources have undergone with server virtualization. This white paper discusses the why and how of the architectural transformation storage must undergo in the context of the software-defined data center.
Virtualization Breaks Traditional Storage
The 1980s heralded the rise of a new client/server computing architecture, significantly different from the legacy mainframe architectures that dominated the 1960s and 1970s. Taking advantage of advances in processor and memory technology, client/server computing architectures cost-effectively paired single applications with individual servers, each with its own dedicated CPU, memory, storage and network resources. Storage architectures for these environments used a one-to-one design that assumed storage would be owned and used only by a single server; this was called direct attached storage. Storage objects, known as logical unit numbers (LUNs)1, were presented to that single server. Storage management operations, like snapshots, cloning and replication, also known as "data services", all operated at the LUN level. Generally, storage controllers internal to the servers dealt with only a single application and could be tuned to perform optimally for that workload. Data-services policies could likewise be tailored to the requirements of that single workload. As data growth rates began to increase, external storage was introduced to provide the flexibility necessary to grow storage capacity at rates different from compute resources. External storage was packaged as a separate storage array, complete with its own controllers and data services. These arrays were connected to the server over a storage network, usually Fibre Channel in this time frame. Storage LUNs in these arrays appeared to the server as local disk, and an array's controllers could still be tuned for the application of a single server.
Virtualization Changed Nothing (But Changed Everything)
The success of server virtualization lies in the fact that nothing else in the infrastructure knows that anything has changed. Infrastructure is inherently hard to change, and change is disruptive, for two reasons:
■■ Defined protocols govern how each layer of the infrastructure stack interacts with the layers above and below it
■■ Well-established business ecosystems exist throughout the infrastructure industry, with hundreds of thousands of applications running perfectly well in data centers today
The brilliance of server virtualization was its transparent insertion into the infrastructure stack. No changes to applications, OSs, servers, networks or storage are required. If any of these elements had to change, server virtualization would not have been the success it is today. Even with this transparent insertion, it has taken virtualization a decade to get to where it is today. While the hypervisor did not force any changes, particularly to the application/server workload operating in a VM, the relationship between the infrastructure components has changed. The virtual machine perceives no change, but in reality everything around it has changed.
1 A LUN is a logical device addressed by the SCSI protocol. LUNs are logical storage objects that are “carved” (created)
from the actual physical storage, and are what is actually presented as storage to the server.
A New Container—The Virtual Machine
Virtualization does two specific things. First, it virtualizes or abstracts the hardware resources
through the hypervisor. Second, it creates a new software-defined container, a VM, which
allocates and isolates resources to this container. Virtualization effectively places all physical
resources on a hardware platform into a pool. This enables them to be easily selected
and allocated to one or more VMs. Tens or hundreds of VMs can potentially be created
from a single physical hardware platform, each of which can be running its own individual
application. This transforms the one-to-one architecture from the client/server era to a new
many-to-one architecture, making it very easy to create, move and delete VMs.
Combined with external storage that is shared across two or more physical hardware
platforms (hosts), virtualization also enables administrators to quickly and easily move VMs
across physical boundaries from one host to another. Each container representing a VM is
just a file that resides on the external storage. The host on which it is running accesses the
file and executes the compute for it. To move the VM to another host, simply shut it down on
one host, access the file from another host and execute its compute there.
Storage arrays typically support a limited number of LUNs. Because of this,
it is inevitable that more than one VM will reside on any given LUN. In fact,
virtualization moves administrators from an environment in which a single
server owns one or more LUNs to an environment where tens or hundreds
of VMs may reside on a single LUN. This creates an architectural mismatch
between the many-to-one architecture that virtualization offers for compute
resources and the one-to-one architecture that traditional storage based on
LUN management offers. This architectural mismatch has significant negative
implications, all of which result in higher costs to make virtualization work
effectively.
IMPLICATION 1: POOR PERFORMANCE—THE I/O BLENDER EFFECT
Most server operating systems were written with the assumption that they
would have direct access to dedicated storage devices. Operating systems
have evolved over the past 20 years to optimize the I/O before writing it to
storage, in order to increase performance. This is a valid assumption with the traditional one-
to-one server-to-storage architecture, and a server’s dedicated storage controllers could be
tuned to optimize performance for the specific I/O pattern that application was generating.
In general, this meant re-ordering the I/O from an individual application to enable it to be
written sequentially to spinning disk as much as possible, providing a RAID configuration that
delivered the desired reliability at minimum cost.
The many-to-one orientation of virtual computing throws a huge wrench into this approach.
The operating systems on each of the VMs no longer interact directly with the storage.
Instead, they interact with the hypervisor, which, in turn, interacts with the storage. As the
largely sequential I/O streams from each VM flow through it, the hypervisor multiplexes them,
producing an extremely random I/O pattern, which it then writes to disk. Spinning disks
handle random I/O up to 10 times slower than sequential I/O.
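The effect can be seen in a toy simulation. In the sketch below (illustrative block layout and numbers, not a model of any particular hypervisor), each VM writes sequentially to its own region of a shared disk, and round-robin multiplexing of the streams multiplies the implied disk-head movement:

```python
# Toy simulation of the "I/O blender": four VMs each issue sequential
# writes to their own region of a shared disk; the hypervisor interleaves them.

def interleave(streams):
    """Round-robin multiplex per-VM I/O streams, as a hypervisor does
    when many VMs share one LUN."""
    streams = [list(s) for s in streams]
    merged = []
    while any(streams):
        for s in streams:
            if s:
                merged.append(s.pop(0))
    return merged

def seek_distance(trace):
    """Total head movement implied by a trace of block addresses."""
    return sum(abs(b - a) for a, b in zip(trace, trace[1:]))

# Four VMs, 100 sequential blocks each, in widely separated disk regions.
vms = [list(range(base, base + 100)) for base in (0, 10_000, 20_000, 30_000)]

one_at_a_time = [b for vm in vms for b in vm]   # dedicated-storage ordering
blended = interleave(vms)                        # hypervisor-multiplexed

print(seek_distance(one_at_a_time))  # almost entirely sequential
print(seek_distance(blended))        # roughly 200x more head movement
```

Running the sketch shows roughly 200 times more head movement for the blended trace, which is why spinning disks that performed well under dedicated workloads degrade so sharply once their I/O is multiplexed.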
Figure 1. In virtual environments, administrators inevitably end up with tens or hundreds of VMs per LUN
On top of that, I/O patterns in virtual computing environments tend to be much more write-intensive. With the traditional client/server model, an application workload that required 20% writes (and 80% reads) was considered "write-intensive." In virtual environments, most virtual server workloads generate at least 40% to 50% writes, and virtual-desktop environments can generate as much as 90% writes. Enterprise-class spinning disks handle writes roughly 35% to 50% slower than reads.

This extremely random, extremely write-intensive I/O pattern is referred to as the "I/O blender effect." It causes traditional storage architectures built around the one-to-one orientation to perform between 40% and 60% slower than the same storage performed in client/server environments. As administrators build storage configurations back up to meet performance requirements, storage costs increase and, in turn, drive up the cost per VM. The result is unexpectedly high storage costs with storage configurations that are significantly over-provisioned in terms of capacity.
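A back-of-envelope calculation shows how this performance loss turns into over-provisioning. All numbers below are assumptions chosen for illustration, not benchmarks:

```python
import math

# If the I/O blender halves the effective IOPS of each spinning disk,
# how many more disks are needed to deliver the same total IOPS?

required_iops = 20_000
iops_per_disk = 180        # assumed: one tuned disk under 1:1, largely sequential I/O
blender_penalty = 0.5      # assumed: midpoint of the 40%-60% IOPS loss

disks_dedicated = math.ceil(required_iops / iops_per_disk)
disks_virtualized = math.ceil(required_iops / (iops_per_disk * (1 - blender_penalty)))

print(disks_dedicated, disks_virtualized)  # roughly double the spindle count
```

Under these assumptions the same workload needs about twice the spindles, and all of that extra capacity is bought for performance, not because the data grew.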
IMPLICATION 2: MANAGEMENT COMPLEXITY
As mentioned earlier, traditional storage architectures manage all storage operations at the LUN level. When migrating a virtual server (i.e., a VM) from one host to another, exclusive ownership of the LUN on which it resides is transferred from the "source" host to the "target" host. The same is true of data services like snapshots, clones and replication managed at the array level: they can be performed on LUNs, but not on individual VMs.
This works fine when there is only one server per LUN. With virtual computing, however, it is rare to have just a single VM per LUN. Any storage operation an administrator wants to perform on a single VM must also be performed on every other VM residing on the same LUN. Administrators no longer have the management granularity to perform storage operations on individual VMs the way they did for individual servers in the traditional client/server model.
This leads to potentially significant management inefficiencies. For example, to snapshot VM1 on LUN1, all other VMs on LUN1 must also be snapshotted. Snapshots take up storage capacity, and larger snapshots take up more of it. Administrators end up storing snapshots of VMs they don't care about, just to get a snapshot of the one they do. If they want to replicate that snapshot to a remote location for disaster-recovery purposes, they must replicate all the VMs on the LUN, consuming not only additional storage capacity at the remote location but also additional bandwidth to replicate VMs they don't even care about. Clearly, this wastes precious resources.
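The waste is easy to quantify with a hypothetical sizing sketch; the VM sizes and change rate below are assumptions chosen only for illustration:

```python
# Hypothetical sizing sketch (all figures assumed): space consumed by a
# LUN-level snapshot versus a per-VM snapshot when only one VM matters.

vm_sizes_gb = {"VM.1": 80, "VM.2": 200, "VM.3": 500, "VM.4": 120}
target = "VM.1"
change_rate = 0.05                      # assume 5% of data changes per snapshot

per_vm_gb = vm_sizes_gb[target] * change_rate
per_lun_gb = sum(vm_sizes_gb.values()) * change_rate
wasted_gb = per_lun_gb - per_vm_gb      # capacity (and, if replicated, bandwidth)
                                        # spent on VMs nobody asked to protect

print(per_vm_gb, per_lun_gb, wasted_gb)
```

In this example more than 90% of the snapshot's footprint belongs to VMs the administrator never intended to protect, and the same ratio applies to the replication bandwidth.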
Figure 2. Even within a single host, an extremely random I/O pattern predominates, and gets much worse as more hosts share a storage array
Figure 3. With multiple VMs per LUN, LUN-level management does not provide the desired management granularity
It also forces administrators to adopt a fragmented management model that is harder and more costly to administer. VMs are managed from a centralized management platform, like Microsoft System Center, that allows management operations to be performed on individually selected VMs. But the storage for these VMs is managed from the management GUI associated with the storage array, which imposes a LUN-level management paradigm. What virtual administrators want is the same per-server control over all resources, including storage, that they had in the traditional client/server model, applied now to the virtual infrastructure. Traditional one-to-one storage architectures simply can't provide this.
IMPLICATION 3: UNPREDICTABLE SCALABILITY
In the last decade, data growth rates have skyrocketed for most commercial enterprises. More and more data is being generated and retained for longer periods to meet compliance requirements, and on top of that, social media is generating huge amounts of data. Enterprises plan to add an average of 43TB of physical storage capacity this year.2 The increasing importance of virtualization has also contributed to this growth in storage capacity requirements, attributable to the I/O blender effect as well as the ease of creating new servers in the virtual world. As a result, storage platforms targeted for use in virtual environments have higher scalability standards to meet, and traditional storage architectures have had difficulty meeting them cost-effectively. In general, storage-processing power is not increased as capacity is added to traditional storage, resulting in diminishing performance as capacity is scaled. To add processing power, you are forced into disruptive and expensive forklift upgrades. As more and more VMs use the shared storage capacity in these environments, performance can become unpredictable as the array struggles to handle a high number of disparate workloads across multiple hosts. Because of the inefficiency with which traditional storage architectures handle the I/O blender effect, storage requirements grow faster in virtual environments, making scalability an even greater concern.
IMPLICATION 4:
INCREASED STORAGE COSTS THAT REDUCE SAVINGS FROM VIRTUALIZATION
The I/O blender effect, bloated data services and unpredictable scalability all drive the need for significantly more storage. The added storage costs often consume more than the cost savings driven by server consolidation, making virtual computing much less economically compelling than it first appeared to be. The reasons may not be clear to the casual observer. The net result of the I/O blender effect is that traditional storage used in virtual infrastructures produces 40% to 60% fewer IOPS than it does in physical infrastructures. This poor performance drives higher costs in several ways. If not addressed from the storage side, each host can effectively support far fewer VMs, increasing the cost per server and reducing the server-consolidation savings. If additional spinning disks are added to increase the IOPS the storage can handle, costs increase accordingly. If SSD arrays are used, costs also increase, due to the expense of solid-state storage, and this high-cost storage is applied to all workloads, many of which may not be able to justify the expense. The LUN-based management imposed by traditional storage results in "bloated" data services that unnecessarily consume storage resources through an inability to operate at the desired level of granularity: the individual VM. In addition, the resulting fragmented
2 Storage Magazine, May 2013 Reader Survey, p. 25.
management model drives additional training requirements and is harder to administer. Both issues drive up cost. Virtual computing requires much more rapid storage expansion than physical computing: the I/O blender effect and LUN-based management both drive relatively higher storage growth rates, and the relative ease of creating new servers, each of which requires its own storage, adds to the burden. Traditional storage architectures don't support incremental, linear scalability. Instead, they require storage to be purchased in larger "chunks," resulting in higher up-front costs and over-buying. Scalability limits drive the need for forklift upgrades, which are not only expensive but also disruptive. Clustered storage designs can address the scalability problem, but are generally designed for traditional file-serving workloads, not the low-latency, high-IOPS workloads demanded by virtualization.
Software-Defined Storage Provides the Needed
Architectural Transformation
Software-defined storage (SDS) aligns the storage architecture and its management
model to match the architecture and management model of virtualization. The goal of this
alignment is to deliver three key benefits:
1. Higher, more reliable performance for applications running in virtual environments
2. Simplified management of storage, including storage provisioning, data services and scaling
3. Lower overall cost per virtual server driven primarily by storage savings
To achieve this, SDS works on the exact same principles that made server virtualization so
successful:
1. Virtualizing storage resources into a shared resource pool
2. Creating a new container for storage that matches the VM architecture
3. Inserting transparently into the infrastructure stack
SDS Virtualizes Storage Resources into Pools
In the same way that the hypervisor virtualizes compute resources, SDS virtualizes storage resources (capacity, bandwidth, memory, cache, processing) from many physical storage appliances (storage nodes) into a scalable, fault-tolerant storage pool. Simply adding more storage nodes to the pool eliminates the scaling limitations of traditional storage: each node instantly and seamlessly adds capacity, bandwidth and processing, making these resources easily available to the VMs. Different classes of storage resources can be added into pools, which can then be presented to VMs as different classes of disk. Representative classes might include:
■■ Standard performance - hybrid storage combining flash performance with backing SATA capacity that is cost-effective for general-purpose virtualized workloads
■■ Extreme performance - all-flash storage that consistently delivers the highest performance for every I/O for select workloads
■■ Capacity - cost-effective storage for workloads that require large amounts of capacity for working sets
■■ Archive - low-cost storage for data that must be retained and kept immutable for long periods
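One way to picture class-based pools is a simple model in which each node contributes its resources to the pool it joins, so scaling out is just adding nodes. The names and figures below are illustrative assumptions, not any vendor's API:

```python
from dataclasses import dataclass, field

# Illustrative model: nodes contribute resources to class-based pools,
# and adding a node linearly grows the pool it joins.

@dataclass
class StorageNode:
    capacity_tb: float
    iops: int
    bandwidth_gbps: float

@dataclass
class StoragePool:
    storage_class: str            # e.g. "standard", "extreme", "capacity", "archive"
    nodes: list = field(default_factory=list)

    def add_node(self, node: StorageNode) -> None:
        # Scaling out: each node adds capacity, IOPS and bandwidth to the pool.
        self.nodes.append(node)

    @property
    def capacity_tb(self) -> float:
        return sum(n.capacity_tb for n in self.nodes)

    @property
    def iops(self) -> int:
        return sum(n.iops for n in self.nodes)

# Three identical all-flash nodes form an "extreme performance" pool.
extreme = StoragePool("extreme")
for _ in range(3):
    extreme.add_node(StorageNode(capacity_tb=10, iops=200_000, bandwidth_gbps=10))

print(extreme.capacity_tb, extreme.iops)
```

Contrast this with a monolithic array: here every increment of capacity arrives with its own processing and bandwidth, which is the property that avoids the diminishing-performance curve described earlier.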
SDS Creates a New Storage
Container Aligned to the VM
Just as server virtualization created a new container, the VM, allowing much more efficient allocation and management of compute resources, another new container needs to be defined: one that resolves the current mismatch between compute and storage architectures in virtual environments. This new storage container is a virtual controller. Software-defined compute defined the VM; software-defined storage will define the virtual controller. Together, they restore the one-to-one relationship between server and storage that virtualization broke. The virtual controller draws storage resources from the virtualized pools and presents a subset of these resources to each VM. It also optimizes how these resources are utilized and presents a unified management structure for provisioning and data services on a per-VM basis.
Isolation, Optimization and Prioritization of I/O on a Per-VM Basis
The virtual controller operates at the hypervisor layer, isolating the I/O for each VM to eliminate the I/O blender effect. This isolation effectively creates the container that connects the VM with storage. By isolating the I/O from each VM, the virtual controller creates a virtual storage stack for each VM. Once the isolated I/O is in the stack, it is dynamically optimized to provide optimal storage performance for the VM's application workload, regardless of what other VM workloads are accessing the shared storage resource. Every application has an optimal I/O pattern and configuration; the dynamic optimization detects this pattern and optimizes for it. With these virtual storage stacks in place, the I/O stream from each individual VM can be optimized as it passes from the hypervisor through the network and down into the storage pools. Where there is resource contention, all resources within a virtual storage stack can be dynamically allocated and prioritized by policy. The virtual controller has intelligence operating at both ends of the stack to enforce the policies for each VM. This end-to-end architecture eliminates the problems of traditional storage architectures, which cannot start optimizing I/O streams until they are inside the storage array. Attempting to manage quality of service (QoS) this late in the game results in congestion, along with I/O queuing and blocking: large writes to the array from a low-priority workload can completely block latency-sensitive, small-block random I/O from a high-priority workload. When I/O is isolated and channeled from the VM through the network into the shared storage, each VM's I/O operates within a sealed container, and other VMs cannot interfere.
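A minimal sketch of per-VM policy enforcement at the hypervisor layer might look like the following; the priority scheme, VM names and sizes are assumptions for illustration, not a description of any specific product:

```python
import heapq

# Minimal policy sketch: I/O is tagged with a per-VM priority before it
# leaves the hypervisor, so a low-priority bulk writer cannot block a
# latency-sensitive workload at the array.

PRIORITY = {"sql-vm": 0, "backup-vm": 9}   # assumed policy: lower = higher priority

def schedule(requests):
    """Order I/O requests by per-VM policy priority, FIFO within a class."""
    heap = [(PRIORITY[vm], seq, vm, size_kb)
            for seq, (vm, size_kb) in enumerate(requests)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Two large backup writes arrive just ahead of a small database request...
arrivals = [("backup-vm", 1024), ("backup-vm", 1024), ("sql-vm", 4)]
print(schedule(arrivals))  # ['sql-vm', 'backup-vm', 'backup-vm']
```

The point of the sketch is where the decision is made: because the tagging happens before the I/O enters the shared network and array, the high-priority request never queues behind the bulk writes in the first place.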
Figure 4. Each VM gets its own virtual controller, allowing its storage to be optimized for its particular needs
Managing Data Services on a Per-VM Basis
Software-defined storage and its new container consolidate the management model back into a single, VM-centric model. This allows storage operations to be performed at the desired level of granularity from the same management platform used to manage the VMs themselves. With this centralized model in place, administrators will no longer be forced to license and pay for storage functionality as part of a hardware purchase, and their storage-management capabilities will no longer be defined by their storage hardware. The storage resources required to meet application requirements can now be defined as easily and flexibly as compute resources in virtualized environments.
This virtual-storage container extends the VM management model into the virtualized storage pool, re-instituting the one-to-one relationship between servers and storage in the new context of virtualized environments. Storage managed by each virtual controller can be defined to meet specific performance, reliability and functional requirements:
■■ Sequencing I/O streams before they are written to storage, removing the inefficiencies of the I/O blender effect
■■ Defining RAID levels to meet reliability requirements
■■ Defining data-services policies for snapshots, clones, failover, replication and other functions as needed, all on a VM-by-VM basis
Transparent Deployment
Software-defined compute was transparently inserted into the infrastructure stack without requiring any changes to operating systems, application software or underlying hardware. Software-defined storage must follow a similar path. From each VM's point of view, the virtual controller presents to the hypervisor what appears to be a standard SCSI device: a local disk. The hypervisor can mount it and lay out its clustered file systems across this device without any changes. This new storage stack is usable by the VMs without requiring any changes to operating systems, application software or the hypervisor. For software-defined storage to be successful, this transformative change to the storage architecture must be transparent.