The document discusses emerging trends in storage architectures and technologies. It predicts that by 2016, server-based storage solutions will lower hardware costs by 50% due to consolidation, and that three of the top seven disk-array vendors will exit the hardware business by 2018. New storage architectures are designed for web-scale, multi-tenancy, high-access, and resilience needs. Software-defined storage solutions such as Nutanix and the open-source Gluster address these needs through distributed, scalable designs. Emerging workload-based architectures require assessing specific requirements to determine the optimal solution.
7. Type 1: Clustered Architecture
A ‘federated model’ layered atop a ‘scale-up’ architecture makes it behave more like a ‘scale-out’ design from a management standpoint.
This type tends to ‘bounce the IO’ until it reaches the brain (HA header) that owns the data, as sketched below. ‘Federated’ models use a data-mobility approach to rebalance between brains and persistence pools, keeping write latency low.
[Slide diagram: brains (HA header) layered over a persistent pool]
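Below is a minimal sketch, in Python, of the ‘bounce the IO’ behavior described above. All class, method, and volume names are illustrative assumptions, not any vendor's implementation: a request lands on an arbitrary brain and is forwarded until it reaches the brain that owns the volume.

```python
# Toy model of a federated/clustered array 'bouncing' an IO between
# brains until it reaches the one that owns the data. Illustrative only.

class Brain:
    def __init__(self, name, owned_volumes):
        self.name = name
        self.owned = set(owned_volumes)   # volumes whose persistence pool this brain fronts
        self.peers = []                   # other HA brains in the federation

    def handle_io(self, volume, hops=0):
        # If this brain owns the volume, service the IO locally.
        if volume in self.owned:
            return f"{self.name} serviced {volume} after {hops} hop(s)"
        # Otherwise forward ('bounce') the IO to the owning peer.
        for peer in self.peers:
            if volume in peer.owned:
                return peer.handle_io(volume, hops + 1)
        raise LookupError(f"no brain owns {volume}")

# Two HA brains, each fronting its own persistence pool.
a, b = Brain("brain-A", ["vol1"]), Brain("brain-B", ["vol2"])
a.peers, b.peers = [b], [a]
print(a.handle_io("vol2"))   # bounced once before brain-B services it
```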
8. Type 2: Tightly Coupled, Scale-Out
Uses shared memory (cache and metadata) between nodes, and the data itself is distributed across some number of nodes. This architecture deals with a large amount of inter-node communication.
Shared memory is the defining element of these designs. It enables ‘symmetric’ IO paths through all brains, so that in failure modes (planned or unplanned), IO operations remain relatively balanced; see the sketch below.
[Slide diagram: brains (HA header) connected by a shared-memory IO path, over a persistent pool]
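A minimal sketch of the symmetric IO path that shared memory enables, assuming a toy shared metadata table (all names are illustrative): any surviving brain can service any block, so a brain failure removes a path without orphaning data.

```python
# Toy model of symmetric IO paths: every brain sees the same shared
# cache/metadata, so any brain can service any IO. Illustrative only.

shared_metadata = {"blk-1": "pool-slot-17", "blk-2": "pool-slot-42"}  # visible to all brains

class SymmetricBrain:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def read(self, block):
        # Any brain resolves the block through the shared metadata.
        return f"{self.name} read {block} from {shared_metadata[block]}"

brains = [SymmetricBrain("A"), SymmetricBrain("B"), SymmetricBrain("C")]

def service(block):
    # IO stays balanced: pick any live brain; all paths are equivalent.
    live = [b for b in brains if b.alive]
    return live[hash(block) % len(live)].read(block)

brains[0].alive = False          # planned or unplanned failure of brain A
print(service("blk-1"))          # still serviced, via a surviving brain
```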
9. Type 3: Loosely Coupled, Scale-Out
This model does not use shared memory between nodes, but the data itself is distributed across multiple nodes. It deals with a larger amount of inter-node communication on writes (IO-intensive), as data is distributed. Because writes are transactional, they are distributed and always coherent (see the sketch below).
The design –
• Simple in operations and scaling.
• Very good distributed reads, as data is serviced by multiple nodes.
• Not ‘HA’. The resilience comes from data copies & distribution.
[Slide diagram: multi-node brains over a distributed pool]
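A minimal sketch of a transactional, always-coherent distributed write (all names are illustrative assumptions): the client's write is acknowledged only after every replica node commits, which is why this type is IO-intensive on writes but reads very well.

```python
# Toy model of a coherent distributed write: ack the client only after
# every replica has committed. Illustrative, not a specific product.

class Node:
    def __init__(self, name):
        self.name, self.store = name, {}

    def commit(self, key, value):
        self.store[key] = value
        return True                      # ack back to the coordinator

def coherent_write(nodes, key, value):
    # Distribute the write; IO-intensive because every replica is
    # contacted synchronously before the client sees success.
    acks = [n.commit(key, value) for n in nodes]
    if not all(acks):
        raise IOError("write not coherent; abort/rollback would go here")
    return "ack"

replicas = [Node("n1"), Node("n2"), Node("n3")]
print(coherent_write(replicas, "obj-9", b"data"))
# Reads scale well: any replica can now serve 'obj-9'.
print(replicas[1].store["obj-9"])
```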
10. Type 4: Distributed, Share Nothing
The Design –
• No shared memory
• Non-transactional, ‘lazy’ data (eventually consistent)
• Distributed reads can be achieved
• No ‘HA’. The resilience of specified data can come from distribution.
The Architecture –
• The ‘most scalable’ architecture
• Super-simple implementation
• Highly reliable on COTS hardware, low cost
• Mostly a ‘software-only’ design
• Object & non-POSIX support on the base filesystem
A placement sketch follows.
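The slides don't name a placement mechanism, so the sketch below assumes simple hash-based placement, a common approach in share-nothing designs: with no shared memory and no central metadata, every client can compute an object's home nodes independently, and resilience comes purely from the extra copies.

```python
# Toy hash-based placement for a share-nothing design (an assumed
# technique, not named by the slides). All node names are illustrative.

import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]

def placement(object_name, copies=2):
    # Deterministic hash -> starting node; additional copies go to the
    # next nodes in the ring. Resilience comes from copies, not HA pairs.
    h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
    start = h % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(copies)]

# Deterministic: every client computes the same home nodes on its own.
print(placement("bucket/photo.jpg"))
# Writes can be acknowledged before every copy lands ('lazy', eventually
# consistent), which is what makes this the most scalable of the four types.
```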
14. Feature Glance…
Single global namespace
Aggregates disk and memory resources into a single trusted storage pool.
Security
Supports SELinux enforcing mode with SSL-based in-flight encryption.
Object access to file storage
The filestore can be accessed using an object API.
Erasure coding
Enhances data protection by using information stored in the system to reconstruct lost or corrupted data.
Bit-rot detection
Helps preserve the integrity of data assets by detecting silent corruption.
Tiering
Automatically moves data between fast (SSD-based) and slow (HDD) tiers based on access frequency.
Replication
Supports synchronous replication within a data center and asynchronous replication for disaster recovery.
Snapshots
Assures data protection through cluster-wide filesystem snapshots, user-accessible for easy recovery of files.
Elastic hashing algorithm
No metadata server layer, eliminating performance bottlenecks and single points of failure; see the sketch after this list.
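As a rough illustration of the elastic-hashing idea above (GlusterFS's real distributed hash translator assigns hash ranges per directory and rebalances on expansion; this toy version only shows why no metadata server is needed):

```python
# Toy elastic hashing: the file path itself is hashed to pick a brick,
# so no metadata server has to be consulted, and none can become a
# bottleneck or single point of failure. Brick names are illustrative.

import hashlib

BRICKS = ["server1:/brick1", "server2:/brick1", "server3:/brick1"]

def brick_for(path):
    # Hash space is divided evenly among bricks in this toy version.
    h = int(hashlib.sha1(path.encode()).hexdigest(), 16)
    return BRICKS[h % len(BRICKS)]

print(brick_for("/exports/reports/q3.pdf"))  # any client computes the same answer
```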
15. Feature Glance…
Industry Standard Client Support
• NFS and SMB protocols for file-based access
• NFSv4 multi-headed support for enhanced security & resilience
• OpenStack Swift support for object access
• GlusterFS native client for highly parallelized access
Deep Hadoop Integration
• HDFS-compatible filesystem
• No single point of failure
• NFS- and FUSE-based data ingestion
Integration with RHEV
• Centralized visibility and unified management of storage and virtual infrastructures through the RHEV Manager console
• Live migration of virtual machines
Easy online management
• Web-based management console
• Powerful and intuitive CLI for Linux admins
• Monitoring (Nagios-based)
• Expand/shrink storage capacity without downtime
16. Scale-out Write…
• The client initiates an IO and transmits it to the node it's communicating with. For all-in-one style architectures, this is a VM node that's co-located with the client on the same hardware.
• The receiving node commits the write locally (typically to RAM) and replicates it to one or more other nodes across the inter-node link.
• Once the node receives the write acknowledgement from the other node(s), it responds back to the client, acknowledging the write.
• Depending on the array platform, other things can be done with the write, like inline deduplication, compression, etc.
• Some arrays that implement flash-based write caching can stage the writes to flash to free RAM for more incoming writes.
• The write is eventually flushed to disk (SSD or magnetic) on each node that received the write (sketched below).
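A minimal sketch tying these write steps together (all class and method names are assumptions for illustration): the write lands in RAM, is replicated to a peer, is acknowledged only after the peer acks, and is destaged to disk later.

```python
# Toy scale-out write path. Dedupe/compression and flash staging are
# omitted; flush() stands in for the eventual destage to disk.

class StorageNode:
    def __init__(self, name, peers=()):
        self.name, self.peers = name, list(peers)
        self.ram_cache, self.disk = {}, {}

    def client_write(self, key, data):
        self.ram_cache[key] = data                    # land the write in RAM
        acks = [p.replica_write(key, data) for p in self.peers]
        if all(acks):                                 # all peers acked
            return "ack-to-client"                    # now ack the client
        raise IOError("replication failed")

    def replica_write(self, key, data):
        self.ram_cache[key] = data
        return True

    def flush(self):
        # Eventually destage RAM (or a flash staging area) to disk.
        self.disk.update(self.ram_cache)
        self.ram_cache.clear()

peer = StorageNode("node-2")
node = StorageNode("node-1", peers=[peer])
print(node.client_write("blk-7", b"payload"))  # ack only after the peer acks
node.flush(); peer.flush()
```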
17. Scale-out Read…
• The client initiates an IO request and transmits it to the node it's communicating with. For all-in-one style architectures, this is a VM node that's co-located with the client on the same hardware.
• The node receives that IO, checks its read cache in RAM for the data, and then (depending on the array) checks the SSD cache for the data.
• If the data isn't in either location, the node checks its metadata table to locate the data on disk (local or on another node or nodes). Data is read directly from the underlying disks if local, or is requested from the containing node across the inter-node link.
• The node places a copy of the read in cache and responds to the client with the requested data (sketched below).
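And a matching sketch of the read path (again, illustrative names only): RAM cache first, then SSD cache, then the metadata table to find the data locally or on a peer, with the result cached before replying to the client.

```python
# Toy scale-out read path: RAM cache -> SSD cache -> metadata lookup ->
# local disk or inter-node fetch, then populate the cache and reply.

class ReadNode:
    def __init__(self, name):
        self.name = name
        self.ram_cache, self.ssd_cache, self.local_disk = {}, {}, {}
        self.metadata = {}        # key -> owning node name
        self.peers = {}           # node name -> ReadNode

    def client_read(self, key):
        if key in self.ram_cache:                  # 1. RAM cache hit
            return self.ram_cache[key]
        if key in self.ssd_cache:                  # 2. SSD cache hit
            data = self.ssd_cache[key]
        else:                                      # 3. locate via metadata
            owner = self.metadata[key]
            data = (self.local_disk[key] if owner == self.name
                    else self.peers[owner].local_disk[key])  # inter-node fetch
        self.ram_cache[key] = data                 # 4. cache, then reply
        return data

a, b = ReadNode("a"), ReadNode("b")
a.peers["b"] = b
b.local_disk["blk-3"] = b"remote-data"
a.metadata["blk-3"] = "b"
print(a.client_read("blk-3"))   # fetched from the peer, now cached on 'a'
```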