VMworld 2015: The Future of Software-Defined Storage – What Does It Look Like in 3 Years Time?
1. The Future of Software-Defined Storage –
What Does It Look Like in 3 Years Time?
Richard McDougall, VMware, Inc
CTO6453
#CTO6453
2. • This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these
features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or
sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not
been determined.
Disclaimer
3. This Talk: The Future of SDS
Hardware
Futures & Trends
New Workloads
Emerging Cloud Native Platforms
4. Storage Workload Map
[Chart: storage workload map plotting IOPS (one thousand to 1 million) against capacity (10's of terabytes to 10's of petabytes), contrasting the focus of most SDS today with emerging new workloads.]
5. What Are Cloud-Native Applications?
Developer access via APIs
Microservices, not
monolithic stacks
Application
Continuous integration
and deployment
Availability architected in the
app, not hypervisor
Built for scale
Decoupled from
infrastructure
6. What do Linux Containers Need from Storage?
• Ability to copy & clone root images
• Isolated namespaces between containers
• QoS Controls between containers
Parent Linux
Container 1
/
/etc /containers
/containers/cont1/rootfs
etc var
/containers/cont2/rootfs
etc var
Container 2
Option A: copy whole root tree
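A minimal sketch of Option A, assuming a pre-built base image under a hypothetical /srv/base-rootfs directory: every container gets a full, independent copy of the root tree.
# mkdir -p /containers/cont1/rootfs /containers/cont2/rootfs
# cp -a /srv/base-rootfs/. /containers/cont1/rootfs/
# cp -a /srv/base-rootfs/. /containers/cont2/rootfs/
Each rootfs is fully writable and isolated, but every copy costs the full space and copy time, which is what the clone-based approaches on the following slides avoid.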
7. Containers and Fast Clone Using Shared Read-Only Images
• Actions
– Quick update of child to use same boot image
• Downsides:
– Container root fs not writable
/
/etc /containers
/containers/cont1/rootfs
etc var
9. Docker’s Direction: Leverage native copy-on-write filesystem
Goal:
– Snap single image into writable clones
– Boot clones as isolated linux instances
Parent Linux
Golden
/
/etc /containers
/containers/golden/rootfs
etc var
/containers/cont1/rootfs
etc var
Container 1
# zfs snapshot cpool/golden@base
# zfs clone cpool/golden@base cpool/cont1
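The ZFS snapshot/clone commands above are one route; a hedged sketch of the same copy-on-write idea using a native Linux union filesystem (overlayfs, assuming kernel support, with the hypothetical paths from the diagram):
# mkdir -p /containers/cont1/upper /containers/cont1/work /containers/cont1/rootfs
# mount -t overlay overlay -o lowerdir=/containers/golden/rootfs,upperdir=/containers/cont1/upper,workdir=/containers/cont1/work /containers/cont1/rootfs
The golden image stays read-only as the lower layer, and each container's writes land in its own upper directory, so a writable clone costs near-zero time and space up front.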
10. /
/etc /containers
/containers/cont1/rootfs
etc shared var
Shared
/containers/cont2/rootfs
etc shared var
Shared Data: Containers can Share Filesystem Within Host
Containers Today:
Share within the host
Containers Future:
Share across hosts
Clustered File Systems
container1 container2
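A minimal sketch of today's within-host sharing, for example with Docker and a hypothetical /srv/shared directory on the host: both containers bind-mount the same host path and therefore see the same files.
# mkdir -p /srv/shared
# docker run -d --name container1 -v /srv/shared:/shared busybox sleep 3600
# docker run -d --name container2 -v /srv/shared:/shared busybox sleep 3600
# docker exec container1 sh -c 'echo hello > /shared/note.txt'
# docker exec container2 cat /shared/note.txt
Sharing across hosts needs the clustered or distributed file systems mentioned above, since a plain bind mount only exists on one host.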
11. Docker Storage Abstractions with Linux Containers
Docker
Container
/
/etc /var
/data
Non-persistent
boot environment
Local
Disks
“VOLUME /var/lib/mysql”
Persistent
Data
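A hedged sketch of the VOLUME pattern above, assuming a Docker release with named volumes and the official mysql image: the boot environment stays disposable, while /var/lib/mysql lives on a volume that outlives any one container.
# docker volume create mysql-data
# docker run -d --name db -v mysql-data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=example mysql
# docker rm -f db
# docker run -d --name db -v mysql-data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=example mysql
The second run reattaches the same data, which is the persistence the diagram's local-disk volume is meant to provide.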
12. Container Storage Use Cases
Unshared Volumes
Use Case: Running container-based SQL or NoSQL DB
Shared Volumes
Use Case: Sharing a set of tools or content across app instances
Persist to External Storage
Use Case: Object store for retention/archival, DBaaS for config/transactions
[Diagram: containers on hosts backed by a shared storage platform; in the external case, containers reach cloud storage through an API.]
13. Eliminate the Silos: Converged Big Data Platform
Hadoop
batch analysis
Distributed Resource Manager or Virtualization platform
Host Host Host Host Host Host
HBase
real-time queries
NoSQL
Cassandra,
Mongo, etc
Big SQL
Impala,
Pivotal HAWQ
Compute layer
Distributed Storage Platform
Host
Other
Spark,
Shark,
Solr,
Platfora,
Etc,…
File System (HDFS, MapR, GPFS), Block Storage, POSIX
16. Storage Media Technology: Capacities & Latencies – 2016
Storage media, approximate latency, and the slide's trip-to-Australia analogy:
DRAM: ~100ns (trip to Australia in 0.3 seconds)
NVM: ~500ns (trip to Australia in 1.8 seconds)
NVMe SSD: ~10us (trip to Australia in 36 seconds)
Capacity Oriented SSD: ~1ms (trip to Australia in 1 hour)
Magnetic Storage: ~10ms (trip to Australia in 12 hours)
Object Storage: ~1s (trip to Australia in 50 days)
Capacities shown for 2016: 1TB, 1TB, 4TB, 16TB, and 32TB per device, with per-host totals of 4TB, 4TB, 48TB, 192TB, and 384TB.
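The trip-to-Australia comparisons all follow roughly one scale factor: 12 hours for a ~10ms disk access works out to about 4.3 million to one, and applying the same factor to the other latencies lands close to the figures above. A quick sanity check of the arithmetic (plain awk, no input needed):
# awk 'BEGIN { s = 12*3600/0.010; print s*100e-9, s*500e-9, s*10e-6, s*1e-3/3600, s*1/86400 }'
This prints roughly 0.43 seconds, 2.2 seconds, 43 seconds, 1.2 hours, and 50 days, in the same ballpark as the 0.3 second, 1.8 second, 36 second, 1 hour, and 50 day analogies on the slide.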
17. Storage Media Technology
[Chart: IOPS (one thousand to 1 million) against capacity (10's of gigabytes to 10's of terabytes), positioning 2D NAND as "IOPS Flash" and 3D NAND as "Capacity Flash".]
18. NVDIMMs
NVDIMM-N[2] (Type 1): standard DRAM with contents transferred to NAND on a power-loss event, DDR3/DDR4 interface. DRAM + NAND: non-volatile memory, load/store mode.
NVDIMM-F[1] (Type 3): NAND behind a RAM buffer, asynchronous block transfers, DDR3/DDR4 interface. DRAM + NAND: non-volatile memory, block mode.
Type 4[1]: NVM with synchronous transfers over DDR4. Next-generation load/store memory.
More in INF66324 – Non-Volatile Memory
19. 3D XPoint™ Technology: An Innovative, High-Density Design[1]
• Non-Volatile
• 1000x higher write endurance than NAND
• 1000x Faster than NAND
• 10x Denser than DRAM
• Can serve as system memory and storage
• Cross Point Structure: allows individual
memory cell to be addressed
• Cacheline load/store should be possible.
• Stackable to boost density
• Memory cell can be read or written w/o
requiring a transistor, reducing power & cost
1. “Intel and Micron Produce Breakthrough Memory Technology”, Intel
21. What’s Really Behind a Storage Array?
OS
Storage
Stack
Data
Services
OS
Storage
Stack
Data
Services
Dual Failover
Processors
Clustered
OS
iSCSI, FC, etc
Shared Disk
SAS or FC
Fail-over
Services
De-dup
Volumes
Mirroring
Caching
Etc,…
Industry std
X86 Servers
SAS or FC
Network
22. Different Types of SDS (1):
Fail-Over Software
on Commodity Servers
HP
Dell
Intel
Super-Micro
etc,…
Nexenta
Data ONTAP Virtual Appliance
EMC VNX VA
etc,…
OS
Storage
Stack
Data
Services
OS
Storage
Stack
Data
Services
Your
Servers
Software-
Defined
Storage
23. (1) Better: Software Replication Using Servers + Local Disks
OS
Data
Services
OS
Storage Stack
Data
Services
Software
Replication
Clustered
Storage
Stack
Simpler
Server
configuration
Lower cost
Less to go
wrong
Not very
Scalable
24. (2) Caching Hot Core/Cold Edge
Scale-out Storage Service
Hot Tier
Caching of Reads/Writes
Host Host Host Host Host Host Host
Compute Layer – All Flash
Host
Low Latency Network
iSCSI
S3
Scale-out
Storage Tier
25. (3) Scale-Out SDS
OS
Storage Stack
OS OS OS
Storage
Network
Protocol
iSCSI or
Proprietary
Compute Scalable
More Fault
Tolerance
Rolling
Upgrades
More
Management
(Separate
compute and
storage silos)
Data
Services
Data
Services
Data
Services
Data
Services
26. (4) Hyperconverged SDS
Hypervisor
Storage Stack
Hypervisor Hypervisor Hypervisor
Easier
Management:
(Top-Down
Provisioning)
Scalable
More Fault
Tolerance
Rolling
Upgrades
Fixed
Compute and
Storage Ratio
Data
Services
App App App App App
VSAN
30. Device Interconnects
Server
PCI-e Bus
HCA
SATA/SAS HDD/SSDs
Server
PCI-e Bus
NVMe SSD
Intel P3700 @3Gbytes/sec
SATA
SAS
Server
PCI-e Bus
Electrical Adapter
NVM over PCIe
Flash:
- Intel P3700 @3GB/s
- Samsung XS1715
NVM
- Future devices
SAS 6-12G @ 1.2Gbytes/sec
PCIe gen3 x4 (1Gbyte/sec x 4 lanes) = 4Gbytes/sec
HCA
HCA
PCIe serial speed per lane:
Gen 1: 2 Gbit/s (256MB/sec)
Gen 2: 4 Gbit/s (512MB/sec)
Gen 3: 8 Gbit/s (1GB/sec)
*There are now PCIe extensions for Storage
31. Storage Fabrics
Server
PCI-e Bus
NIC iSCSI
over
TCP/IP
Server
PCI-e Bus
NIC FC
over
Ethernet
Server
PCI-e Bus
NIC NVMe
over
Ethernet
iSCSI
FCoE
NVMe over Fabrics (the future!)
Pros:
• Popular: Windows, Linux, Hypervisor friendly
• Runs over many networks: routable (TCP)
• Optional RDMA acceleration (iSER)
Cons:
• Limited protocol (SCSI)
• Performance
• Setup, naming, endpoint management
Pros:
• Runs on ethernet
• Can easily gateway between FC and FCoE
Cons:
• Not routable
• Needs special switches (lossless ethernet)
• Not popular
• No RDMA option?
Pros:
• Runs on ethernet
• Same rich protocol as NVMe
• In-band administration & data
• Leverages RDMA for low latency
Cons:
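As a concrete illustration of the setup, naming, and endpoint management overhead called out for iSCSI above, a minimal open-iscsi session looks roughly like this (the portal address and IQN are hypothetical):
# iscsiadm -m discovery -t sendtargets -p 192.168.10.50
# iscsiadm -m node -T iqn.2015-08.com.example:storage.vol1 -p 192.168.10.50 --login
# lsblk
After login, the remote LUN appears as an ordinary local block device in the lsblk output.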
34. PCI-e Rack-Scale Compute+Storage
<- Storage-DMA, RDMA over PCIe ->
PCIe Fabric
Storage Box Storage Box
• PCIe Fabric:
• Share disk boxes
across hosts
• Storage-DMA:
• Host-Host RDMA
capability
• Leverages the same
PCI-e Fabric
Compute Compute Compute Compute
35. NVMe – The New Kid on the Block
NVMe Core
Namespaces
NVMe Objects
PCIe
RDMA
IB Ethernet
FC
?
Read
Write
Create
Delete
SCSI-like: Block R/W Transport
NFS-like: Objects/Namespaces
vVol-like: Object + Metadata + Policy
NVMe over
Fabrics
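Namespaces are already manageable from the host with the standard nvme-cli tool; a hedged sketch, assuming a controller that supports namespace management (the sizes and controller ID below are purely illustrative):
# nvme list
# nvme create-ns /dev/nvme0 --nsze=0x100000 --ncap=0x100000 --flbas=0
# nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0
# nvme list
The object, metadata, and policy ideas sketched on this slide are directions beyond what today's namespace and block read/write commands expose.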
36. Now/Future
Interconnect | Now | Next | Future (Richard's Opinion)
SATA | 6g | - | NVMe
SAS | 12g | 24g | NVMe
PCI-e | Proprietary | 4 x 8g | NVMe
SCSI | Over FC/Virtual | | NVMe
FC | 8g | 32g | NVMe over Fabrics
Ethernet | 10g | 20/40g | 100g
37. SDS Hardware Futures: Conclusions & Challenges
• Magnetic Storage is phasing out
– Better IOPS per $$$ with Flash since this year
– Better Space per $$$ after 2016
– Devices approaching 1M IOPS and extremely low latencies
• Device Latencies push for new interconnects and networks
– Storage (first cache) locality is important again
– Durability demands low-latency commits across the wire (think RDMA)
• The network is critical
– Flat 40G networks are the design point for SDS
40. Too Many Data Silos
Separate silos: Compute Cloud, Block, Object, K-V, DB, Big Data
Each silo has its own Management, its own HCL, and its own API
Different HCLs
Different SW Stacks
Walled Resources
Lots of
Management
Developers
Are Happy
41. All-in-one Software-defined Storage
Option 1: Multi-Protocol Stack
One multi-protocol stack serving the Compute Cloud with Block, Object, K-V, DB, and Big Data APIs
One Management layer, one HCL
Developers
Are Happy
One Stack
to
Manage
Shared
Resources
Least
Capable
Services
42. Software Defined Storage Platform
Option 2: Common Platform + Ecosystem of Services
Common platform hosting an ecosystem of services: Block, CEPH, MongoDB, MySQL, Hadoop, each with its own API
One Management layer, one HCL
One Platform
To
Manage
Shared
Resources
Richest
Services
Compute
Cloud
Developers
Are Happy
43. Opportunities for SDS Platform
• Provide for needs of Cloud Native Data Services
– E.g. MongoDB, Cassandra, Hadoop HDFS
• Advanced Primitives in support of Distributed Data Services
– Policies for storage services: replication, raid, error handling
– Fault-domain aware placement/control
– Fine grained policies for performance: e.g. ultra-low latencies for database logs
– Placement control for performance: locality, topology
– Group snapshot, replication semantics
46. The Future of Software-Defined Storage –
What Does It Look Like in 3 Years Time?
Richard McDougall, VMware, Inc
CTO6453
#CTO6453
Editor's Notes
To move faster, though, businesses end up implementing cultural, design, and engineering changes
This is a vision slide that depicts the ultimate goal of creating a shared big data infrastructure that incorporates Hadoop and non-Hadoop workloads. Alongside the Hadoop MapReduce framework, there are other technologies that use HDFS as an underlying filesystem for processing large sets of data, as well as technologies that use their own storage mechanism. Among these technologies are:
HBase: provides a distributed, non-relational, database-style mechanism for access to the data stored in HDFS. Users access the data in HBase through Java APIs. It is modeled on Google's original BigTable and provides a fault-tolerant way of storing large quantities of sparse data.
Big SQL: technologies like Pivotal HAWQ and Impala that present a SQL interface to the end user for access to the HDFS-based data tier.
Non-HDFS technologies:
NoSQL (sometimes referred to as Not Only SQL): data management mechanisms that use technologies other than relational databases, such as JSON documents in the case of MongoDB.
Other technologies come from the research communities; these have not yet emerged into the mainstream as Hadoop has.