VSAN – Architecture and Design
1. Virtual SAN Design & Architecture
Simone Di Mambro
Sr. Presales System Engineer
sdimambro@vmware.com
2. Who I Am and Why I'm Here
• Part of the Presales team since July 2016
• 4 years at VMware in Professional Services
• Passions I have developed in life:
– Smart devices and the like (…yes, feel free to consider me a Nerd!)
– Dieting!
• "Life motivator" for friends
– Runner (to lose weight, but also to enjoy the occasional treat!)
3. Agenda
What’s new in vSAN 6.2
How it Works
Use Cases
In case of failure?
Reference
Overview and principles
4. The VMware Software-Defined Storage Vision
Transforming Storage the Way Server Virtualization Transformed Compute
VMware® vSphere® Storage Policy-Based Mgmt
• App-centric storage automation
• Common mgmt across heterogeneous arrays
VMware® Virtual SAN™
• Hyper-converged architecture
• Data persistence delivered from the hypervisor
5. Tiered Hybrid and All-Flash Options
All-Flash – 90K IOPS per host + sub-millisecond latency
• Caching: writes are cached first; reads go directly to the capacity tier
• Capacity tier (data persistence): flash devices – SSD / PCIe / UltraDIMM
Hybrid – 40K IOPS per host
• Caching: read and write cache – SSD / PCIe / UltraDIMM
• Capacity tier: SAS / NL-SAS / SATA
6. Agenda
What’s new in vSAN 6.2
How it Works
Use Cases
In case of failure?
Reference
Overview and principles
7. Deduplication and Compression for Space Efficiency (Beta, All-Flash Only)
• Nearline deduplication and compression at the disk group level.
– Will be called "Space Efficiency"
• Space Efficiency is enabled at the cluster level
• Data is deduplicated when de-staged from the cache tier to the capacity tier
– Fixed-block-length deduplication (4KB blocks)
• Data is compressed after deduplication
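The de-staging step above can be sketched in a few lines. This is an illustrative model of fixed-block deduplication, not vSAN's actual implementation: the write is split into 4KB blocks, and only blocks whose content hash has not been seen before are written to capacity (real systems would then compress the surviving blocks). The `destage` and `store` names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # vSAN 6.2 deduplicates on fixed 4KB blocks

def destage(data: bytes, store: dict) -> int:
    """Illustrative fixed-block dedup sketch: split the write into 4KB
    blocks, keep one stored copy per unique content hash.
    Returns the number of new blocks actually written to capacity."""
    new_blocks = 0
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in store:    # unseen content: write it
            store[digest] = block  # (a real system would compress here)
            new_blocks += 1
    return new_blocks

store = {}
print(destage(b"A" * 8192, store))  # two identical 4KB blocks -> 1 written
```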
8. RAID-5 Erasure Coding (All-Flash Only)
• With "FTT=1" availability, RAID-5
– 3+1 (4-host minimum)
– 1.33x instead of 2x overhead
• A 20GB disk normally takes 40GB; now just ~27GB
[Diagram] RAID-5 stripe across 4 ESXi hosts: each stripe holds three data components and one parity component, with parity rotated across the hosts
9. RAID-6 Erasure Coding (All-Flash Only)
• With "FTT=2" availability, RAID-6
– 4+2 (6-host minimum)
– 1.5x instead of 3x overhead
• A 20GB disk normally takes 60GB; now just ~30GB
[Diagram] RAID-6 stripe across 6 ESXi hosts: each stripe holds four data components and two parity components, with parity rotated across the hosts
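The overhead figures on the RAID-5 and RAID-6 slides fall out of one ratio: consumed space = raw size × (data + parity) / data. A small sketch to verify both examples (function names are ours, not vSAN's):

```python
def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Capacity overhead factor for a data+parity erasure-coded stripe."""
    return (data_blocks + parity_blocks) / data_blocks

def consumed_gb(disk_gb: float, data_blocks: int, parity_blocks: int) -> float:
    """Raw capacity consumed by a disk of the given size under that stripe."""
    return disk_gb * erasure_overhead(data_blocks, parity_blocks)

# RAID-5 (FTT=1), 3+1: 1.33x -> a 20GB disk consumes ~26.7GB (vs 40GB mirrored)
print(round(consumed_gb(20, 3, 1), 1))  # 26.7
# RAID-6 (FTT=2), 4+2: 1.5x -> a 20GB disk consumes 30GB (vs 60GB triple-mirrored)
print(round(consumed_gb(20, 4, 2), 1))  # 30.0
```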
10. Virtual SAN Quality of Service
• Complete visibility into IOPS consumed per VM/virtual disk
• 1-click configuration of IOPS limits
• Eliminates noisy-neighbor issues
• Granularly manage performance SLAs, independent of VM provisioning order
11. Enhanced Virtual SAN Management with New Health Service
• Built-in performance monitoring
• Health and performance APIs and SDK
• Storage capacity reporting
• And many more health checks…
[Panels: Performance Monitoring, Capacity Monitoring]
12. Agenda
What’s new in vSAN 6.2
How it Works
Use Cases
In case of failure?
Reference
Overview and principles
13. Virtual SAN Disk Groups
• Virtual SAN uses the concept of disk groups to pool flash devices and magnetic disks together as single management constructs
• Disk groups are composed of at least 1 flash device and 1-7 capacity devices
– Flash devices are used for read cache / write buffer
– Storage capacity can be provided by either magnetic disks or flash-based devices
– Disk groups cannot be created without a flash device
– Up to 5 disk groups per host = 35 capacity devices per host
[Diagram] Each host: 5 disk groups max. Each disk group: 1 flash device + 1-7 capacity devices
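The limits above can be captured as a small validity check. The helper names are hypothetical; only the limits (5 disk groups per host, 1 flash device and 1-7 capacity devices per group) come from the slides:

```python
MAX_DISK_GROUPS_PER_HOST = 5
MAX_CAPACITY_DEVICES_PER_GROUP = 7

def valid_disk_group(flash_devices: int, capacity_devices: int) -> bool:
    """A disk group needs exactly one flash device and 1-7 capacity devices."""
    return flash_devices == 1 and 1 <= capacity_devices <= MAX_CAPACITY_DEVICES_PER_GROUP

def max_capacity_devices_per_host() -> int:
    """5 disk groups x 7 capacity devices = 35 capacity devices per host."""
    return MAX_DISK_GROUPS_PER_HOST * MAX_CAPACITY_DEVICES_PER_GROUP

print(valid_disk_group(1, 7))           # True
print(valid_disk_group(0, 3))           # False: a flash device is mandatory
print(max_capacity_devices_per_host())  # 35
```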
14. Virtual SAN Disk Groups
• Virtual SAN All-Flash disk group configurations are composed of at least 1 performance flash device and 1 capacity flash device.
– Flash devices are used in a two-tier format for caching and capacity.
– Capacity flash devices are used for storage capacity, similar to magnetic disks.
– All-Flash disk groups cannot be created without a capacity flash device.
[Diagram] Each host: 5 disk groups max. Each disk group: 1 performance device + 1-7 capacity devices
15. Virtual SAN Datastore
• Virtual SAN is an object store solution that is presented to vSphere as a file system.
• The object store mounts the volumes from all hosts in a cluster and presents them as a single shared datastore.
– Only members of the cluster can access the Virtual SAN datastore
– Not all hosts need to contribute storage, but it's recommended.
[Diagram] Hosts connected over the Virtual SAN network present their disk groups as one Virtual SAN datastore. Each host: 5 disk groups max. Each disk group: 1 SSD + 1-7 HDDs
16. Virtual SAN Objects
• Virtual SAN manages data in the form of flexible data containers called objects.
• Virtual machine files are referred to as objects.
– There are five different types of virtual machine objects:
• VM Home
• VM swap
• VMDK
• Snapshots
• Memory (vmem)
• Virtual machine objects are split into multiple components based on performance and availability requirements defined in the VM Storage profile.
17. Virtual SAN Components
• Virtual SAN components are chunks of objects distributed across multiple hosts in a cluster in order to tolerate simultaneous failures and meet performance requirements.
• Virtual SAN utilizes a distributed RAID architecture to distribute data across the cluster.
• Components are distributed with the use of two main techniques:
– Striping (RAID-0)
– Mirroring (RAID-1)
• The number of component replicas and copies created is based on the object policy definition.
[Diagram] RAID-1: replica-1 and replica-2 of an object placed on different hosts across the Virtual SAN datastore
22. Agenda
What’s new in vSAN 6.2
How it Works
Use Cases
In case of failure?
Reference
Overview and principles
23. 2-Node Remote Office Branch Office Solution – More Details
[Diagram] Centralized data center: one vCenter Server centrally manages ROBO1, ROBO2, and ROBO3. Each 2-node ROBO site runs vSphere + Virtual SAN; each site's witness runs centrally as a vESXi appliance
2-Node ROBO Solution Overview
• The 2 nodes in a ROBO share a single L2 domain. This requires multicast, <5 ms RTT, and 1 Gbps for <10 VMs or 10 Gbps for >10 VMs.
• Communication between the ROBO and the witness is unicast only. No multicast is needed, and the witness is only reachable via L3. Requires <500 ms RTT over 1.5 Mbps.
• Resource requirements for the witness, assuming 5-10 VMs in each ROBO:
• Memory: 8 GB (use overcommit to save costs)
• CPU: 1 vCPU
• Storage: 5 GB for the capacity tier & 1 GB for the caching tier (use thin provisioning to save costs)
New Hardware Options
• 2-node clusters for ROBO
• New flash HW options: Intel NVMe, Diablo ULLtraDIMM™
Hardware Deployment Options
• Certified Hardware
• Integrated Systems
24. Virtual SAN Stretched Cluster
• Active-Active architecture
• Supported on both Hybrid and All-Flash
• Extremely easy to set up through the UI
• Automated failover
[Diagram] Fault Domain A (active) and Fault Domain C (active) form the vSphere + Virtual SAN stretched cluster, connected by <=5 ms latency over >10/20/40 Gbps L2 with multicast; the witness resides in Fault Domain B
25. Virtual SAN Stretched Clusters – Overview
[Diagram] Active-active data centers FD1 and FD2 running vSphere + Virtual SAN, <5 ms RTT over >10/20/40 Gbps L2 with multicast; witness appliance in FD3
Overview
• Site-level protection with zero data loss and near-instantaneous recovery. This enables support for active-active data centers
• The architecture is based on fault domains: the Virtual SAN cluster is split across 3 fault domains
• Network latency between the two main sites must be less than 5 milliseconds RTT
• The witness VM resides on a 3rd site (it could be another data center, vCloud Air, or a colo). It holds only metadata and does not run any VMs
• Max FTT is 1
• Automated failover in case of site failures
• Communication between the main sites and the witness is unicast (FD1 and FD2 share a single L2 domain; FD3 is only reachable via L3 from FD1 & FD2)
• Each VSAN stretched cluster scales to 31+31+1
• Customers can have many VSAN stretched clusters
Benefits
• Disaster avoidance
• Planned maintenance
26. Virtual SAN Stretched Clusters – Site Network Requirements
[Diagram] FD1 and FD2 (vSphere + Virtual SAN) connected by <5 ms RTT over >10/20/40 Gbps L2 with multicast; witness appliance in FD3
Details
• Max network latency of 5 ms RTT and enough bandwidth for the workloads
• Support for sites up to 100 km apart, as long as network requirements are met
– <=200 ms RTT over >=100 Mbps to the witness (L3, no multicast)
– <=5 ms RTT over 10/20/40 Gbps to the data fault domains (L2 with multicast)
• Network bandwidth requirement for write operations between the two main sites, in Kbps = N (number of nodes on one site) * W (amount of 4K IOPS per node) * 125 Kbps. Minimum of 1 Gbps each way
• For a 5+5+1 config on medium servers and ~300 VMs, the network requirement is around 4 Gbps total (2 Gbps each way)
• Layer 2 network communication is required
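The bandwidth rule of thumb above can be sketched directly. The function name is ours; the per-node IOPS figure of 3,200 is an assumed workload value, chosen only because it reproduces the slide's ~2 Gbps-each-way example for a 5+5+1 configuration:

```python
def stretched_cluster_bandwidth_gbps(nodes_per_site: int,
                                     write_iops_4k_per_node: float) -> float:
    """Inter-site write bandwidth per the slide's rule:
    Kbps = N * W * 125, with a 1 Gbps-per-direction floor."""
    kbps = nodes_per_site * write_iops_4k_per_node * 125
    gbps = kbps / 1_000_000  # 1 Gbps = 1,000,000 Kbps
    return max(gbps, 1.0)    # minimum of 1 Gbps each way

# 5 nodes per site, ~3,200 4K IOPS per node (assumed):
print(stretched_cluster_bandwidth_gbps(5, 3200))  # 2.0 Gbps each way, ~4 Gbps total
```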
27. Virtual SAN Stretched Cluster with vSphere Replication and SRM
– Live migrations and automated HA restarts between stretched cluster sites
– Replication between Virtual SAN datastores enables RPOs as low as 5 minutes
– The 5-minute RPO is exclusively available to Virtual SAN 6.x
– Lower RPOs are achievable due to Virtual SAN's efficient vsanSparse snapshot mechanism
– SRM does not support a standalone Virtual SAN with one vCenter Server
[Diagram] Site A and Site B (active-active) form the vSphere + Virtual SAN stretched cluster with its witness appliance, <5 ms RTT over >10/20/40 Gbps L2 with multicast; vSphere Replication and SRM replicate to DR site X (any distance, >5 min RPO), each side with its own vCenter
28. Agenda
What’s new in vSAN 6.2
How it Works
Use Cases
In case of failure?
Reference
Overview and principles
29. Understanding Failure Events
Virtual SAN recognizes two different types of hardware device events in order to define the type of failure scenario:
– Absent
– Degraded
Absent: VSAN knows/thinks the data is temporarily unavailable
– Good chance the data copy will come back soon
Absent events trigger the 60-minute recovery operations.
– Virtual SAN will wait 60 minutes before starting the object and component recovery operations
– 60 minutes is the default setting for all absent events
– Configurable value via host advanced settings
30. Host Failure – 60-Minute Delay
• Absent – waits the default time setting of 60 minutes before starting the copy of objects and components onto other disks, disk groups, or hosts.
• Greater impact on the cluster's overall compute and storage capacity.
[Diagram] A host fails; after the 60-minute wait, a new mirror copy of the impacted RAID-1 component is created on another host over the vsan network
31. Understanding Failure Events
Degraded: VSAN knows/thinks the data copy is permanently lost
– No hope the data copy will come back
– No reason to wait; the data copy is known to be lost
Degraded events trigger immediate recovery operations.
– Triggers the immediate recovery of objects and components
– Not configurable
Any of the following detected I/O errors are always deemed degraded:
– Magnetic disk failures
– Flash-based device failures
– Storage controller failures
Any of the following detected failures are always deemed absent:
– Network failures
– Network Interface Card (NIC) failures
– Host failures
Unplugging disks causes absent events, not degraded – this allows mistakes (unplugging the incorrect drive) to be rectified within 60 minutes
32. Cache/Capacity Device Failure – Instant Mirror Copy
• Degraded – the resynchronization of all impacted components on the failed capacity device starts instantly, in order to re-create the data onto other disks, disk groups, or hosts.
– Resynchronization can take time depending on the amount of data.
[Diagram] Disk failure: an instant mirror copy of the impacted RAID-1 component is created on another host over the vsan network
33. Network Failure – 60-Minute Delay
• Absent – waits the default time setting of 60 minutes before starting the copy of objects and components onto other disks, disk groups, or hosts.
• NIC failures and physical network failures can lead to network partitions.
– Multiple hosts could be impacted in the cluster.
[Diagram] A network failure isolates a host; after the 60-minute wait, a copy of the impacted RAID-1 component is created on another host
34. Virtual SAN: 1 Host Isolated – HA Restart
[Diagram] One host is isolated from the vsan network; vSphere HA restarts the VM on a host in the majority partition (RAID-1: vmdk, vmdk, witness across esxi-01 to esxi-04)
35. Virtual SAN: 2 Hosts Isolated – HA Restart
[Diagram] Two hosts are isolated from the vsan network; vSphere HA restarts the VM on esxi-02 / esxi-03, as they own > 50% of the components!
36. Virtual SAN Partition – With HA Restart
[Diagram] The vsan network splits into Partition 1 and Partition 2; vSphere HA restarts the VM in Partition 2, as it owns > 50% of the components!
38. 3 Nodes or 4?
• 3 nodes – nowhere to rebuild components
• 4 nodes – offers the ability to rebuild components
[Diagram] A 3-node layout (vmdk, vmdk, witness); when one vmdk component fails (X), there is no spare host to rebuild it on
39. Quorum Mechanism
• In version 5.5, the witness was required to satisfy the rule that >50% of components must be available for the data to be online
• In 6.0, not all disk objects will have a witness
• 6.0 introduced a "voting" mechanism where each component can have more than one vote
• In 6.x, >50% of the votes are required for the data to be available
[Diagram] vmdk, witness, vmdk – each holding 33.3333% of the votes
40. Agenda
What’s new in vSAN 6.2
How it Works
Use Cases
In case of failure?
Reference
Overview and principles
41. Three Ways to Get Started with Virtual SAN Today
Online Hands-on Lab
• Test-drive Virtual SAN right from your browser, with an instant Hands-on Lab
• Register and your free, self-paced lab is up and running in minutes
Download Evaluation
• 60-day free Virtual SAN evaluation
• VMUG members get a 6-month EVAL or a 1-year EVALExperience for $200
VSAN Assessment
• Reach out to your VMware partner, SEs, or rep for a FREE VSAN Assessment
• Results in just 1 week!
• The VSAN Assessment tool collects and analyzes data from your vSphere storage environment and provides technical and business recommendations.
Learn more… vmware.com/go/virtual-san
• Virtual SAN Product Overview Video
• Virtual SAN Datasheet
• Virtual SAN Customer References
• Virtual SAN Assessment
• VMware Storage Blog
• @vmwarevsan
vmware.com/go/try-vsan-en