2. Agenda
• Datacenter Storage in General
• DAS, NAS and SAN
• Storage Virtualization
• VDI Storage
– Issues with VDI Storage
– Solutions for Performance issues
3. Study by IDC
• In 2011 roughly 1.8 trillion GB of data
was created.
• Total data has doubled in the last two years.
• The prediction is that total data will grow
50x in the next decade.
• Most of it is unstructured data.
• Disks are still in the stone age, with motor,
spindle and head.
Numbers collected from EMC.com, which is available to the public
4. Types of Storage
• Classified into three types according to the
access mechanism between server and storage.
– Direct Attached Storage (DAS)
– Network Attached Storage (NAS)
– Storage Area Network (SAN)
5. Direct Attached Storage (DAS)
SCSI
• Disk(s) directly connected to the machine.
• The simplest and most common form of storage.
• Example: our laptops, desktops, etc.
6. Network Attached Storage
[Diagram: servers connected over a LAN to a NAS box]
• Uses the CIFS/NFS protocols to access files.
• Similar to a remote shared folder.
• A client-side redirector forwards file requests to
the NAS box.
• Example: EMC Celerra, NetApp FAS
7. NAS - Module diagram
Server: Application (user space) → File System Redirector →
CIFS Protocol Layer → TCP/IP Stack → NIC Driver (kernel space) → LAN
NAS box: LAN → NIC Driver → TCP/IP Stack → CIFS Protocol Layer →
File System → Volume Manager → Disk Driver
8. What is a SAN
[Photos: EMC Symmetrix DMX 2000 and EMC Symmetrix DMX 1000 arrays]
9. SAN Continued..
[Diagram: servers connected through a Fibre Channel switch to the SAN]
• Example: EMC Symmetrix, CLARiiON
10. SAN Architecture
[Diagram: disks sit behind controllers/storage processors (which can be
ACTIVE-ACTIVE or ACTIVE-PASSIVE), each with GBs of cache memory,
reached through a Fibre Channel switch]
11. Difference between SAN and NAS
• In NAS, ‘file streams’ are transferred over
the wire.
• In a SAN, ‘disk blocks’ are read from
storage.
• In a SAN, Fibre Channel is the common
communication mechanism. (SANs also support
SCSI over TCP/IP, which is called
iSCSI.)
• In NAS, file streams travel over the TCP/IP
stack.
12. Common features of SAN
• Backup
• Replication
• Snapshot
• Features specific to a SAN, such as SRDF for
Symmetrix
SAN-specific data collected from Wikipedia
13. Storage Virtualization - Advantages
• Hides the internal complexity of the storage
system.
• Better disk block usage: studies show
only 30-40% of disk space is used
effectively.
• Better performance.
• Scalability.
14. Taxonomy of Storage Virtualization
• Virtualization in the host operating system
storage stack.
• Switch/appliance-based virtualization.
• Virtualization in the external storage array
(SAN).
15. Storage Stack
Host operating system storage stack:
Application (user space) → File System → Volume Manager →
Disk Class Driver → Hardware Driver (kernel space) →
Fibre Channel adapter → Fibre Channel cable → Fibre Channel switch → SAN
16. Virtualization at the Operating System
Storage Stack
• A typical Windows storage stack; an IO request
flows through each layer:
File System → Volume Manager → Disk Class Driver → Hardware Driver
17. Virtualization at File System Layer
[Diagram: File1.doc is addressed by Virtual Cluster Numbers (VCN) at
the file system layer, translated through the volume manager and disk
class driver to Logical Cluster Numbers (LCN), and finally to disk
blocks by the hardware driver]
• NTFS exposes the IOCTL FSCTL_GET_RETRIEVAL_POINTERS, so
that any app can query the VCN-to-LCN mapping. Disk
defragmenters typically use this IOCTL.
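The VCN-to-LCN translation amounts to a lookup over a file's extent (run) list. Here is a minimal toy model in Python; on Windows the real query goes through `DeviceIoControl` with `FSCTL_GET_RETRIEVAL_POINTERS`, and the extent values below are invented for illustration.

```python
# Toy model of NTFS retrieval pointers: a file's extent list maps runs of
# Virtual Cluster Numbers (file-relative) to Logical Cluster Numbers
# (volume-relative). Illustrative only, not the Win32 API.

def vcn_to_lcn(extents, vcn):
    """extents: list of (start_vcn, start_lcn, cluster_count) runs."""
    for start_vcn, start_lcn, count in extents:
        if start_vcn <= vcn < start_vcn + count:
            return start_lcn + (vcn - start_vcn)
    raise ValueError("VCN not allocated")

# A fragmented file: VCNs 0-3 live at LCN 1000, VCNs 4-9 at LCN 5000.
extents = [(0, 1000, 4), (4, 5000, 6)]
print(vcn_to_lcn(extents, 2))  # 1002
print(vcn_to_lcn(extents, 5))  # 5001
```

A defragmenter uses exactly this kind of mapping to find files whose runs are scattered and move them into one contiguous run.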
18. HSM File System (Tiered Storage)
Only 20-30% of data is actively used.
Tiers by cost: RAM ($50/MB) → SAN ($0.50/MB) → TAPE ($0.05/MB)
• Ex: EMC DiskXtender
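The core HSM decision can be sketched as a placement rule: hot data goes to the expensive fast tier, cold data migrates down. A toy Python sketch; the access-frequency thresholds are assumptions, and the per-MB costs are the slide's figures, not current prices.

```python
# Toy HSM placement policy: pick a tier by access frequency.
# Costs ($/MB) come from the slide; thresholds are illustrative.
TIER_COST = {"RAM": 50.0, "SAN": 0.50, "TAPE": 0.05}

def place(accesses_per_day):
    """Return the tier a block should live on (assumed thresholds)."""
    if accesses_per_day >= 100:
        return "RAM"    # hot: worth $50/MB
    if accesses_per_day >= 1:
        return "SAN"    # warm
    return "TAPE"       # cold: archive

print(place(500), place(10), place(0))  # RAM SAN TAPE
```

A real product such as EMC DiskXtender layers migration and transparent recall on top of a policy like this.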
19. Virtualization at Volume Manager -
RAID
• RAID 5 - Also known as a striped volume with parity.
• Fault tolerance is achieved by reserving the equivalent of
one disk for keeping parity information.
• The parity stripe is rotated across all disks, avoiding the
possibility of a single parity disk becoming busy all the time.
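The parity mechanism above is plain XOR: the parity stripe is the XOR of the data stripes, so any one lost stripe can be rebuilt by XOR-ing the survivors. A minimal sketch in Python (stripe contents are made up):

```python
# RAID 5 parity sketch: parity = XOR of all data stripes in a row.
# Losing any one stripe, we rebuild it from the others plus parity.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(data)

# Disk holding data[1] fails; rebuild it from survivors + parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

Rotating which disk holds the parity stripe from row to row spreads this extra parity write evenly, which is exactly the bullet above about avoiding a busy dedicated parity disk.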
20. Virtualization at Disk Controller
[Diagram: LBA 1, LBA 2, ... LBA n]
• The disk controller converts a Logical Block Address (LBA) to a
Cylinder-Head-Sector (CHS) address.
• The disk controller also takes care of (remaps) damaged sectors.
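The LBA-to-CHS conversion is simple integer arithmetic. A sketch using the classic formula; the 255-head, 63-sectors-per-track geometry is the common BIOS convention, and real controllers hide the true physical geometry entirely:

```python
# Classic LBA -> CHS conversion (sector numbers are 1-based).
# Geometry values are the conventional BIOS defaults, not real geometry.
def lba_to_chs(lba, heads=255, sectors_per_track=63):
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = (lba % sectors_per_track) + 1
    return cylinder, head, sector

print(lba_to_chs(0))   # (0, 0, 1)  -> first sector of the disk
print(lba_to_chs(63))  # (0, 1, 1)  -> first sector of the next track
```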
22. Switch based Virtualization
• Combine LUNs from one or more arrays into a
single virtual LUN and present it to the host OS.
• Take one big LUN from one array, divide it, and
give the pieces to different host OSes.
• Security: a host can see only certain LUNs.
• Vendors are adding more intelligence at the switch
level, such as advanced volume management,
caching, and QoS functions.
• Examples: EMC Invista, IBM SAN Volume Controller
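The first bullet, combining back-end LUNs into one virtual LUN, is essentially an address-translation table in the switch. A toy concatenation-style sketch in Python; the array/LUN names and sizes are invented, and real products add striping, caching and failover on top:

```python
# Sketch of switch-based virtualization by concatenation: back-end LUNs
# from different arrays are stitched into one virtual LUN's address space.
def make_vlun(backends):
    """backends: list of (array, lun, size_in_blocks)."""
    table, offset = [], 0
    for array, lun, size in backends:
        table.append((offset, offset + size, array, lun))
        offset += size
    return table

def resolve(table, vblock):
    """Translate a virtual block number to (array, lun, backend block)."""
    for start, end, array, lun in table:
        if start <= vblock < end:
            return array, lun, vblock - start
    raise ValueError("virtual block out of range")

vlun = make_vlun([("ArrayA", 0, 1000), ("ArrayB", 7, 500)])
print(resolve(vlun, 1200))  # ('ArrayB', 7, 200)
```

The reverse operation, carving one big LUN into slices for different hosts, is the same table read the other way around.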
26. Problems with VDI Storage
• Boot/login storm
• App storm
• Virus scanning
• Many PoCs fail or end up costing more
because of storage array cost.
• Some queries from the XD/VDI-in-a-Box forum:
– Boot-up time of approx ~170 min.
– Slow logon.
– "Do I need to put a dedicated LUN on each
server?"
27. Windows partition alignment
issue
• Data is stored on disk as blocks. Block size
varies; 64K is common (a multiple of the OS page
size).
• Windows XP/2003 writes a signature at the start
of the partition, and the actual partition starts at
sector 63 to align with disk cylinder boundaries.
• This can result in extra IOs.
• The partition should be aligned with the SSD
cache/storage block.
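The cost of the 63-sector start is easy to see with arithmetic: 63 × 512 B = 32256 B, which is not a multiple of a 64 KiB storage block, so every file-system cluster straddles two back-end blocks and can double the IOs. A sketch (the 1 MiB figure is the alignment Windows Vista and later use by default):

```python
# Why a 63-sector partition start is misaligned with 64 KiB storage blocks.
SECTOR = 512
BLOCK = 64 * 1024                  # 64 KiB storage/SSD-cache block

legacy_start = 63 * SECTOR         # Windows XP/2003 default start
aligned_start = 2048 * SECTOR      # 1 MiB, the Vista+ default

print(legacy_start % BLOCK)        # 32256 -> clusters straddle blocks
print(aligned_start % BLOCK)       # 0     -> clusters map 1:1 to blocks
```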
28. Windows 2003 partition
[Diagram: the partition starts at a less SSD-friendly location; the
signature precedes NTFS Volume 1, which straddles Blocks 1-6]
29. IO Blender problem
• Sequential vs. random IO.
• The OS tries to make IO sequential: the
Windows Cache Manager and SCSI/Storport
drivers, the Linux buffer cache and IO
scheduler.
• The hypervisor defeats this optimization.
• The OS thinks it is writing to block storage; the
hypervisor translates that into a VHD/.vmdk file.
30. Read/Write IOPs
• Write IOPs are costly: cache flushing,
RAID overhead, etc.
• Windows paging IO: paging IOs are
latency-sensitive, and slowing them down reduces
system performance.
• Memory-intensive apps may increase
paging IO. The number of paging writes may
exceed the write IOs issued by the app itself.
31. What is new in VDI Storage
• More than a dozen storage startups.
• A driver in the guest OS stack to profile IOs.
• A module in the hypervisor storage stack that
does the actual IO scheduling.
• Merging of random IOs, dedupe, and
compression (making the SSD cache effective).
• SSD cache.
• The algorithms for analyzing app IOPs (profiling)
and for scheduling are proprietary to each vendor.
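The "merge random IOs" idea from the bullets above can be sketched as request coalescing in the hypervisor's scheduler: sort queued requests by offset and fuse contiguous ones, so many scattered guest IOs become fewer large back-end IOs. A toy Python sketch, not any vendor's proprietary algorithm:

```python
# Toy IO coalescing: merge contiguous (offset, length) requests so
# scattered guest writes become fewer, larger back-end operations.
def coalesce(requests):
    """requests: list of (offset, length); returns merged, sorted list."""
    merged = []
    for off, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == off:
            # Request starts exactly where the previous one ends: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((off, length))
    return merged

print(coalesce([(8, 4), (0, 4), (4, 4), (100, 8)]))
# [(0, 12), (100, 8)]
```

The commercial differentiators sit on top of this basic idea: per-app IO profiling in the guest, write-back SSD caching, and dedupe/compression to stretch the cache.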
32. What is new in VDI Storage
[Diagram: OS 1, OS 2 and OS 3 each run an IO Profiler; the hypervisor
runs the IO Scheduler; the hardware provides the SSD cache/storage]