How to Fail at VDI

Welcome

BriForum | © TechTarget

Dan Brinkmann @dbrinkmann
blog.danbrinkmann.com
Solutions Architect, VMware vExpert
Lewan & Associates (Denver, CO)

“What business problem
are we solving?”


Business/Expectation VDI Failures
● No business problem
● Desktop virtualization is not server virtualization
● Saving money
● Project in the hands of the vSphere administrator
● No success criteria
● Assume you know what users do
● The same or better experience remotely as locally

Agenda
To understand what causes VDI failures
● Compute
● Storage
● Guessing


How to Fail at VDI
The technology failure points
● Test with 5 users
● Using vendor provided users/core sizing
● Using vendor provided IOPs estimates
● Ignore anti-virus
● Ignore user profile management
● Use existing desktop images for physical PCs
● Guess


Compute
It’s magic until it stops working
● Multi-threaded apps
● Latency sensitive workloads
● Hyperthreading
● Latency = Health


Compute
CPU scheduler in vSphere
● CPU scheduler in vSphere is entitlement/consumption
based, not priority (unlike Windows)
● There is no priority in the CPU scheduler
● Given equal entitlement, the more a VM/world consumes,
the more likely it is to be preempted by another VM/world
● http://www.vmware.com/resources/techresources/10131


Compute with a Physical PC

OS/Apps/Profile

CPU 1


Compute with Citrix XenApp

[Diagram: multiple OS/Apps/Profile and Profile boxes sharing the CPUs below]

CPU 1 CPU 2


Compute with VDI

CPU 1 CPU 2


vSphere Compute
This is poor performance monitoring


vSphere Compute
This is better performance monitoring - ESXTOP

Display Metric Threshold Explanation
CPU %RDY 10 Overprovisioning of vCPUs, excessive usage of vSMP, or a limit (check %MLMTD) has been set.
CPU %CSTP 3 Excessive usage of vSMP. Decrease amount of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU %SYS 20 The percentage of time spent by system services on behalf of the world. Most likely caused by high IO VM. Check other metrics and VM for possible root cause.
CPU %MLMTD 0 The percentage of time the vCPU was ready to run but deliberately wasn’t scheduled because that would violate the “CPU limit” settings. If larger than 0 the world is being throttled due to the limit on CPU.
CPU %SWPWT 5 VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
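
The thresholds in this table can be applied mechanically to a set of readings. The following is a minimal sketch, not a monitoring tool: the threshold values come from the table, while the function name, the sample readings, and the choice to flag only values strictly above the threshold are assumptions made for illustration.

```python
# Thresholds copied from the ESXTOP table above (percent values).
CPU_THRESHOLDS = {
    "%RDY": 10,
    "%CSTP": 3,
    "%SYS": 20,
    "%MLMTD": 0,
    "%SWPWT": 5,
}


def flag_cpu_metrics(sample):
    """Return the metrics in `sample` whose value is above its threshold.

    Assumption: "above" means strictly greater than the table value.
    """
    return {
        name: value
        for name, value in sample.items()
        if name in CPU_THRESHOLDS and value > CPU_THRESHOLDS[name]
    }


# Hypothetical readings for one VM, typed in by hand for illustration.
sample = {"%RDY": 14.2, "%CSTP": 6.5, "%SYS": 1.0, "%MLMTD": 0.0, "%SWPWT": 0.0}
flagged = flag_cpu_metrics(sample)
print(flagged)
```

In this made-up sample both %RDY and %CSTP are flagged, which is the pattern the table associates with too many vCPUs on a VM.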


vSphere Compute


vSphere Compute
%CSTP probably driving %RDY values


vSphere Compute
Now with fewer vCPU’s


Summary on Compute

● Multithreading, vSMP
● Not priority based
● % Utilization is not the complete picture
● http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017926


Storage
The wrath of the math
● #1 cause of performance issues in server virtualization
● #1 cause of performance issues in desktop virtualization
- 20ms - in trouble
- 50ms - your users hate you


What You Need to Know

● Capacity vs performance
● Random vs sequential
● Average vs peak
● Where it’s coming from
● Most are guessing


Storage
Spinning disk

Device Type IOPS
7,200 rpm SATA drives HDD ~75-100 IOPS
10,000 rpm SATA drives HDD ~125-150 IOPS
10,000 rpm SAS drives HDD ~140 IOPS
15,000 rpm SAS drives HDD ~175-210 IOPS


RAID Penalty

RAID level Read Write
RAID 0 1 1
RAID 1 and 10 1 2
RAID 5 1 4
RAID 6 1 6


The Math – RAID 5 50/50
Some back of the napkin math
● 500 users, Windows 7, 20 IOPs avg, 50/50 read/write, RAID 5

● 500 * 20 = 10,000 IOPs – 5,000 read, 5,000 write
● 5,000 write * 4 = 20,000 + 5,000 read = 25,000 IOPs
● 25,000 IOPs on 15K spindles (200 IOPS) = 125 spindles
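
The napkin math above can be written as a small calculator. This is a rough sketch under the slide's own assumptions (write penalties from the RAID Penalty table, 200 IOPS per 15K spindle); the function and parameter names are illustrative, and it ignores cache, peaks, and everything else a real sizing exercise would measure.

```python
import math

# Write penalties from the RAID Penalty table; reads cost 1 at every level.
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def backend_iops(users, iops_per_user, read_percent, raid_level):
    """Front-end IOPs converted to back-end IOPs after the RAID write penalty."""
    frontend = users * iops_per_user
    reads = frontend * read_percent / 100
    writes = frontend - reads
    return reads + writes * WRITE_PENALTY[raid_level]


def spindles_needed(iops, iops_per_spindle=200):
    """Number of spindles required, rounded up to a whole disk."""
    return math.ceil(iops / iops_per_spindle)


# The scenario from the slide: 500 users, 20 IOPs each, 50/50, RAID 5.
raid5 = backend_iops(500, 20, 50, "RAID 5")
print(raid5, spindles_needed(raid5))
```

Changing `read_percent` or `raid_level` shows how quickly the spindle count moves, which is the point of the exercise: the write share and the RAID level matter more than the raw user count.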


The Math – RAID 10 50/50

● 500 * 20 = 10,000 IOPs – 5,000 read, 5,000 write
● 5,000 write * 2 = 10,000 + 5,000 read = 15,000 IOPs


The Math – RAID 10 20/80

● 500 * 20 = 10,000 IOPs – 2,000 read, 8,000 write
● 8,000 write * 2 = 16,000 + 2,000 read = 18,000 IOPs


vSphere Storage Latency

[Diagram: the I/O stack and where each latency is measured]
Guest layers: Application, Filesystem, I/O Drivers, Device Queue
ESX layers: Virtual SCSI, VMkernel Filesystem
A = Application Latency
R = Physical Disk “Disk Secs/Transfer”
G = Guest Latency
K = ESX Kernel
D = Device Latency


vSphere Storage
Performance monitoring for storage

Display Metric Threshold Explanation
DISK GAVG 20 Look at “DAVG” and “KAVG” as the sum of both is GAVG.
DISK DAVG 20 Disk latency most likely to be caused by array.
DISK KAVG 2 Disk latency caused by the VMkernel, high KAVG usually means queuing. Check “QUED”.
DISK QUED 1 Queue maxed out. Possibly queue depth set too low. Check with array vendor for optimal queue depth value.
DISK ABRTS/s 1 Aborts issued by guest (VM) because storage is not responding. Can be caused when paths failed.
DISK RESETS/s 1 The number of commands reset per second.
DISK CONS/s 20 SCSI Reservation Conflicts per second. Can be caused by too many VMDKs on a datastore.
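
Because GAVG is the sum of DAVG and KAVG, a high guest latency can be split into an array part and a VMkernel part. The sketch below shows that reasoning with the thresholds from the table; the function name, the message wording, and the strictly-greater-than comparison are assumptions, and the input values are invented.

```python
def diagnose_latency(davg_ms, kavg_ms, queued=0):
    """Split guest latency into device and kernel parts and list likely causes."""
    gavg_ms = davg_ms + kavg_ms  # GAVG is the sum of DAVG and KAVG
    findings = []
    if gavg_ms > 20:
        findings.append("GAVG above 20 ms: the guest is seeing high latency")
    if davg_ms > 20:
        findings.append("DAVG above 20 ms: most likely caused by the array")
    if kavg_ms > 2:
        findings.append("KAVG above 2 ms: usually queuing, check QUED")
    if queued > 1:
        findings.append("QUED above 1: queue maxed out, check queue depth")
    return gavg_ms, findings


# Hypothetical readings: 25 ms at the device, 3 ms in the VMkernel.
gavg, findings = diagnose_latency(davg_ms=25, kavg_ms=3)
print(gavg)
for line in findings:
    print(line)
```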


Building for Read IOPs
Fairly easy
● Memory - Storage controller cache, PVS
● Host/Hypervisor - CBRC, Intellicache
● Storage - SSD tiering / flash cache


Building for Write IOPs
Much harder…and expensive
● Profiles/Apps
● Spinning disk
● SSD tiering
● Local disk
● IO optimization (dedupe, serializing IO)


Storage Summary

● 25,000 IOPs R5 50/50 – 125 spindles
● 15,000 IOPs R10 50/50 – 75 spindles
● 18,000 IOPs R10 20/80 – 90 spindles
● Latency is the key metric
● Write IOPs, and the things that cause them, are the #1 focus


How does this relate to VDI failure?

● Pilot performance is great, then terrible in production
● Boot storm vs login storm
● Applications in gold image vs streamed
● Read/write ratio is important
● Anti-virus software
● Existing desktop images


Guessing
You need to use tools to do this
● Initial sizing
● Determine peaks and when
● Baseline application impact
● Monitor application impact over time
● Application updates/changes
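
Finding the peak and when it happens is simple once measurements exist. The following is a minimal sketch, assuming IOPs samples have already been collected by some monitoring tool; the sample values and time labels are invented, and the function name is illustrative.

```python
from statistics import mean


def summarize_iops(samples):
    """Summarize (time_label, iops) samples: average, peak, and when the peak hit."""
    average = mean(iops for _, iops in samples)
    peak_time, peak = max(samples, key=lambda sample: sample[1])
    return {
        "average": average,
        "peak": peak,
        "peak_time": peak_time,
        "peak_to_average": peak / average,
    }


# Hypothetical hourly samples; real numbers must come from measurement.
samples = [
    ("07:00", 2000),
    ("08:00", 14000),
    ("09:00", 6000),
    ("12:00", 4000),
    ("17:00", 4000),
]
summary = summarize_iops(samples)
print(summary)
```

Sizing to the average in this invented data would leave the morning peak badly underprovisioned, which is the difference between average and peak that the storage slides warn about.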


Project testing
Good to know what you are and aren’t doing
● Unit/system testing
● Application testing
● Performance/scalability testing
● Operational testing
● User acceptance testing


Summary

● Understand your limited resources (compute/storage)
● Don’t guess
● Testing with 5 users: what kind of testing is that, and what
are you really accomplishing?

