NVMe storage systems and NVMe networks promise to reduce latency further and increase performance beyond what SAS-based flash systems and current networking technology can deliver. To benefit from that performance gain, however, the data center must have workloads that can exploit the latency reduction and performance improvements that NVMe offers. Vendors emphatically state that NVMe is the next must-have technology, yet many still ship SAS-based arrays on traditional networks.
How, then, do IT planners know that investing in NVMe will deliver its benefits to their demanding applications and produce a measurable return on investment? Just creating a test environment to perform an NVMe evaluation can break the IT budget!
Register now to join Storage Switzerland, Virtual Instruments, and SANBlaze as we look at the state of the data center and provide IT planners with the information they need to decide if NVMe is an investment they should make now or if they should wait a year or more. The key is determining which applications can benefit from NVMe-based approaches.
In this event, IT professionals will learn:
- About NVMe, NVMe Storage Systems and NVMe over Fabrics Networking
- The Performance Potential of NVMe Storage and Networks
- What attributes are needed for a workload to take advantage of NVMe
- Why NVMe creates problems for current IT testing strategies
- Why a Workload Simulation approach is the only practical way to test NVMe
- How to build a storage performance validation practice
5. What are the NVMe Drawbacks?
● More expensive; the ecosystem needs to improve
● Not an upgrade, but a new storage system
● Infrastructure overhaul (for end-to-end solutions)
6. Expectation for NVMe is sky high
● 92% have no direct experience with NVMe
● 91%+ expect a positive impact from NVMe
● 68% to 84% intend to deploy NVMe technologies
Source: State of NVMe: Perceptions and Misconceptions, ActualTech Media
7. What Attributes do NVMe-ready Workloads Have?
● Massively parallel
● High bandwidth
● Application scalability
● Rapid response time
8. IT Needs
○ To make sure that its applications demonstrably benefit from NVMe
○ To make sure its infrastructure sustains NVMe IO capabilities
○ Performance claims from vendors vary wildly
Testing New Systems is Critical
9. Building a Test Environment is Hard!
● Requires significant compute investment
● Requires networking at least as good as production, or better
● Requires tests or scripts that sustain a continuous IO stream
● Investment in the lab may be bigger than in production
10. Production Workload Modeling May be The Only Practical Way to Test
● What is Production Workload Modeling?
● Production Workload Modeling architecture
● Production Workload Modeling process
11. Leads to The Creation of a Storage Performance Validation Practice
● A formal testing and evaluation process
● Continuous testing of the current environment to predict the next required upgrade
12. A few things to remember when you’re trying to improve performance
● Performance is a function of your workloads
● Your performance bottlenecks might be elsewhere (for once, storage is not to blame!)
● Problems can come from the least expected places
13.
You don’t want to be this guy (and we’ve seen a lot of guys like him; here are a few examples)
14.
● A system that performs well for one workload might not perform well for another workload
● Enterprise-grade software features, like inline compression and deduplication, can also affect performance in unexpected ways
1. Performance is a function of the workload
[Chart: IOPS (0 to 300,000) for Configuration A and Configuration B (legend: Vendor A, Vendor B) at read/write ratios of 20%/80%, 50%/50%, and 80%/20%, grouped by 20%, 50%, and 80% reducible data]
● Inline Dedupe / Compression: Configuration A does better when data is highly reducible
● Read / Write ratios: the performance gap is greatest when workloads are read heavy
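One way to see why a single headline IOPS figure can mislead is a first-order model of a mixed workload. Assuming, purely for illustration, that each I/O costs the system a fixed time that depends only on whether it is a read or a write (so per-I/O costs simply add up), the blended IOPS for read fraction r is the weighted harmonic mean 1 / (r/R + (1 - r)/W), where R and W are the read-only and write-only IOPS. The function name and the sample systems below are hypothetical, and the model ignores caching, deduplication, and compression, which the slide's chart shows can matter just as much.

```python
def blended_iops(read_fraction: float, read_iops: float, write_iops: float) -> float:
    """First-order IOPS estimate for a mixed read/write workload.

    Assumes (hypothetically) that per-I/O service times add up, so the blended
    rate is the weighted harmonic mean of the read-only and write-only rates.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if read_iops <= 0 or write_iops <= 0:
        raise ValueError("IOPS values must be positive")
    # Average time per I/O in seconds, weighted by the read/write mix.
    seconds_per_io = read_fraction / read_iops + (1.0 - read_fraction) / write_iops
    return 1.0 / seconds_per_io


if __name__ == "__main__":
    # Hypothetical systems: identical write rates, different read rates.
    systems = {"System X": (150_000, 50_000), "System Y": (250_000, 50_000)}
    for reads in (0.2, 0.5, 0.8):
        row = ", ".join(
            f"{name}: {blended_iops(reads, r, w):,.0f}" for name, (r, w) in systems.items()
        )
        print(f"{reads:.0%} reads -> {row}")
```

With these made-up numbers the gap between the two systems widens as the mix becomes read heavy, which is the same shape of effect the chart describes: which system "wins" depends on your workload mix.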
15.
Your applications, especially your most critical applications, do not live alone in a silo.
2. Your performance bottlenecks may be elsewhere
A 2x cluster deployment resulted in virtually zero performance gains.
[Diagram: dependency map showing internal VM-to-VM communication, VM-to-datastore dependencies, and VM communication inside and outside the DC]
Moving an application to NVMe would not have made a noticeable performance difference to the end user.
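The 2x cluster result is essentially Amdahl's law applied to response time: speeding up one component helps only in proportion to the share of end-to-end time that component accounts for. The sketch below assumes, hypothetically, that you can measure the fraction of a transaction's response time spent waiting on storage; the function name and numbers are illustrative and not taken from the case study.

```python
def end_to_end_speedup(storage_fraction: float, storage_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the storage share of response time gets faster."""
    if not 0.0 <= storage_fraction <= 1.0:
        raise ValueError("storage_fraction must be between 0 and 1")
    if storage_speedup <= 0:
        raise ValueError("storage_speedup must be positive")
    # Non-storage time is unchanged; storage time shrinks by storage_speedup.
    new_time = (1.0 - storage_fraction) + storage_fraction / storage_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical: storage is 20% of response time and NVMe makes storage 4x faster.
    print(f"{end_to_end_speedup(0.20, 4.0):.2f}x")  # prints 1.18x, far short of 4x
```

If the application mostly waits on other VMs, the network, or remote services, the storage fraction is small and even a dramatic NVMe speedup barely moves the end-user experience.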
16.
Many planned and unplanned changes can affect performance:
- Introducing new technologies
- Changing user behaviors
- vMotion sickness
- Even innocent firmware upgrades!
3. Problems can come from the least expected places
• Completely ended surprises with new firmware releases
“Before VI, our latest storage upgrade would have been an all-hands-on-deck call-out and my team would have been severely criticized for what could have been interpreted as a real problem.”
“With VI monitoring, we could demonstrate, in real time, that longer latencies were due to the upgrade and not by any real problems in the SAN.”
Response Time chart: application response time would have increased by 3x after the firmware “upgrade”!
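Catching a change like this depends on comparing latency before and after an event rather than eyeballing averages. A minimal sketch, assuming you can export response-time samples (in milliseconds) from your monitoring tool as plain lists; the p95 metric, the 1.5x threshold, and the function names are hypothetical choices for illustration, not VI's method.

```python
import statistics


def p95(samples_ms: list[float]) -> float:
    """95th-percentile latency; method='inclusive' keeps the estimate within the observed range."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]


def latency_regression(baseline_ms: list[float], current_ms: list[float],
                       threshold: float = 1.5) -> tuple[bool, float]:
    """Flag a regression when current p95 latency is at least `threshold` times the baseline p95."""
    ratio = p95(current_ms) / p95(baseline_ms)
    return ratio >= threshold, ratio


if __name__ == "__main__":
    before = [0.4, 0.5, 0.5, 0.6, 0.5, 0.7, 0.5, 0.6, 0.4, 0.9]  # hypothetical samples, ms
    after = [x * 3 for x in before]  # hypothetical 3x slowdown after a firmware change
    flagged, ratio = latency_regression(before, after)
    print(f"regression={flagged}, p95 ratio={ratio:.1f}x")
```

Using a tail percentile rather than the mean makes the check sensitive to the slow outliers that users actually notice; the threshold would need tuning to your environment's normal variance.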
17. Production Workload Modeling Methodology for NVMe-oF
1. Analyze your own production workload data
[Diagram: Application Workloads and Storage Infrastructure feed a Production Workloads Analysis, which analyzes Commands, Temporality, Locality, and Data to build Production Workloads Models. These Application Workload Models drive the SANBlaze Workload Generator against an NVMe-oF model of the NVMe Storage, using the same Commands, Temporality, Locality, and Data characteristics.]
● Make decisions based on your data
● Continuously monitor your decision
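The "Analyze" step can be pictured as reducing a raw I/O trace to a compact profile along the dimensions the diagram names. The sketch below covers commands (read/write mix, transfer sizes), temporality (IOPS per time bucket), and locality (share of sequential I/Os); data-content characteristics such as reducibility are left out. The `IoRecord` fields, the `profile` function, and the sequential-I/O definition are hypothetical assumptions, not the vendors' trace format or algorithm.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IoRecord:
    """One I/O from a hypothetical trace export; field names are assumptions."""
    timestamp_s: float  # time the command was issued, in seconds
    op: str             # "read" or "write"
    lba: int            # starting logical block address
    blocks: int         # transfer length in logical blocks


def profile(trace: list[IoRecord], interval_s: float = 1.0) -> dict:
    """Summarize the commands, temporality, and locality of an I/O trace."""
    if not trace:
        raise ValueError("trace is empty")
    trace = sorted(trace, key=lambda r: r.timestamp_s)

    # Commands: read/write mix and transfer-size distribution.
    read_fraction = sum(r.op == "read" for r in trace) / len(trace)
    sizes = Counter(r.blocks for r in trace)

    # Temporality: IOPS in each fixed-width time bucket, including idle buckets.
    start = trace[0].timestamp_s
    buckets = Counter(int((r.timestamp_s - start) // interval_s) for r in trace)
    iops = [buckets[i] / interval_s for i in range(max(buckets) + 1)]

    # Locality: share of I/Os that begin exactly where the previous one ended.
    sequential = sum(cur.lba == prev.lba + prev.blocks for prev, cur in zip(trace, trace[1:]))

    return {
        "read_fraction": read_fraction,
        "block_size_histogram": sizes,
        "iops_per_interval": iops,
        "peak_iops": max(iops),
        "sequential_fraction": sequential / max(len(trace) - 1, 1),
    }
```

A profile like this, rather than the raw trace, is what a workload generator would be configured to reproduce, and recomputing it periodically is one way to "continuously monitor" whether the workload has drifted from the one you based your decision on.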
18.
Create clusters of workloads
- In production, you may have 100,000+ ITLs (initiator-target-LUN paths)
- A test environment may not have that many, and it is not necessary to replicate them 1:1
- Clustering provides a highly accurate yet scalable way of modeling workloads
Test with a representative test environment
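[Diagram: Production Workload data from Application Workloads and Storage Infrastructure passes through a Clustering Algorithm, which produces several representative workloads (WL); these combine into a Composite Workload that runs against the NVMe-oF System Under Test]
95% to 99% accuracy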
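The slide does not say which clustering algorithm is used. As a hedged illustration of the idea, the sketch below groups per-ITL feature vectors with plain k-means (a generic, swapped-in technique, not necessarily the vendors' algorithm) and weights each cluster centroid by the share of ITLs it represents, which is one way to assemble a composite workload. The feature choice (read fraction, average block size in KB, IOPS), the min-max scaling, and all function names are assumptions.

```python
import math


def _min_max_scale(rows: list[list[float]]) -> list[list[float]]:
    """Scale each feature to [0, 1] so no single unit (KB, IOPS, ...) dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def kmeans(points: list[list[float]], k: int, iterations: int = 50) -> list[int]:
    """Lloyd's k-means with deterministic farthest-first seeding; returns one label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]
    while len(centers) < k:
        # Seed each new center at the point farthest from all existing centers.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: list[int] = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def composite_workload(itl_features: list[list[float]], k: int) -> list[tuple[list[float], float]]:
    """Return (centroid in original units, weight) for each representative workload."""
    labels = kmeans(_min_max_scale(itl_features), k)
    composite = []
    for j in sorted(set(labels)):
        members = [f for f, lab in zip(itl_features, labels) if lab == j]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        composite.append((centroid, len(members) / len(itl_features)))
    return composite
```

Each (centroid, weight) pair describes one representative workload and how much of the total it should contribute; choosing k trades model fidelity against test-bed size, which is the scaling argument the slide makes.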
19. Industry-unique Production Workload Analysis and Modeling solution:
• Analyze and model your current production storage workloads
• Determine optimal NVMe-oF storage systems and configurations
• Contain CAPEX costs
• Make your NVMe decision with your data
Workload Modeling Platform Workload Generator+
21.
Purpose-built workload modeling solution for current and future infrastructures
● Simple-to-use sliders that enable complex workloads with ease!
● Optimize the NVMe queues to maximize performance for your environment
● Understand clearly what is best for your own workloads
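Queue tuning comes down to Little's law: the number of I/Os that must be outstanding equals the target IOPS multiplied by the per-I/O latency. The sketch below, with hypothetical function names and numbers, estimates the total concurrency needed and spreads it evenly across a chosen number of NVMe I/O queues. Real devices, drivers, and fabrics impose their own queue-count and depth limits, so treat the result as a starting point for testing rather than a setting.

```python
import math


def required_outstanding_ios(target_iops: float, latency_us: float) -> float:
    """Little's law: concurrency = throughput x latency (latency given in microseconds)."""
    return target_iops * latency_us / 1_000_000


def per_queue_depth(target_iops: float, latency_us: float, queues: int) -> int:
    """Spread the required concurrency evenly over `queues` I/O queues, rounding up."""
    if queues < 1:
        raise ValueError("need at least one queue")
    return math.ceil(required_outstanding_ios(target_iops, latency_us) / queues)


if __name__ == "__main__":
    # Hypothetical target: 1,000,000 IOPS at 100 microseconds average latency over 16 queues.
    print(required_outstanding_ios(1_000_000, 100))  # prints 100.0 (I/Os in flight)
    print(per_queue_depth(1_000_000, 100, 16))       # prints 7 (100 / 16, rounded up)
```

This is also why "massively parallel" appears among the attributes of NVMe-ready workloads: an application that keeps only a handful of I/Os in flight cannot reach high IOPS no matter how fast the device is.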
22.
Purpose-built Test Solution vs. DIY Test Labs
Do MORE testing. Do it FASTER. Cover more USE CASES. Lower Testing Costs!
Workload Generator: 1U footprint; Realism, Repeatability, Scalability
DIY Test Labs: >42U + VM, DB and OS Licenses; Synthetic, Tribal; CAPEX & OPEX
24. Does Your Data Center Need NVMe?
For complete audio and Q&A, please register for the on-demand version: bit.ly/DCNeedNVMe
Editor's Notes
Often the NVMe performance conversation goes something like this: “SAS/SATA with SCSI performs at X; all else being equal, NVMe performs at Y, so NVMe gives you an N% improvement.” That’s technically correct, but “all else being equal” is a very big assumption that rarely holds in real life, because your workloads are constantly changing.
Here is an example where you can’t simply draw a line in the sand and say A is better than B, because workload behaviors change all the time. We have two configurations: Configuration B performs a lot better than A when workloads are read heavy and when data is not highly reducible. But as the data becomes more and more reducible, Configuration A starts to outperform Configuration B. In other words, enterprise-grade software features like inline compression and deduplication also affect overall workload performance.
So, if you’re only looking at NVMe from the perspective of performance for a particular “sample workload”, you may be unpleasantly surprised when you actually deploy it in your environment.
This piece is actually pretty interesting. Not too long ago we profiled a particular production environment and graphed the communication and storage dependency map of a segment of the data center. What you’re seeing is a set of hosts that communicate with each other very frequently and are spread across a number of datastores. In addition, many hosts communicate over the intranet as well as the internet.
So if, let’s say, you’re focused on upgrading the storage of your tier 0 apps to NVMe, you might not get the performance boost you expect for your end users, because these apps may be waiting on other apps and hosts that run on slower storage. Or, depending on the application, it may spend much of its time going over the internet to get data from SaaS services or from apps you moved to the cloud.
This last one has nothing to do with NVMe directly. But as with all new technologies, you should expect more frequent software updates and patches. Even the most innocent change, like a firmware “upgrade,” may turn out to be the worst decision you’ve ever made.