2. Uncontrolled Performance = Lost Revenue
Amazon.com: 100 ms of latency => 1% drop in sales
Aberdeen (1,000-company sample): latency => 9% enterprise revenue loss
Stable: high-latency ads => 46% viewer loss
Google: 500 ms => 20% fewer clicks => ad revenue loss
Copyright 2018 CachePhysics, Inc.
3. Delivering Data QoS is Challenging
App data requirements are changing daily. Providing QoS has become hard.
4. Performance Management is Getting Harder
The problems are getting much worse with increasing hardware complexity.
6. New Layer Needed
Systems of Intelligence: a new layer between Applications and Infrastructure.
Self-learn / model:
• Application workloads
• Infrastructure components
Control / operate to maximize:
• Efficiency
• Performance
• SLA conformance
8. Autonomous Performance Management?
9. Keep Existing Applications, Keep Existing Infrastructure, Add Real-time Self-Learning Instrumentation
[Diagram: an Autonomous Performance Management service instruments the existing application infrastructure (apps & microservices; caches; network, database, filesystem, compute) via SaaS API calls and API callbacks. It runs a Monitor / Model / Predict / Optimize / Control loop to enforce SLAs and auto-size or reconfigure resources.]
10. Modeling Cache Performance in Real-Time
• Learn a performance model of the applications and cache
• Predict the performance of a workload as f(cache size, params)
[Figure: latency (ms) vs. cache size (GB), with points at 42, 84, 128, and 1700 GB; lower is better. Example reading: hit ratio 65% at a cache size of 128 GB.]
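As a minimal sketch of the idea (the saturating hit-ratio curve and the latency constants below are illustrative assumptions, not CachePhysics' fitted model), predicting average latency as f(cache size) might look like:

```python
import math

def hit_ratio(size_gb: float, scale_gb: float = 30.0) -> float:
    """Hypothetical fitted hit-ratio curve: hits saturate as the cache grows."""
    return 1.0 - math.exp(-size_gb / scale_gb)

def predicted_latency_ms(size_gb: float,
                         hit_ms: float = 0.2, miss_ms: float = 3.0) -> float:
    """Average latency = hit_ratio * hit cost + miss_ratio * miss cost."""
    h = hit_ratio(size_gb)
    return h * hit_ms + (1.0 - h) * miss_ms

# Evaluate the model at the cache sizes shown on the slide's x-axis.
for size in (42, 84, 128, 1700):
    print(f"{size:5d} GB -> {predicted_latency_ms(size):.2f} ms")
```

Once such a curve is fitted to the live access pattern, any candidate cache size can be evaluated without provisioning it.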
11. Understanding Cache Models
[Figure: latency (ms) vs. cache size (GB); lower is better; x2, x4, and x8 size increases are marked.]
Models help decide useful increments of change. In this example, there is no benefit despite an 8x increase in budget.
12. Understanding Cache Models
[Figure: latency (ms) vs. cache size (GB); lower is better.]
Often, most operating points are highly inefficient. This cache is operating at the lowest-ROI point: it gets performance equivalent to 1/8 the budget. Auto-scaling data infrastructure would pick only the efficient operating points.
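A sketch of how a model filters out inefficient operating points (the hit-ratio model and the 5% gain threshold are illustrative assumptions, not CachePhysics' implementation): keep a larger size only if it meaningfully improves predicted latency.

```python
import math

def predicted_latency_ms(size_gb, hit_ms=0.2, miss_ms=3.0, scale_gb=30.0):
    """Hypothetical fitted latency model: saturating hit-ratio curve."""
    h = 1.0 - math.exp(-size_gb / scale_gb)
    return h * hit_ms + (1.0 - h) * miss_ms

def efficient_points(sizes, min_gain=0.05):
    """Keep a size only if it beats the last kept size by at least min_gain."""
    kept = [sizes[0]]
    for s in sizes[1:]:
        prev = predicted_latency_ms(kept[-1])
        if (prev - predicted_latency_ms(s)) / prev >= min_gain:
            kept.append(s)
    return kept

print(efficient_points([42, 84, 128, 256, 512, 1024]))
```

With these illustrative constants, sizes past 256 GB are dropped: the curve has flattened, so the extra budget buys no latency, matching the slide's point about low-ROI operating points.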
13. Understanding Model-based Adaptation
Single workload: prediction of performance under different policies. An auto-scaling data infrastructure would always pick the optimal one.
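Model-based adaptation could be sketched like this (the policy names, scale parameters, and latency constants are hypothetical fitted values for illustration): predict latency under each eviction policy and pick the best.

```python
import math

# Hypothetical per-policy hit-ratio scale parameters fitted from a trace (GB).
POLICY_SCALE_GB = {"lru": 40.0, "lfu": 25.0, "fifo": 60.0}

def predicted_latency_ms(policy, size_gb, hit_ms=0.2, miss_ms=3.0):
    """Predict latency under a given eviction policy at a given cache size."""
    h = 1.0 - math.exp(-size_gb / POLICY_SCALE_GB[policy])
    return h * hit_ms + (1.0 - h) * miss_ms

def best_policy(size_gb):
    """Pick the policy whose model predicts the lowest latency at this size."""
    return min(POLICY_SCALE_GB, key=lambda p: predicted_latency_ms(p, size_gb))

print(best_policy(128))
```

The value is that the comparison happens on the model, not in production: no A/B experiment with live traffic is needed to choose the policy.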
14. Sample Models From Production Workloads
15. Achieving Latency Targets
The client's target 95th-percentile latency is 7 ms. Autoset the cache partition size to 16 GB to guarantee the latency SLO.
* Throughput targets can be implemented similarly.
[Figure: 95th-percentile latency (ms) vs. cache size (GB); the 7 ms latency target intersects the model curve at a cache allocation of >16 GB.]
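SLO-driven sizing can be sketched by inverting the latency model (constants here are illustrative, tuned so a 7 ms target lands near the slide's 16 GB allocation): bisect on cache size for the smallest partition whose predicted p95 latency meets the target.

```python
import math

def p95_latency_ms(size_gb, hit_ms=1.0, miss_ms=20.0, scale_gb=13.9):
    """Hypothetical fitted model of 95th-percentile latency vs. cache size."""
    h = 1.0 - math.exp(-size_gb / scale_gb)
    return h * hit_ms + (1.0 - h) * miss_ms

def min_size_for_target(target_ms, lo=1.0, hi=1024.0, tol=0.1):
    """Bisect on size; valid because the model decreases monotonically."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if p95_latency_ms(mid) <= target_ms:
            hi = mid          # target met: try a smaller (cheaper) partition
        else:
            lo = mid          # target missed: need more cache
    return hi

print(f"allocate >= {min_size_for_target(7.0):.1f} GB for a 7 ms p95 target")
```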
16. Achieving Multi-Tier Sizing
[Figure: a multi-tier hierarchy with a Tier 0 (DRAM) allocation, Tier 1 (3D XPoint), Tier 2 (Local Flash), and Tier 3 (Remote Flash); misses flow from tier to tier, and misses to the remote tier generate network traffic.]
* Network bandwidth can be modeled as a function of the cache misses from each tier.
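The tier names come from the slide, but the sizes, scale parameters, and request rate below are illustrative assumptions. The sketch shows why network bandwidth is a function of per-tier misses: requests that miss one tier fall through to the next, and whatever misses the last local tier is served remotely over the network.

```python
import math

TIERS = [  # (name, allocated size in GB, hypothetical fitted scale in GB)
    ("Tier 0 (DRAM)", 16.0, 8.0),
    ("Tier 1 (3D XPoint)", 64.0, 32.0),
    ("Tier 2 (Local Flash)", 512.0, 256.0),
]

def miss_flow(request_rate_gbps):
    """Return the data rate (GB/s) that misses every local tier."""
    rate = request_rate_gbps
    for _name, size_gb, scale_gb in TIERS:
        hit = 1.0 - math.exp(-size_gb / scale_gb)
        rate *= (1.0 - hit)  # only misses fall through to the next tier
    return rate  # residual traffic to Tier 3 (Remote Flash) = network load

print(f"network traffic: {miss_flow(10.0):.4f} GB/s")
```

Resizing any one tier changes the miss rate it passes downstream, so tier sizes and network bandwidth can be optimized jointly from the same models.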
18. Why Now?
New Apps: Cloud Native. Every code change can morph infrastructure requirements. App performance management today is static, manual, and error-prone. We auto-learn dynamically.
New Layers: NVMe / 3D XPoint. Hardware complexity is increasing even as data growth accelerates, but latency is not keeping up with FLOPS (worsening by 4%/yr). We tame complexity and maximize ROI.
[Figure: the memory/storage hierarchy, spanning I$, D$, and LLC caches, DRAM, MRAM, ReRAM, 3D XPoint, NVMe, flash, slow flash, enterprise and 7200 RPM HDDs, SMR HDDs, optical, and tape: increasing complexity, massive data growth, latency worse by 4%/yr.]
Infrastructure »» Declarative. Kubernetes' CRDs and API extensions are set to be the default platform for the next decade, but the pain of performance unpredictability remains unsolved. We layer on an SLA guarantee solution.
* CD = Continuous Deployment, the predominant trend where code is deployed automatically after testing.
19. Use Case #1: Self-service Infrastructure Sizing
Value for Developers:
• Observability dashboard
• Sizing recommendations
• Cost savings analysis
Value for Ops:
• Achieve desired SLAs for the lowest spend
• Observability for data teams
• Faster troubleshooting for infra teams
• Predictive planning
[Figure: monitoring dashboard of latency (0.0 to 8.0 ms) vs. resources ($$); lower is better.]
20. Use Case #2: Automated QoS SLA
Value for Developers:
• Dial in a latency or throughput target and a budget
• Auto-scaling data infrastructure allocates just enough capacity to hit the SLA
Value for Ops:
• Automated SLA achievement!
• Set and forget, peace of mind
• Revenue disruption avoidance
• Improved margins
• Zero-OpEx performance scaling
• Dramatically reduced service interruptions
[Figure: latency guarantees; the client's target IO latency of 7 ms is guaranteed by autosetting the cache partition to 16 GB (cache allocation >16 GB).]
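The "set and forget" behavior could be sketched as a simple feedback loop (the function shape and controller gains are hypothetical, not the product's API): grow the allocation on SLA violation, reclaim capacity when latency is comfortably under target.

```python
def control_step(alloc_gb, measured_ms, target_ms,
                 grow=1.25, shrink=0.9, headroom=0.8):
    """Return the next cache allocation given one latency measurement."""
    if measured_ms > target_ms:
        return alloc_gb * grow    # violating the SLA: add capacity
    if measured_ms < headroom * target_ms:
        return alloc_gb * shrink  # far under target: free capacity, save spend
    return alloc_gb               # within the band: hold steady

print(control_step(16.0, 9.0, 7.0))  # over target, so the allocation grows
```

The dead band between `headroom * target` and `target` is what prevents the allocation from oscillating; running this on model predictions rather than raw measurements is what makes the scaling proactive instead of reactive.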
21. Model your Redis cache!
1. Sign up for free trial: www.cachephysics.com/redisconf18
2. Test your access pattern to create a Redis cache model, receive a live dashboard
3. Cache Observability ==> Faster Time to Resolution ==> Performance SLAs. Simple!