In-memory compute gave up on storage and moved the active working set to memory. This brings tremendous performance gains, but it also consumes expensive DRAM, puts data at risk, and suffers from slow recovery when power fails.
In this talk we present the convergence of memory and storage and how it can address these deficiencies. We show examples in which software-defined memory (SDM) has enabled running working sets much larger than the DRAM budget, providing last-transaction safety, and recovering immediately from power failure.
2. ABSTRACT
In-memory compute gave up on storage and moved the working set to memory.
This brings tremendous performance gains, but also:
1. Consumes expensive DRAM resources
2. Puts data at risk
3. Suffers from slow recovery time when power failures occur
…
The big question:
What will IMC look like when memory and storage converge?
[Diagram: the working set moves to memory, while the full data set stays in storage]
3. Agenda:
History & the convergence of Memory & Storage
Benefits – Out-of-the-box
Benefits – That require some work
4. A LONG TIME AGO…
Requirements for Ideal Storage:
1. Low latency reads
2. High volume persistent writes
3. Reasonable cost
4. Transparent & easy to use
[Chart: cost, latency, and persistency of DRAM, HDD, and SSD – none reaches the "ideal storage" point]
Unfortunately, such storage (satisfying requirement #2 in particular) did not exist – enter big data middleware.
5. SO MIDDLEWARE DEVELOPERS & USERS COMPROMISED
[Diagram: commit log (persistent, pretty fast), memory table (fast), storage table (cheap), with compaction for search acceleration]
1. Storage had horrible latency for persistent writes, though less so when written sequentially
2. So IMC middleware compensated (sketched below) by using:
- Sequential writes at the expense of read latency
- Async writes at the risk of data loss
- Caching like crazy at the expense of HW cost (DRAM)
- Write amplification at the expense of HW cost (storage)
- Compaction at the expense of HW cost (CPU)
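As a concrete illustration of this compromise, here is a minimal sketch (hypothetical names, not any specific middleware's code) of the commit-log-plus-memory-table write path: every update is first appended sequentially to a log for persistence, then applied to a DRAM table that serves reads.

```c
/* Minimal sketch of the classic IMC write path: sequential,
 * persistent appends to a commit log plus an in-memory table.
 * Hypothetical illustration, not any specific middleware. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TABLE_SIZE 1024

struct entry { char key[32]; char value[96]; };
static struct entry memtable[TABLE_SIZE];   /* the DRAM "memory table" */
static size_t used;

static int log_fd;

int put(const char *key, const char *value)
{
    struct entry e;
    memset(&e, 0, sizeof e);
    strncpy(e.key, key, sizeof e.key - 1);
    strncpy(e.value, value, sizeof e.value - 1);

    /* 1. Sequential append to the commit log: the only persistent step. */
    if (write(log_fd, &e, sizeof e) != (ssize_t)sizeof e)
        return -1;
    /* fsync() here buys durability; skipping it (async commit)
     * trades data-loss risk for throughput. */
    if (fsync(log_fd) != 0)
        return -1;

    /* 2. Apply to the memory table; reads are served from here. */
    if (used < TABLE_SIZE)
        memtable[used++] = e;
    return 0;
}

int main(void)
{
    log_fd = open("commit.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (log_fd < 0) { perror("open"); return 1; }
    put("user:42", "alice");
    printf("%zu entries in memtable\n", used);
    close(log_fd);
    return 0;
}
```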
Original requirements vs. IMC reality:
1. Low latency reads
2. High volume (eventual) persistent writes
3. Reasonable cost
4. Transparent & easy to use
6. WHAT HAS CHANGED?
Memory & Storage are converging:
New HW - Persistent Memory (PM, e.g. NVDIMM-N)
New SW - Software Defined Memory (SDM)
[Chart: persistent memory (PM) joins DRAM, SSD, and HDD on the cost/latency/persistency map]
PM+SDM delivers:
1. Low latency reads
2. High volume persistent writes
3. Reasonable cost
4. Transparent & easy to use
SDM-ephemeral delivers:
1. Low latency reads
2. High volume persistent* writes
3. Reasonable cost
4. Transparent & easy to use**
* Persistent on orderly shutdowns, not power failures
** Easy to use within share-nothing architectures
7. HOW TO LEVERAGE SDM?
[Diagram:
Scenario I – Existing middleware, out of the box
Scenario II – New middleware / some work to existing]
8. Agenda:
History & the convergence of Memory & Storage
Benefits – Out-of-the-box
Benefits – That require some work
9. OUT OF THE BOX INTEGRATION
[Architecture: Plexistor FS (multi-tier, DAX) on Linux exposes virtual memory, HDFS, and POSIX interfaces; the memory path serves DRAM/PM and the I/O path serves flash and disk – fast storage plus huge memory, with data services]
1. Download & Install SDM
2. Mount m1fs
3. Run your application – see the sketch below
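Since SDM mounts as a regular file system, the application side really is unchanged. Below is a minimal sketch of an ordinary POSIX write path; the mount point /mnt/m1fs and the file name are hypothetical.

```c
/* Sketch: an unmodified POSIX application running on an SDM mount.
 * "/mnt/m1fs" is a hypothetical mount point for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/m1fs/app.data";
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char record[] = "hello, software-defined memory\n";
    if (write(fd, record, sizeof record - 1) < 0) { perror("write"); return 1; }

    /* fsync() still works as usual; on PM-backed SDM it is fast,
     * on SDM-ephemeral it persists only across orderly shutdowns. */
    if (fsync(fd) != 0) { perror("fsync"); return 1; }

    close(fd);
    return 0;
}
```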
10. OOB BENEFIT 1: LARGE WORKING SETS
Working set 2x memory size:
SDM at 17,000 ops/sec
XFS at 2,000 ops/sec
Performance is highly sensitive to working set size > aggregate memory size
Working set size is dynamic and hard to predict
Large clusters are expensive
Cassandra v3.0.2
i2.4xlarge instance on AWS
11. OOB BENEFIT 2: PERSISTENCY
Performance is highly sensitive to persistency/durability requirements
Replication/mirroring between nodes without persistency is vulnerable to power failures
Data loss risk is often not well explained; confusion leads to wasteful behavior (extra copies, network traffic)
[Chart: ops/sec, 0-180,000, under the traditional tradeoff – (B) balanced vs. (D) durable configurations]
MongoDB v3.2
E5-2650v3, CloudSpeed SSD
(*) This actually writes two persistent copies: one in the memory table and one in the commit log
12. OOB BENEFIT 3: LONG REBUILD TIMES
Nodes occasionally fail in large clusters
Rebuilds take many hours to complete due to extra pressure on the storage layer
[Diagram: clients against a five-node Couchbase server cluster with one failed node]
Couchbase v4.5 beta
E5-2650v4, CloudSpeed SSD
13. OOB BENEFIT 4: PREDICTABILITY
No hiccups due to separate memory and storage stacks
Highly predictable performance
[Chart: TPS over time – flat, without hiccups]
MySQL v5.6
E5-2680v3, HGST SN150
DB load generator runs at target (not maximal) speed
14. Agenda:
History & the convergence of Memory & Storage
Benefits – Out-of-the-box
Benefits – That require some work
15. BENEFITS THAT REQUIRE WORK AT THE MIDDLEWARE LAYER
A lot of potential for Fast Queries & Simplicity
[Chart: file-level FIO – big data middleware running directly on SDM vs. on storage; E5-2650v3, CloudSpeed SSD]
16. EXAMPLE - AMPOOL
• Fast & standard access throughout the data pipeline
• 56x faster ingest
• 3-4x faster OLTP & OLAP than HBase
• 6x faster Spark than Tachyon
17. DESIGNING MIDDLEWARE IN THE SDM ERA
1. Realize that you're a storage/memory billionaire – focus on your business logic
2. Use the standard POSIX API and share files between frameworks (polyglot)
3. Use SDM zero-cost clones (cp --reflink) – see the sketch after this list
4. Rely on SDM Auto-tiering (If you must – hint via fadvise/madvise)
5. Consider relying on SDM Mirroring capabilities
6. Use SDM monitoring tools to understand your resource consumption
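To make items 2-4 concrete, here is a minimal sketch, with hypothetical paths, of a zero-cost clone via the Linux reflink ioctl (the same mechanism behind cp --reflink) and an optional tiering hint via posix_fadvise(). Whether and how a given SDM honors such hints is implementation-specific.

```c
/* Sketch: zero-cost clone plus a tiering hint, on hypothetical paths. */
#include <fcntl.h>
#include <linux/fs.h>     /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int src = open("/mnt/m1fs/table.data", O_RDONLY);
    int dst = open("/mnt/m1fs/table.clone", O_WRONLY | O_CREAT, 0644);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    /* Zero-cost clone: shares blocks instead of copying them
     * (fails with EOPNOTSUPP on file systems without reflink). */
    if (ioctl(dst, FICLONE, src) != 0) { perror("FICLONE"); return 1; }

    /* Optional tiering hint: declare this range will be needed soon.
     * Normally the SDM auto-tiering decides on its own. */
    int rc = posix_fadvise(src, 0, 0, POSIX_FADV_WILLNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise failed: %d\n", rc);

    close(src);
    close(dst);
    return 0;
}
```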
18. SUMMARY
Memory and Storage have already started converging (SDM)
IMC best practices are no longer the “best”
SDM provides value to IMC out-of-the-box
but
There is even greater opportunity for those willing to integrate
Efficiency
Simplicity
19. Q & A
Free SDM download - www.plexistor.com/download/
White papers - www.plexistor.com/resources/
Blog - www.plexistor.com/blog/
amit@plexistor.com
20. HIGH AVAILABILITY - CLARIFICATION
Almost zero latency added for keeping a 2nd copy, provided that a high-speed RDMA network is in place
Public cloud deployments – Keep using your current HA strategy
On premise deployments – Can substitute most copies with storage redundancy
[Diagram: app servers 1..N, each running Plexistor SDM, connected over high-speed RDMA to Open Bricks 1..M]
21. SDM VS. XFS-DAX VS. NVML - CLARIFICATION
[Table: Plexistor vs. ext4/xfs-DAX vs. NVML across scale-out, auto-tiering, snapshots/clones, legacy applications, NVML support, high availability, and IT policy hooks; with DAX/NVML, several of these are left to the application]
[Diagram: apps using mmap or NVML go through virtual memory/POSIX on an FS with DAX support* on Linux, reaching DRAM/PM via the memory path]
(*) Who supports DAX:
- Plexistor SDM
- Linux xfs-dax and ext4-dax (WIP)
- MS ReFS-dax (WIP)
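For a sense of what this memory path looks like to an application, the sketch below maps a file from a DAX-capable file system and updates it with plain loads and stores; with DAX there is no page cache in between. The path and size are hypothetical, and msync() is used as the portable way to request persistence.

```c
/* Sketch: memory-path access to a file on a DAX-capable file system.
 * Path and size are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 4096;
    int fd = open("/mnt/m1fs/state.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, len) != 0) { perror("ftruncate"); return 1; }

    /* With DAX, this mapping goes straight to PM -- no page cache. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(p, "updated via plain loads and stores");

    /* Request persistence of the dirty range. */
    if (msync(p, len, MS_SYNC) != 0) { perror("msync"); return 1; }

    munmap(p, len);
    close(fd);
    return 0;
}
```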