Memory is the new disk, disk is the new tape, by Bela Ban (JBoss by Red Hat)
1. Memory is the new disk,
disk is the new tape
Bela Ban, JBoss / Red Hat
2. Motivation
● We want to store our data in memory
– Memory access is faster than disk access
– Even across a network
– A DB requires network communication, too
● The disk is used for archival purposes
● Not a replacement for DBs!
– Only a key-value store
– NoSQL
3. Problems
● #1: How do we provide memory large enough to store the data (e.g. 2 TB of memory)?
● #2: How do we guarantee persistence?
– Survival of data between reboots / crashes
4. #1: Large memory
● We aggregate the memory of all nodes in a cluster into a large virtual memory space
– 100 nodes of 10 GB each == 1 TB of virtual memory
5. #2: Persistence
● We store keys redundantly on multiple nodes
– Unless all nodes on which key K is stored crash at the same time, K is persistent
● We can also store the data on disk
– To prevent data loss in case all cluster nodes crash
– This can be done asynchronously, on a background thread
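
As a rough illustration of that last point, here is a minimal write-behind sketch in Java for a single node: puts land in an in-memory map immediately, and a background thread persists them to disk afterwards. The class and file layout are invented for the example; this is not ReplCache's actual persistence code.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import java.util.concurrent.*;

    // Illustrative write-behind store (hypothetical class, not ReplCache code):
    // puts update the in-memory map immediately; a daemon thread writes the
    // values to disk later. Keys are assumed to be plain, file-name-safe strings.
    public class WriteBehindStore {
        private final ConcurrentMap<String, String> memory = new ConcurrentHashMap<>();
        private final BlockingQueue<String> dirtyKeys = new LinkedBlockingQueue<>();
        private final Path dir;

        public WriteBehindStore(Path dir) throws IOException {
            this.dir = Files.createDirectories(dir);
            Thread writer = new Thread(this::flushLoop, "disk-writer");
            writer.setDaemon(true);
            writer.start();
        }

        public void put(String key, String value) {
            memory.put(key, value);   // fast, in-memory
            dirtyKeys.offer(key);     // persisted later by the background thread
        }

        public String get(String key) {
            return memory.get(key);   // reads never touch the disk
        }

        private void flushLoop() {
            try {
                while (true) {
                    String key = dirtyKeys.take();
                    String value = memory.get(key);
                    if (value != null)
                        Files.write(dir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
                }
            } catch (InterruptedException | IOException e) {
                Thread.currentThread().interrupt();
            }
        }
    }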
7. Store every key on every node
A: K1 K2 K3 K4
B: K1 K2 K3 K4
C: K1 K2 K3 K4
D: K1 K2 K3 K4
● RAID 1
● Pro: data is available everywhere
– No network round trip
– Data loss only when all nodes crash
● Con: we can only use 25% of our memory
8. Store every key on 1 node only
A: K1   B: K2   C: K3   D: K4
● RAID 0, JBOD
● Pro: we can use 100% of our memory
● Con: data loss on node crash
– No redundancy
9. Store every key on K nodes
A: K1 K4
B: K1 K2
C: K2 K3
D: K3 K4
● K is configurable (2 in the example)
● Variable RAID
● Pro: we can use a variable % of our memory
– The user determines the tradeoff between memory consumption and the risk of data loss
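For example, if the 4 nodes above each had, say, 10 GB of memory, then with K = 2 the cluster holds 40 GB of raw memory but only about 40 / 2 = 20 GB of distinct data (50% usable); K = 1 would give 100% and K = 4 (every node) 25%, matching the two previous slides.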
10. So how do we determine on which nodes the keys are stored?
11. Consistent hashing
● Given a key K and a set of nodes, CH(K) will always pick the same node P for K
– We can also pick a list {P,Q} for K
● Any node 'knows' that K is on P
● If P leaves, CH(K) will pick another node Q and rebalance the affected keys
● A good CH will rebalance at most 1/N of the keys (where N = number of cluster nodes)
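
To make the idea concrete, here is a minimal consistent-hash ring in Java that picks a primary owner and backup owners for a key. It is a sketch of the general technique only; ReplCache's real hash function and owner selection may differ, and all class and method names here are invented.

    import java.util.*;

    // Minimal consistent-hash ring (illustration only, not ReplCache's code).
    // Nodes are placed at hash positions on a ring; a key is owned by the first
    // node at or after hash(key), plus the next distinct nodes as backups.
    public class ConsistentHash {
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        public void addNode(String node) {
            // A few "virtual" positions per node even out the key distribution;
            // a real implementation would use many more.
            for (int i = 0; i < 3; i++)
                ring.put(hash(node + "#" + i), node);
        }

        public void removeNode(String node) {
            ring.values().removeIf(n -> n.equals(node));
        }

        // Returns the replCount distinct nodes that own this key, primary first.
        public List<String> owners(String key, int replCount) {
            List<String> result = new ArrayList<>();
            if (ring.isEmpty())
                return result;
            Integer pos = ring.ceilingKey(hash(key));
            if (pos == null)
                pos = ring.firstKey();          // wrap around the ring
            Integer p = pos;
            while (result.size() < replCount) {
                String node = ring.get(p);
                if (!result.contains(node))
                    result.add(node);
                p = ring.higherKey(p);
                if (p == null)
                    p = ring.firstKey();
                if (p.equals(pos))
                    break;                      // gone all the way around the ring
            }
            return result;
        }

        private static int hash(String s) {
            return s.hashCode() & 0x7fffffff;   // non-negative position on the ring
        }
    }

With nodes A..D on the ring, owners("K2", 2) might return, say, [B, C]: B is the primary owner and C the backup. After removeNode("B"), the same call yields a new owner list, and only keys whose owner list changed need to be rebalanced.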
12. Example
A: K1 K4
B: K1 K2
C: K2 K3
D: K3 K4
● K2 is stored on B (primary owner) and C (backup owner)
13. Example
A: K1 K4
B: K1 K2 (crashed)
C: K2 K3
D: K3 K4
● Node B now crashes
14. Example
A: K1 K4
C: K1 K2 K3
D: K2 K3 K4
● C (the backup owner of K2) copies K2 to D
– C is now the primary owner of K2
● A copies K1 to C
– C is now the backup owner of K1
15. Rebalancing
● Unless all N owners of a key K crash at exactly the same time, K is always stored redundantly
● When fewer than N owners crash, rebalancing will copy/move keys to other nodes, so that we have N owners again
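
A naive version of this rebalancing step, reusing the ConsistentHash sketch above, could look as follows. It recomputes the owners of every locally stored key after a membership change and pushes a copy to each owner; a real implementation would be more careful (avoiding redundant copies and dropping keys a node no longer owns). All names here are invented for the illustration.

    import java.util.*;

    // Illustrative rebalancing step (not ReplCache's actual code): after a
    // membership change, recompute the owner list for every locally stored key
    // and push a copy to each owner other than this node.
    public class Rebalancer {
        public static void rebalance(ConsistentHash ch, String self, int replCount,
                                     Map<String, String> localKeys) {
            for (Map.Entry<String, String> e : localKeys.entrySet()) {
                for (String owner : ch.owners(e.getKey(), replCount)) {
                    if (!owner.equals(self))
                        copyTo(owner, e.getKey(), e.getValue());
                }
            }
        }

        private static void copyTo(String node, String key, String value) {
            // In a real system this would be a network message to 'node'.
            System.out.printf("copy %s=%s to %s%n", key, value, node);
        }
    }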
16. Enter ReplCache
● ReplCache is a distributed hashmap spanning the entire cluster
● Operations: put(K,V), get(K), remove(K) (see the usage sketch below)
● For every key, we can define how many times we'd like it to be stored in the cluster
– 1: RAID 0
– -1: RAID 1
– N: variable RAID
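
A minimal usage sketch, assuming the ReplCache API shipped with JGroups (a constructor taking a JGroups configuration file and a cluster name, and put taking a replication count and a timeout); exact signatures vary between versions, and "udp.xml" and the cluster name are placeholders.

    import org.jgroups.blocks.ReplCache;

    // Minimal ReplCache usage sketch; consult the JGroups documentation for the
    // exact API of the version you use.
    public class ReplCacheDemo {
        public static void main(String[] args) throws Exception {
            ReplCache<String, String> cache =
                    new ReplCache<>("udp.xml", "replcache-demo");  // JGroups config + cluster name
            cache.start();

            // repl_count: 1 = store on one node only, -1 = store on all nodes,
            // N = store on N nodes; timeout 0 = keep the entry until removed.
            cache.put("K1", "hello", (short) 1, 0);
            cache.put("K2", "world", (short) 2, 0);
            cache.put("K3", "everywhere", (short) -1, 0);

            System.out.println(cache.get("K2"));  // served locally or from an owner node

            cache.remove("K1");
            cache.stop();
        }
    }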
17. Use of ReplCache
[Architecture diagram: HTTP clients -> Apache with mod_jk -> a cluster of JBoss nodes, each running a Servlet and ReplCache -> DB. ReplCache spans the whole cluster.]
19. Use cases
● JBoss AS: session distribution using Infinispan
– For data scalability, sessions are stored only N times in the cluster
● GridFS (Infinispan)
– I/O over the grid
– Files are chunked into slices; each slice is stored in the grid (redundantly if needed)
– Store a 4 GB DVD in a grid where each node has only 2 GB of heap
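
The chunking idea can be sketched like this: a file is cut into fixed-size slices and each slice is stored under its own key. The Map below stands in for the distributed cache; this is not the actual Infinispan GridFS API, and all names are invented for the illustration.

    import java.io.*;
    import java.util.*;

    // Illustration of the GridFS chunking idea (not the Infinispan API): a large
    // file is split into fixed-size slices stored under keys like "<name>#<n>".
    // With a distributed cache as the map, no single node holds the whole file.
    public class ChunkedStore {
        static final int CHUNK_SIZE = 8 * 1024;   // 8 KB slices for the example

        public static int store(File file, Map<String, byte[]> grid) throws IOException {
            int chunks = 0;
            try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
                byte[] buf = new byte[CHUNK_SIZE];
                int read;
                while ((read = in.read(buf)) > 0) {
                    grid.put(file.getName() + "#" + chunks, Arrays.copyOf(buf, read));
                    chunks++;
                }
            }
            return chunks;
        }

        public static void load(String name, int chunks, Map<String, byte[]> grid,
                                OutputStream out) throws IOException {
            for (int i = 0; i < chunks; i++)
                out.write(grid.get(name + "#" + i));   // reassemble the slices in order
        }
    }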
20. Use cases
● Hibernate Over Grid (OGM)
– Replaces the DB backend with an Infinispan-backed grid
21. Conclusion
● Given enough nodes in a cluster, we can provide persistence for data
● Unlike RAID, where everything is stored fully redundantly (even /tmp), we can define persistence guarantees per key
● Ideal for data sets which need to be accessed quickly
– For the paranoid, we can still stream to disk
22. Conclusion
● Data is distributed over a grid
– Cache is closer to clients
– No bottleneck to the DBMS
– Keys are on different nodes