5. What is it?
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• an event-driven data architecture
7. Who are the users?
• 1000+ systems in production (real customers)
• Cutting-edge use cases
2004 2008 2014
• Massive increase in data volumes
• Falling margins per transaction
• Increasing cost of IT maintenance
• Need for elasticity in systems
• Financial Services Providers (every major Wall Street bank)
• Department of Defense
• Real-time response needs
• Time-to-market constraints
• Need for flexible data models across the enterprise
• Distributed development
• Persistence + in-memory
• Global data visibility needs
• Fast data ingest needs
• Need to allow devices to hook into enterprise data
• Always on
• Largest travel portal
• Airlines
• Trade clearing
• Online gambling
• Largest telcos
• Large manufacturers
• Largest payroll processor
• Auto insurance giants
• Largest rail systems on earth
8. Who are the users?
• GE Power & Water's Remote Monitoring & Diagnostics Center: 17 billion records in memory
• China Railways: 3 TB of operational data in memory, 400 TB archived
• China Railways: 4.6 million transactions a day / 40K transactions a second
• Indian Railways: 120,000 concurrent users
9. World population: ~7,349,000,000
• China (China Railway Corporation): 1,401,586,609
• India (Indian Railways): 1,251,695,616
• Together, ~36% of the world population
11. Numbers Everyone Should Know
Operation                              Latency (ns)    Latency (ms)
L1 cache reference                     0.5
Branch mispredict                      5
L2 cache reference                     7
Mutex lock/unlock                      100
Main memory reference                  100
Compress 1K bytes with Zippy           10,000          0.01
Send 1K bytes over 1 Gbps network      10,000          0.01
Read 1 MB sequentially from memory     250,000         0.25
Round trip within same datacenter      500,000         0.5
Disk seek                              10,000,000      10
Read 1 MB sequentially from network    10,000,000      10
Read 1 MB sequentially from disk       30,000,000      30
Send packet CA->Netherlands->CA        150,000,000     150
Source: http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/stanford-295-talk.pdf
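To make the table concrete, the sketch below turns a few of its rows into back-of-the-envelope estimates. The constants are copied from the table; the class name `LatencyEstimate` and the 1,000 MB scan size are illustrative choices, not part of the original slides.

```java
// Back-of-the-envelope estimates built from the "Numbers Everyone Should Know" table.
// Constants come from the table; the 1,000 MB (~1 GB) scan size is an arbitrary example.
final class LatencyEstimate {
    static final double MEMORY_READ_MS_PER_MB = 0.25;  // read 1 MB sequentially from memory
    static final double NETWORK_READ_MS_PER_MB = 10.0; // read 1 MB sequentially from network
    static final double DISK_READ_MS_PER_MB = 30.0;    // read 1 MB sequentially from disk
    static final double DISK_SEEK_MS = 10.0;           // one disk seek
    static final double MEMORY_REF_NS = 100.0;         // one main memory reference

    /** Time in ms to scan the given number of megabytes at a per-MB cost. */
    static double scanMillis(double megabytes, double msPerMb) {
        return megabytes * msPerMb;
    }

    /** How many main-memory references fit into the time of a single disk seek. */
    static double memoryRefsPerSeek() {
        return (DISK_SEEK_MS * 1_000_000.0) / MEMORY_REF_NS; // convert ms to ns first
    }

    public static void main(String[] args) {
        double mb = 1_000;
        System.out.printf("memory:  %.0f ms%n", scanMillis(mb, MEMORY_READ_MS_PER_MB));
        System.out.printf("network: %.0f ms%n", scanMillis(mb, NETWORK_READ_MS_PER_MB));
        System.out.printf("disk:    %.0f ms%n", scanMillis(mb, DISK_READ_MS_PER_MB));
        System.out.printf("memory refs per disk seek: %.0f%n", memoryRefsPerSeek());
    }
}
```

By these figures, scanning ~1 GB takes about 250 ms from memory versus about 30 s from disk (a 120x gap), and one disk seek costs as much as 100,000 main-memory references, which is the arithmetic behind keeping hot data in memory.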
12. What makes it fast?
• No ORM
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning and parallelism
• Avoid disk seeks
• Automated benchmarks
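Two of the points above, running user code in-process and partitioning for parallelism, combine naturally: instead of pulling every entry back to one caller, the same function runs next to each partition and only small partial results travel. The sketch below simulates that inside a single JVM with a thread pool standing in for servers. It is a hypothetical illustration using only the JDK; `PartitionedSum` and its methods are invented names, and this is not Geode's function-execution API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical single-JVM simulation: each map stands in for the slice of data
// one server holds, and each pool thread stands in for that server's CPU.
final class PartitionedSum {
    /** Spread keys 0..n-1 across partitions by key modulo partitionCount. */
    static List<Map<Integer, Long>> partition(long[] values, int partitionCount) {
        List<Map<Integer, Long>> partitions = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            partitions.add(new ConcurrentHashMap<>());
        }
        for (int key = 0; key < values.length; key++) {
            partitions.get(key % partitionCount).put(key, values[key]);
        }
        return partitions;
    }

    /** Run the function against every partition in parallel, then combine partial results. */
    static long sumInParallel(List<Map<Integer, Long>> partitions,
                              ToLongFunction<Map<Integer, Long>> perPartition) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (Map<Integer, Long> part : partitions) {
                // Only the per-partition result (one long) leaves the "server"
                partials.add(pool.submit(() -> perPartition.applyAsLong(part)));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] values = new long[1_000];
        for (int i = 0; i < values.length; i++) values[i] = i + 1;
        List<Map<Integer, Long>> parts = partition(values, 4);
        long total = sumInParallel(parts,
                part -> part.values().stream().mapToLong(Long::longValue).sum());
        System.out.println(total); // 500500
    }
}
```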
14. Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % plotted against the number of server hosts (2 to 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1 KB data size
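The benchmark keeps the load per server constant while adding hosts, which is what makes the speedup curve a fair test of horizontal scaling. Assuming perfectly linear scaling (the label S_ideal is ours, not a measured result from the slide), the 10-server run would deliver five times the throughput of the 2-server run:

```latex
\frac{256\ \text{clients}}{2\ \text{servers}}
  = \frac{1280\ \text{clients}}{10\ \text{servers}}
  = 128\ \text{clients per server},
\qquad
S_{\text{ideal}} = \frac{10\ \text{servers}}{2\ \text{servers}} = 5
```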
16. Hands-on: Build & run
• Clone & build
git clone https://github.com/apache/geode
cd geode
./gradlew build
• Start a server
cd geode-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server
gfsh> create region --name=myRegion --type=REPLICATE
• Docker
$ docker run -it apachegeode/geode
• Download
http://geode.apache.org/releases/
17. Member Types
• Locator
  • Discovery service
  • JMX manager
  • Cluster config manager
• Servers
  • Store data
  • Embeddable within your application
• Clients
  • Your application
18. Concepts - Region
• Region
  • Distributed java.util.concurrent.ConcurrentHashMap on steroids (key/value)
  • Consistent API regardless of where or how data is stored
  • Observable (reactive)
  • Highly available, redundant on cache member(s)
[Diagram: a Region, exposed as a java.util.Map, inside a Cache in a JVM, holding key/value entries K01 = May and K02 = Tim]
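The "ConcurrentHashMap on steroids" and "observable" points can be pictured with a small JDK-only sketch: a concurrent map that notifies registered listeners on every write. This is a hypothetical illustration; `ObservableRegion` and its methods are invented names, and Geode's real Region adds distribution, redundancy and persistence and uses its own listener interfaces, none of which is modeled here.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical sketch of an "observable map": writes go to a ConcurrentHashMap
// and every registered listener is told about the new key/value afterwards.
final class ObservableRegion<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    // Copy-on-write so listeners can be added while events are being delivered
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();

    void addListener(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    /** Store the value, notify listeners, and return the previous value (or null). */
    V put(K key, V value) {
        V previous = entries.put(key, value);
        for (BiConsumer<K, V> listener : listeners) {
            listener.accept(key, value);
        }
        return previous;
    }

    V get(K key) {
        return entries.get(key);
    }

    public static void main(String[] args) {
        ObservableRegion<String, String> region = new ObservableRegion<>();
        region.addListener((k, v) -> System.out.println("event: " + k + " -> " + v));
        region.put("K01", "May"); // prints "event: K01 -> May"
        region.put("K02", "Tim"); // prints "event: K02 -> Tim"
        System.out.println(region.get("K01")); // prints "May"
    }
}
```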
19. Region Options
• Region
  • Local, replicated or partitioned
  • In-memory or persistent
  • Redundant
  • Eviction (LRU) and expiration (TTL)
  • Overflow to disk
[Diagram: two JVMs, each with a Cache holding a Region (a java.util.Map) with entries K01 = May and K02 = Tim]
Region shortcuts (RegionShortcut values):
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
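The PARTITION_REDUNDANT family of shortcuts combines two ideas: keys are hashed into a fixed number of buckets spread across servers, and each bucket keeps a primary plus redundant copies on other members. The sketch below models only that idea with the JDK. The 113-bucket count echoes Geode's default bucket count, but the round-robin placement is an illustrative assumption, not Geode's real bucket-balancing algorithm, and `BucketPlacement` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a partitioned region with redundancy: keys hash to buckets,
// and each bucket lives on one primary server plus `redundantCopies` others.
// Round-robin placement is an assumption made for illustration only.
final class BucketPlacement {
    final int totalBuckets;
    final int servers;
    final int redundantCopies;

    BucketPlacement(int totalBuckets, int servers, int redundantCopies) {
        if (redundantCopies >= servers) {
            // Every copy must sit on a distinct server
            throw new IllegalArgumentException("need more servers than redundant copies");
        }
        this.totalBuckets = totalBuckets;
        this.servers = servers;
        this.redundantCopies = redundantCopies;
    }

    /** Map a key to a bucket; floorMod keeps negative hash codes in range. */
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), totalBuckets);
    }

    /** Servers hosting a bucket: index 0 is the primary, the rest hold redundant copies. */
    List<Integer> hostsFor(int bucket) {
        List<Integer> hosts = new ArrayList<>();
        for (int copy = 0; copy <= redundantCopies; copy++) {
            hosts.add((bucket + copy) % servers);
        }
        return hosts;
    }

    public static void main(String[] args) {
        // 4 servers and 1 redundant copy are arbitrary example values
        BucketPlacement region = new BucketPlacement(113, 4, 1);
        for (String key : new String[] {"K01", "K02"}) {
            int bucket = region.bucketFor(key);
            System.out.println(key + " -> bucket " + bucket + " on servers " + region.hostsFor(bucket));
        }
    }
}
```

Because every bucket has a copy on a second server, losing any single server in this model still leaves at least one copy of every bucket.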
20. Concepts - OQL
• Object Query Language (OQL)
• SQL-like
• Queries complex objects, their attributes and methods
• Slower than a direct key lookup with get()
class Portfolio {
int ID;
String type;
String status;
Map positions;
}
class Position {
String secId;
double mktValue;
double qty;
}
• SELECT * FROM /portfolio WHERE status = 'active'
• SELECT p, pos FROM /portfolio p, p.positions.values pos WHERE pos.secId = 'VMW'
• SELECT DISTINCT * FROM /portfolio p WHERE p.positions.size >= 2
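To show what each OQL query above selects, here is a plain-Java rendering of the three queries over an in-memory list, using records shaped like the Portfolio and Position classes. Assumptions: `positions` maps a security id to a Position (the slide only says Map), field names are adapted to record style, and the sample data is made up. This illustrates the query semantics only; it is not how Geode evaluates OQL, which can, for example, use indexes and run queries on the servers. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java equivalents of the three OQL queries, run over an in-memory list.
final class OqlByHand {
    record Position(String secId, double mktValue, double qty) {}

    // Assumption: positions are keyed by security id
    record Portfolio(int id, String type, String status, Map<String, Position> positions) {}

    // One (p, pos) row of the second query's result
    record Match(Portfolio p, Position pos) {}

    // SELECT * FROM /portfolio WHERE status = 'active'
    static List<Portfolio> active(List<Portfolio> portfolios) {
        return portfolios.stream()
                .filter(p -> "active".equals(p.status()))
                .collect(Collectors.toList());
    }

    // SELECT p, pos FROM /portfolio p, p.positions.values pos WHERE pos.secId = 'VMW'
    static List<Match> holdingVmw(List<Portfolio> portfolios) {
        return portfolios.stream()
                .flatMap(p -> p.positions().values().stream().map(pos -> new Match(p, pos)))
                .filter(m -> "VMW".equals(m.pos().secId()))
                .collect(Collectors.toList());
    }

    // SELECT DISTINCT * FROM /portfolio p WHERE p.positions.size >= 2
    static List<Portfolio> diversified(List<Portfolio> portfolios) {
        return portfolios.stream()
                .filter(p -> p.positions().size() >= 2)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Portfolio> data = List.of(
                new Portfolio(1, "growth", "active", Map.of(
                        "VMW", new Position("VMW", 1000.0, 10),
                        "IBM", new Position("IBM", 500.0, 5))),
                new Portfolio(2, "income", "inactive", Map.of(
                        "IBM", new Position("IBM", 250.0, 2))));
        System.out.println(active(data).size());      // 1
        System.out.println(holdingVmw(data).size());  // 1
        System.out.println(diversified(data).size()); // 1
    }
}
```

The second query is effectively a join between each portfolio and its own positions, which is why the Java version needs `flatMap`.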