Slides from the Meetup Monday March 7, 2016 just before the beginning of #GeodeSummit, where we cover an introduction of the technology and community that is Apache Geode, the in-memory data grid.
5. A distributed, memory-based data
management platform for data
oriented apps that need:
• high performance, scalability,
resiliency and continuous
availability
• fast access to critical data sets
• location-aware distributed data
processing
• event-driven data architecture
What is GEODE?
5
6. High-level Architecture
6
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/ store
• Data replicated or partitioned
• Redundant storage in-memory/ disk
• Flexible data retention policiesÎ
!
Locator
Server
Server
Server
Server
+""""
"
$
%
%
%
&& &
% % % % % % % %
&&
A Peer-2-Peer
Distributed System
REST
!
* Experimental and waiting community feedback
7. • 1000+ systems in production (real customers)
• Cutting edge use cases
Incubating but ROCK solid…
7
<2000 2004 2008 2012 2016
Early drivers
• Data Volumes
• Margins/ transactions
• IT maintenance costs
• Elasticity needs
Real-time needs
• Real-time response
• Time to market needs
• Flexible Data Models
• Persistent+In-memory
Global Data
• Visibility across DC
• Fast Ingest
• Device to enterprise
• Uptime (always on)
Open Source!
• Apache Incubation
• Gemfire > Geode
• M1 release
• 1st Geode Summit
Financial
Services
US DoD
Trade Clearing
Travel Portal
Online
Gambling
Telcos
Manufacturing
Auto Insurance
Payroll processing
Rail systems
8. …with both SCALE and SPEED, …
8
40K
Transactions
per second
3TB
Data
in-memory
17B
Records
in-memory
120K
Concurrent
users
9. … and impacting a LOT of people!
9
China Railway
Corporation
Indian
Railways
19%
17%
36%
of the world population
11. …and horizontal, consistent SCALABILITY!
11
Horizontal scaling for reads, consistent latency and CPU
0
4.5
9
13.5
18
Speedup
0
1.25
2.5
3.75
5
Server Hosts
2 4 6 8 10
speedup latency (ms) CPU %
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
12. • Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
What makes it go FAST?
12
13. • Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
• Events & Listeners
• High Availability
• Serialization
Let’s talk about a few (basic) CONCEPTS…
13
14. • In-memory storage and
management for your data
• Configurable through XML,
Java API or CLI
• Collection of Region
What is a CACHE?
14
Region
Region
Region
Cache
JVM
15. • Distributed java.util.Map on
steroids (Key/Value)
• Consistent API regardless of where
or how data is stored
• Observable (reactive)
• Highly available, redundant on
cache Member (s).
What is a REGION?
15
Region
Cache
java.util.Map
JVM
Key Value
K01 May
K02 Tim
16. • Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region: Types & Options
16
Region
Cache
java.util.Map
JVM
Key Value
K01 May
K02 Tim
Region
Cache
java.util.Map
JVM
Key Value
K01 May
K02 Tim
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
17. • Durability
• WAL for efficient writing
• Consistent recovery
• Compaction
Persistent Regions
17
Modify
k1->v5
Create
k6->v6
Create
k2->v2
Create
k4->v4
Oplog2.crf
Member
1
Modify
k4->v7Oplog3.crf
Put k4->v7
Region
Cache
java.util.Map
JVM
Key Value
K01 May
K02 Tim
Region
Cache
java.util.Map
JVM
Key Value
K01 May
K02 Tim
Server 1 Server N
18. • A process that has a connection to
the system
• A process that has created a cache
• Embeddable within your
application
What is a MEMBER?
18
Client
Locator
Server
19. • A process connected to the
Geode server(s)
• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on
the servers
What is a CLIENT CACHE?
19
Application
GemFire Server
Region
Region
RegionClient Cache
20. • Clone & Build
•
• Start Services
• Create & Monitor Region
How to START? Easy as !!
20
git clone https://github.com/apache/incubator-geode
cd incubator-geode
./gradlew build -Dskip.tests=true
cd gemfire-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server
gfsh> create region --name=myRegion —type=REPLICATE
gfsh> start [pulse | jconsole]
1
2
3
'
1 2 3
34. • Register Interest
• Individual Keys OR RegEx for Keys
• Updates Local Copy
• Examples:
• region.registerInterest(“key-1”);
• region1.registerInterestRegex(“[a-z]+“);
• Continuous Query
• Receive Notification when Query condition met on server
• Example:
• SELECT * FROM /tradeOrder t WHERE t.price > 100.00
Can be DURABLE
Events & Notifications
34
35. • CacheWriter / CacheListener
• AsyncEventListener (queue / batch)
• Parallel or Serial
• Conflation
Listeners
35
40. But HOW to serialize data?
40
Benchmark: https://github.com/eishay/jvm-serializers
41. Schema Evolution
41
Member A Member B
Distributed Type Definitions
v2v1
Application #1
Application #2
v2 objects preserve data
from missing fields
v1 objects use default values to
fill in new fields
PDX provides forwards and backwards
compatibility, no code required
43. Code
• New features
• Bug fixes
• Writing tests
Documentation
• Wiki
• Web site
• User guide
How to CONTRIBUTE?
43
Community
• Join the mailing list
• Ask or answer
• Join our HipChat
• Become a speaker
• Finding bugs
• Testing an RC/Beta