The new ehcache 2.0 and hibernate spi
1. Caching, and what’s new in Ehcache 2 and Hibernate Caching Provider
Thursday, 17 June 2010
2. Why Cache?
Reasons to cache:
Offload - reducing the amount of resources consumed and hence cost
Performance - increasing the speed of processing
Scale out - distributed caching is a leading scale-out architecture
www.terracotta.org 2
3. Truncating the request-response Loop
[Diagram: load balancers in front of HTTPD servers, which route to application servers that each embed ehcache. The ehcache instances connect to Terracotta Servers arranged in stripes (stripe 1 ... stripe n), alongside a MySQL database.]
8. Amdahl’s Law
Amdahl's law, after Gene Amdahl, is used to find the overall system speedup
that results from speeding up part of the system.
System speedup = 1 / ((1 - Proportion Sped Up) + Proportion Sped Up / Speedup)
To apply Amdahl’s law you must measure the components of system
time and the before-and-after effect of the performance change made.
It is thus an empirical approach.
The other approach, “when all you have is a hammer, every problem looks
like a nail”, is not recommended. Often the cause is something completely
new. However, very few developers will take the time to make careful
measurements.
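A worked example may make the formula concrete. The sketch below evaluates Amdahl's law for a made-up measurement: the 75% proportion and the 20x factor are illustrative assumptions, not figures from this talk.

```java
public class AmdahlExample {
    /**
     * Overall system speedup when a proportion of total time (0..1)
     * is sped up by the given factor.
     */
    static double systemSpeedup(double proportionSpedUp, double speedup) {
        return 1.0 / ((1.0 - proportionSpedUp) + proportionSpedUp / speedup);
    }

    public static void main(String[] args) {
        // Assumed measurement: 75% of request time is spent in the database,
        // and a cache hit is taken to be 20 times faster than the database call.
        double overall = systemSpeedup(0.75, 20.0);
        System.out.println("Overall speedup: " + overall);
    }
}
```

With these assumed inputs the overall speedup is about 3.48, well short of 20, because the 25% of time that was not sped up now dominates.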
9. Cache Efficiency
cache efficiency = cache hits / total requests (hits + misses)
➡ High efficiency = high offload
➡ High efficiency = high performance
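As a small illustration of the ratio above, the sketch below computes efficiency from hit and miss counters. The counter values are invented for the example, and the class is not part of Ehcache.

```java
public class CacheEfficiencyExample {
    /** Efficiency = hits / (hits + misses); defined as 0 when there were no requests. */
    static double efficiency(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        // Assumed counters: 900 requests served from cache, 100 sent to the backend.
        System.out.println("Efficiency: " + efficiency(900, 100));
    }
}
```

In this example 90% of requests never reach the backing store, which is the offload the slide refers to.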
10. Why does caching work?
Locality of Reference
Pareto Distributions
11. Locality of Reference
Many computer systems exhibit the phenomenon of locality of reference.
Data that is near other data or has just been used is more likely to be
used again.
Temporal locality - refers to the reuse of specific data and/or resources
within relatively small time durations.
Spatial locality - refers to the use of data elements within relatively close
storage locations
e.g. this is the reason for hierarchical memory design in computers
12. Pareto Distributions
Chris Anderson, of Wired Magazine, coined the term The Long Tail to
refer to Ecommerce systems.
The mathematical term is a Pareto Distribution aka Power Law
Distribution.
13. Another Problem...
But...
What if the data set is too large to fit in the cache?
What about staleness of data?
14. Coherency with SOR
Staleness is classically solved with automatic expiry. Ehcache has both
TTL (time to live) and TTI (time to idle).
Better:
Eternal caching plus a cache invalidation protocol
Write-through or write-behind caches. The SOR (system of record) is updated
in line with the cache. Hibernate read-write and transactional strategies are
examples, as is the Ehcache CacheWriter.
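To show how the two expiry rules differ, here is a minimal, self-contained sketch of TTL and TTI checks. It is not Ehcache's implementation: the class name, the string-only entries and the explicit `now` parameter (used so the behaviour is deterministic) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ExpiringCacheSketch {
    private static final class Entry {
        final String value;
        final long createdAt;      // drives TTL
        long lastAccessedAt;       // drives TTI

        Entry(String value, long now) {
            this.value = value;
            this.createdAt = now;
            this.lastAccessedAt = now;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final long ttiMillis;

    ExpiringCacheSketch(long ttlMillis, long ttiMillis) {
        this.ttlMillis = ttlMillis;
        this.ttiMillis = ttiMillis;
    }

    void put(String key, String value, long now) {
        entries.put(key, new Entry(value, now));
    }

    /** Returns the cached value, or null when the entry is absent or expired. */
    String get(String key, long now) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        boolean ttlExpired = now - e.createdAt >= ttlMillis;       // too old overall
        boolean ttiExpired = now - e.lastAccessedAt >= ttiMillis;  // unused for too long
        if (ttlExpired || ttiExpired) {
            entries.remove(key);
            return null;
        }
        e.lastAccessedAt = now; // a hit resets the idle clock but not the age
        return e.value;
    }

    public static void main(String[] args) {
        ExpiringCacheSketch cache = new ExpiringCacheSketch(1000, 300);
        cache.put("k", "v", 0);
        System.out.println(cache.get("k", 200)); // still fresh
        System.out.println(cache.get("k", 800)); // idle for 600 ms, so expired by TTI
    }
}
```

Either rule only bounds how stale an entry can get; it does not make the cache coherent with the system of record, which is why the slide lists invalidation and write-through as better options.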
15. Why run a cluster?
Availability, most often n+1 redundancy
Scale out
But this creates a new cascade of problems:
• N * problem
• Cluster coherency problem
• CAP theorem limits
16. N * Problem
On a single node work is done once and then cached. Cache hits offload.
But in a cluster the work must be done N times, where N is the number of
nodes
The solution to the N * problem is a replicated or distributed cache
replicated cache - the data is copied to each node. All data is held in
each node
distributed cache - the most used data is held in a node. The balance of
the data is held outside the application node
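The effect can be sketched by counting how often the expensive work runs. In the toy model below, a plain `HashMap` stands in for each cache and `expensiveLoad` stands in for the database call; both are assumptions for illustration, and a real shared cache would be a distributed or replicated store rather than one map.

```java
import java.util.HashMap;
import java.util.Map;

public class NTimesProblemSketch {
    static int loads = 0;

    /** Stand-in for the work that caching is meant to avoid. */
    static String expensiveLoad(String key) {
        loads++;
        return "value-for-" + key;
    }

    /** Read-through: load on a miss, then serve from the cache. */
    static String getThrough(Map<String, String> cache, String key) {
        String value = cache.get(key);
        if (value == null) {
            value = expensiveLoad(key);
            cache.put(key, value);
        }
        return value;
    }

    /** Each node has its own cache, so each node repeats the load once. */
    static int loadsWithLocalCaches(int nodes, String key) {
        loads = 0;
        for (int i = 0; i < nodes; i++) {
            Map<String, String> local = new HashMap<>();
            getThrough(local, key);
            getThrough(local, key); // second read on the same node is a hit
        }
        return loads;
    }

    /** All nodes read through one shared cache, so the load happens once. */
    static int loadsWithSharedCache(int nodes, String key) {
        loads = 0;
        Map<String, String> shared = new HashMap<>();
        for (int i = 0; i < nodes; i++) {
            getThrough(shared, key);
            getThrough(shared, key);
        }
        return loads;
    }

    public static void main(String[] args) {
        System.out.println("Local caches, 4 nodes: " + loadsWithLocalCaches(4, "owner-1"));
        System.out.println("Shared cache, 4 nodes: " + loadsWithSharedCache(4, "owner-1"));
    }
}
```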
17. Cluster Coherency Problem
Each cache makes independent cache changes. The caches become
different
Solved partly by a replicated or distributed cache
But, without locking, race conditions will cause incoherencies
Solution is a coherent, distributed or replicated cache
18. CAP Theorem and PACELC
The CAP theorem, also known as Brewer's theorem, states that it is
impossible for a distributed computer system to simultaneously provide
all three of the following guarantees: consistency, availability and
tolerance to partition.
A better explanation of the tradeoffs is PACELC: if there is a partition (P),
how does the system trade off availability against consistency (A and C);
else (E), when the system is running as normal in the absence of
partitions, how does the system trade off latency (L) against
consistency (C)?
There is no right answer, but the properties to be traded off will be
different for different applications. So the solution must be configurable.
1. http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
19. About Ehcache
The world's most widely used Java cache
Founded in 2003
Apache 2.0 License
Integrated by lots of projects, products
Hibernate Provider implemented 2003
Web Caching 2004
Distributed Caching 2006
Greg Luck becomes co-spec lead of JSR107
JCACHE (JSR107) implementation 2007
REST and SOAP APIs 2008
SourceForge Project of the Month March 2009
Acquired by Terracotta 2009; Integrated with Terracotta Server
Ehcache 2.0 March 2010
Forrester Wave “Leader” May 2010
25. Ehcache 2 - New Features
Hibernate 3.3+ Caching SPI
• Old SPI was heavily synchronized and not well suited to clusters
• New SPI uses CacheRegionFactory
• Fully cluster safe with Terracotta Server Array
• Unification of the Ehcache and Terracotta 3.2 providers
• JTA, i.e. transactional caching strategy
JTA
• Cache as an XAResource
• Detects most common Transaction Managers
• Others configurable
• Works with Spring, EJB and manual transactions
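For orientation, switching Hibernate to the new SPI is a configuration change. The sketch below only builds the settings as `java.util.Properties`; the property names and the `net.sf.ehcache.hibernate.EhCacheRegionFactory` class name are what I understand the Ehcache 2.x and Hibernate 3.3+ documentation to specify, so treat them as assumptions to check against your versions.

```java
import java.util.Properties;

public class HibernateCacheSettingsSketch {
    /** Settings one would pass to Hibernate to enable Ehcache as second-level cache. */
    static Properties secondLevelCacheSettings() {
        Properties settings = new Properties();
        // Turn the second-level cache on.
        settings.setProperty("hibernate.cache.use_second_level_cache", "true");
        // Assumed factory class for the new region-based SPI in Ehcache 2.x.
        settings.setProperty("hibernate.cache.region.factory_class",
                "net.sf.ehcache.hibernate.EhCacheRegionFactory");
        return settings;
    }

    public static void main(String[] args) {
        secondLevelCacheSettings().list(System.out);
    }
}
```

The same keys can equally be placed in hibernate.cfg.xml or a Spring session factory definition; the Java form is used here only to keep the example self-contained.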
26. Ehcache 2 - New Features
Write-behind
• Offloads databases with high write workloads
• CacheWriter interface to implement
• cache.putWithWriter(...) and cache.removeWithWriter(...)
• Write-through and write-behind modes
• Batching, coalescing and very configurable
• Standalone with in-memory write-behind queue
• TSA with HA, durability and distributed workload balancing
Bulk Loading
• Incoherent mode for startup or periodic cache loading
• 10 x faster
• No change to the API (put, load etc.)
• setCoherent(), isCoherent(), waitForCoherent()
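The batching and coalescing idea behind write-behind can be sketched without any library. The class below is a toy model, not Ehcache's CacheWriter API: `BatchWriter`, `flush()` and the string-only entries are hypothetical names chosen for the example, and `putWithWriter` merely borrows the method name from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WriteBehindSketch {
    /** Hypothetical stand-in for a writer to the system of record. */
    interface BatchWriter {
        void writeAll(Map<String, String> batch);
    }

    private final Map<String, String> cache = new LinkedHashMap<>();
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final BatchWriter writer;

    WriteBehindSketch(BatchWriter writer) {
        this.writer = writer;
    }

    /** Updates the cache at once and queues the write for later. */
    void putWithWriter(String key, String value) {
        cache.put(key, value);
        pending.put(key, value); // coalescing: a later write to a key replaces the queued one
    }

    String get(String key) {
        return cache.get(key);
    }

    /** Sends all queued writes as one batch. A real queue would do this on a timer. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        writer.writeAll(new LinkedHashMap<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        List<Map<String, String>> batches = new ArrayList<>();
        WriteBehindSketch cache = new WriteBehindSketch(batches::add);
        cache.putWithWriter("a", "1");
        cache.putWithWriter("a", "2"); // coalesced with the previous write
        cache.putWithWriter("b", "3");
        cache.flush();
        System.out.println(batches);
    }
}
```

Three puts reach the writer as a single batch of two entries, which is where the database offload for write-heavy workloads comes from. The cost is that the system of record lags the cache until the flush, so durability of the queue matters.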
27. Ehcache 2 - New Features (cont.)
New CAP configurability – on a per-cache basis
• coherent – run coherent or incoherent (faster)
• synchronousWrites – true for HA, false is faster
• copyOnRead – true to stop interactions between threads outside of the cache
Cluster events – notification of partition and reconnection
NonStopCache - decorated cache favouring availability
UnlockedReadsView - decorated cache favouring speed
Management
• Dynamic configuration of common cache configs from JMX and DevConsole
• New web-based monitoring with UI and API
28. Ehcache 2.0 Monitoring Options
JMX
• is built in to Ehcache but...
• JMX needs to use portmap
• Slow
• Machines may be headless
Terracotta Dev Console (if using Terracotta)
Ehcache Console, new in 2.1
29. Including Web Services
Simple + Performant + Coherent + HA + Scalable
[Diagram: Java applications, each with Ehcache, connected to Terracotta Servers. PHP, C, C# and Ruby apps reach the same caches over REST/HTTP through an Ehcache Server running in a web container.]
30. Terracotta Developer Console
Cache hit ratios
Hit/miss rates
Hits on the database
Cache puts
Detailed efficiency of cache regions
Dramatically simplifies tuning and operations, and shows the database offload.
31. Ehcache Console
Web based
Configuration
Efficiency
Memory Use
Comes with supported versions
API to connect Operations Monitoring
42. REST Performance
[Bar chart: Ehcache Server vs Memcache for Get, Multi-Get, Put and Remove; vertical axis from 0 to 4000]
Source: MemcacheBench with Java clients. Time for 10,000 operations
43. Ehcache with Terracotta vs the Rest
Application
Tests done with Owners = 25K and 125K which translates
to total objects of 0.3 M and 1.5 M
Minimal tuning.
Cluster Configuration:
8 Client JVMs (1.75G Heap)
1 (+0) Terracotta Servers (6G Heap)
MySql: sales18.
44. Ehcache with Terracotta vs the Rest
Ehcache
Replicated with RMI not included because not coherent
Single TSA Server
15 threads and some with 100 threads
IMDG
15 threads
Cache deployed in Partitioned Mode
Tests were also done with Replicated – which did well for
small cache sizes but failed to complete with larger cache
sizes. So, it is not included.
memcached
15 threads
1 server
45. Hibernate - Read Only TPS
49. Test Source
The code behind the benchmarks is in the Terracotta Community SVN repository.
Download https://svn.terracotta.org/repo/forge/projects/ehcacheperf/
(Terracotta Community Login Required)
50. Performance Conclusions
With Hibernate, Using Spring Pet Clinic
After app servers and DBs tuned by independent 3rd parties
30-95% database load reduction
80 times read-only performance of MySQL
Notably lower latency
1.5 ms versus 120 ms for database (25k)
51. What about NoSQL?
Ehcache + Terracotta configured with persistence gives you a NoSQL
store with limited features i.e. no search
TerraStore - new open source project from Sergio Bossa is a document
oriented NoSQL store based on Terracotta and ful