2. A PILOT PROJECT: ONLINE PRODUCT CATALOG FOR AN E-COMMERCE PLATFORM, MIGRATION TO CASSANDRA
The previous version was based on the Oracle Coherence in-memory data grid. All data from the primary storage (a relational database) was cached in the data grid.
Goals of the migration:
• minimize the time required to restart the system
• keep at least two copies of the data in different data centers
• make backups quick and simple
3. ARCHITECTURE IN A NUTSHELL
• Application server: all business logic + web services
  • stateless
  • with local caches
• Data storage
  • Oracle Coherence, then Cassandra via the DataStax Java Driver
• Batch data loading based on Spring Batch
4. WHAT SHOULD THE PRODUCT LOOK LIKE TO MEET THE REQUIREMENTS?
Some hypotheses:
• Data is on disk – available immediately after restart
• The OS disk cache brings all the data into memory
• Key-value storage to simplify migration of the codebase
Nice to have:
• Simple deployment configuration
• A Java-based solution
5. BASIC REQUIREMENTS / USE CASES
• reads: ~5K TPS
• a transaction can include more than one round trip to the storage, as well as more than one key per query (“multi-gets”)
• ~50K TPS on the storage side
• full data reload (once per 24 hours)
• partial updates of values (e.g. of product attributes)
• availability 24x7
• millions of products
• tens of millions of related entities (product attributes etc.)
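The two throughput figures above imply an average fan-out from the application to the storage. As a rough estimate, assuming the storage-side load is generated entirely by the ~5K application reads per second:

```latex
\text{storage operations per transaction} \approx \frac{50\,000\ \text{ops/s}}{5\,000\ \text{tx/s}} = 10
```

That is, roughly 10 storage operations (round trips or keys of a multi-get) per application transaction on average.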
7. PERFORMANCE TESTING ENVIRONMENT
• Production-ready implementation
• 4 boxes (16 CPUs, 24 GB RAM each), 1 Cassandra instance per box
• 2 boxes, 2 app servers per box
• 100 GB of test data, fits in memory
• The main test is read queries:
  • one hour
  • up to 500 users
  • uniform distribution of requested keys
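The read test above can be approximated with a small closed-loop driver: each virtual user reads a uniformly chosen key, waits for the result, and immediately issues the next read. This is a minimal sketch under stated assumptions, not the tool used in these tests; the `product-N` key names, the stub read function, and the thread-per-user model are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

/** Closed-loop read load: each virtual user reads uniformly chosen keys back-to-back. */
final class ReadLoadSketch {

    /** Picks a key uniformly at random from the (hypothetical) key space [0, keyCount). */
    static String randomKey(int keyCount) {
        return "product-" + ThreadLocalRandom.current().nextInt(keyCount);
    }

    /**
     * Runs `users` threads for `durationMillis`; each thread issues reads through
     * `read` one after another. Returns the total number of completed reads.
     */
    static long run(int users, long durationMillis, int keyCount, Function<String, ?> read) {
        LongAdder completed = new LongAdder();
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        List<Thread> threads = new ArrayList<>();
        for (int u = 0; u < users; u++) {
            Thread t = new Thread(() -> {
                // Overflow-safe comparison of nanoTime values.
                while (System.nanoTime() - deadline < 0) {
                    read.apply(randomKey(keyCount)); // result ignored; only completion counts
                    completed.increment();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                break;
            }
        }
        return completed.sum();
    }

    public static void main(String[] args) {
        // Hypothetical stub standing in for the real read path (app server + Cassandra).
        Function<String, String> stubRead = key -> "value-of-" + key;
        long total = run(8, 500, 1_000_000, stubRead);
        System.out.printf("reads: %d, approx. TPS: %.0f%n", total, total / 0.5);
    }
}
```

Mirroring the test above would mean something like `run(500, 3_600_000L, keyCount, read)`, with `read` calling the app server instead of the stub.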
8. WHAT DID HELP
• configure your Cassandra cluster:
  • turn OS swap off
  • use different physical disks for different file sets, e.g. data vs. commit log
  • choose the right (“private”) network interface
• async queries for multi-gets + token-aware routing on the app server side: +15% in both TPS and latency
• use the latest Cassandra version
  • a good example: 1.2.6 => 1.2.8 gave +15% TPS and 2x better latency
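For a multi-get, issuing the single-key reads asynchronously and waiting for all of them at the end makes the total latency close to that of the slowest read rather than the sum of all reads. The sketch below shows this pattern with `CompletableFuture` against a hypothetical `AsyncKeyValueStore` interface. In the DataStax Java Driver (e.g. the 2.x/3.x lines), the corresponding call would be `Session.executeAsync`, and token-aware routing is enabled by wrapping the load-balancing policy in `TokenAwarePolicy`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical async key-value interface standing in for the real storage client. */
interface AsyncKeyValueStore {
    CompletableFuture<String> getAsync(String key);
}

final class MultiGetSketch {

    /**
     * Multi-get: start one async single-key read per key so that all reads are
     * in flight at once, then wait for every result.
     */
    static Map<String, String> multiGet(AsyncKeyValueStore store, List<String> keys) {
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        for (String key : keys) {
            pending.put(key, store.getAsync(key)); // fire without waiting
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> e : pending.entrySet()) {
            result.put(e.getKey(), e.getValue().join()); // block until this read completes
        }
        return result;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Map<String, String> data = new ConcurrentHashMap<>(Map.of("p1", "A", "p2", "B", "p3", "C"));
        // Stub store: each read completes on the pool, like a driver callback would.
        AsyncKeyValueStore store = key -> CompletableFuture.supplyAsync(() -> data.get(key), pool);
        System.out.println(multiGet(store, List.of("p1", "p3"))); // {p1=A, p3=C}
        pool.shutdown();
    }
}
```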
9. WHAT DID HELP
• Use the key of a parent entity as the first component of its children's keys:
  PRIMARY KEY (parent-ID, child-ID)
  • minimizes the number of queries / disk seeks
  • +15% TPS, latency 2x better
• use local (“near”) caches on the app server side: +15% TPS
  • local EHCache
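The two items above combine on the read path. With `PRIMARY KEY (parent-ID, child-ID)`, the parent ID is the partition key and the child ID a clustering column, so all children of one parent are stored together, sorted by child ID, and come back from a single partition read; a near cache in front of that read avoids going to storage at all on a hit. The in-memory model below is a single-threaded sketch; the `product_attributes` table, its columns, and the unbounded `HashMap` used in place of a configured EHCache are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory model of a table with PRIMARY KEY (parent_id, child_id). */
final class PartitionedStore {
    // Illustrative DDL only; the table and column names are hypothetical.
    static final String DDL =
        "CREATE TABLE product_attributes (product_id text, attr_id text, value text, "
        + "PRIMARY KEY (product_id, attr_id))";

    // partition key -> rows of that partition, sorted by clustering column
    private final Map<String, SortedMap<String, String>> partitions = new HashMap<>();
    private int partitionReads = 0; // number of requests that reached the storage

    void put(String parentId, String childId, String value) {
        partitions.computeIfAbsent(parentId, p -> new TreeMap<>()).put(childId, value);
    }

    /** One request returns all children, like SELECT ... WHERE product_id = ?. */
    SortedMap<String, String> readPartition(String parentId) {
        partitionReads++;
        return Collections.unmodifiableSortedMap(
            new TreeMap<>(partitions.getOrDefault(parentId, new TreeMap<>())));
    }

    int partitionReads() {
        return partitionReads;
    }
}

/** Read-through near cache on the app server (unbounded, no expiry; not thread-safe). */
final class NearCache {
    private final PartitionedStore store;
    private final Map<String, SortedMap<String, String>> local = new HashMap<>();

    NearCache(PartitionedStore store) {
        this.store = store;
    }

    SortedMap<String, String> children(String parentId) {
        // Only a local miss goes to the storage; a hit never leaves the JVM.
        return local.computeIfAbsent(parentId, store::readPartition);
    }
}

final class NearCacheDemo {
    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore();
        store.put("product-1", "color", "red");
        store.put("product-1", "size", "M");
        store.put("product-1", "weight", "1kg");
        NearCache cache = new NearCache(store);
        System.out.println(PartitionedStore.DDL);
        System.out.println(cache.children("product-1")); // {color=red, size=M, weight=1kg}
        cache.children("product-1");                      // served from the near cache
        System.out.println(store.partitionReads());       // 1
    }
}
```

Without the shared parent key, the same three attributes would need three separate lookups instead of one partition read.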
10. WHAT DID NOT HELP
• Java GC monitoring on Cassandra boxes
  • with the recommended settings, GC takes at most 7% of the overall test time
• caching == ALL
  • all data was already in the OS disk cache
11. INTERESTING EXPERIMENTS
• another implementation of token-aware query routing
• JSON or any other serialized format, if partial updates are not needed (a pure key-value model)
  • avoids creating tombstones on updates when values contain Cassandra collections
  • another option is tuning tombstone GC
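One way an alternative token-aware router could look: hash the partition key to a token, then send the request directly to the node that owns the token range containing it (the node with the smallest token greater than or equal to the key's token, wrapping around the ring). This is a simplified sketch: it uses `CRC32` instead of the Murmur3 hash of Cassandra's default `Murmur3Partitioner`, so its tokens do not match a real cluster, and it returns only the primary replica, ignoring the replication factor. The node names and tokens are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Client-side token ring: routes each partition key to the node owning its token. */
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // node token -> node name

    TokenRing(Map<Long, String> nodeTokens) {
        ring.putAll(nodeTokens);
    }

    /** Stand-in hash (CRC32, range [0, 2^32)); NOT Cassandra's Murmur3 token. */
    static long token(String partitionKey) {
        CRC32 crc = new CRC32();
        crc.update(partitionKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Owner = first node whose token is >= the key's token, wrapping around the ring. */
    String ownerOf(String partitionKey) {
        Map.Entry<Long, String> e = ring.ceilingEntry(token(partitionKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        // Four hypothetical nodes with evenly spaced tokens over the CRC32 range.
        TokenRing ring = new TokenRing(Map.of(
            1L << 30, "node-1",
            2L << 30, "node-2",
            3L << 30, "node-3",
            (1L << 32) - 1, "node-4"));
        System.out.println("product-42 -> " + ring.ownerOf("product-42"));
    }
}
```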
12. SUMMARY
• Cassandra is a sufficiently stable and mature product
• It can compete with in-memory caches and data grids, at least if the dataset is small enough to fit into memory
• It is actively developed, has a large community, and has good commercial support from DataStax