2. Cassandra @walmartlabs
• Cassandra adoption at Walmart
– Using the DataStax distribution http://www.datastax.com/
• Introduction to the talks
• Hiring @labs
Walmart eCommerce
3. Cassandra @walmartlabs
• Introduction to the talks
– Walmartlabs
• @labs – Using Cassandra for real-time stream processing
• @services – Using Cassandra for product and items
– DataStax
• Data modeling with Cassandra
6. Data-stream computation
• “Big” data: MapReduce (Hadoop)
– Map and Reduce steps
– Batch process large input (e.g., from HDFS)
– Hadoop distributes computation
• Fast data: MapUpdate (Muppet)
– Map and Update steps
– Continuously process streaming input
– Muppet maintains computation
– Muppet manages memory/storage
2012 Cassandra for Real-Time Stream Processing @WalmartLabs
7. The MapReduce framework (Hadoop)
• Event
– A <key, value> pair of data
• Map
– A function that performs (stateless) computation on incoming events
• Reduce
– A function that combines all input for a particular key
• Application
– Map -> Reduce
8. The MapUpdate framework (Muppet)
• Event
– A <key, value> pair of data
• Map
– A function that performs (stateless) computation on incoming events
• Update
– A function that updates a slate using incoming events
• Application
– A directed graph of Mappers and Updaters
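The Map and Update roles above can be sketched in a few lines. This is a minimal illustration in Python, not the actual Muppet API; the function names and the in-memory slate store are hypothetical:

```python
# Minimal sketch of the MapUpdate pattern: a stateless map step emits
# keyed events, and an update step folds each event into a persistent
# per-key "slate". All names here are illustrative, not the Muppet API.

slates = {}  # key -> slate dict, standing in for the slate store

def map_event(event):
    """Stateless: turn one incoming event into zero or more keyed events."""
    timeslot = (event["created"] // 900) * 900  # 15-minute bucket
    yield (f"{event['retailer']}.{timeslot}", {"timeslot": timeslot})

def update(key, mapped):
    """Stateful: fold the mapped event into the slate for this key."""
    slate = slates.setdefault(key, {"count": 0})
    slate["timeslot"] = mapped["timeslot"]
    slate["count"] += 1

def process(stream):
    for event in stream:
        for key, mapped in map_event(event):
            update(key, mapped)

process([{"retailer": "Walmart", "created": 1000},
         {"retailer": "Walmart", "created": 1100}])
# both events fall into timeslot 900, so one slate ends up with count == 2
```

Running `process` on two events in the same 15-minute window coalesces them into a single slate, which is exactly the per-key aggregation an Updater provides.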
10. The Map (Foursquare::CheckinMapper)
sub map {
    my $self  = shift;
    my $event = shift;

    my $checkin = $event->{checkin};

    # Bucket the check-in time into a 15-minute (900-second) timeslot
    my $timeslot = int($checkin->{created} / 900) * 900;
    $event->{kosmix}->{timeslot} = $timeslot;
    $event->{kosmix}->{interval} = 900;

    # Match the venue name against the retailers we track
    my $venue_name = $checkin->{venue}->{name};
    my $retailer = 0;
    $retailer = 'ToysRUs'  if ($venue_name =~ /toys.*r.*us/i);
    $retailer = 'Walmart'  if ($venue_name =~ /wal.*mart/i);
    $retailer = 'SamsClub' if ($venue_name =~ /sam.*club/i);

    # Publish only retailer check-ins, keyed by retailer and timeslot
    if ($retailer) {
        $event->{kosmix}->{retailer} = $retailer;
        $self->publish("FoursquareRetailerCheckin", $event,
                       $retailer . "." . $timeslot);
    }
}
2012 ISD YBM Tech Fair - Big Fast Data @WalmartLabs
11. The Update (Foursquare::RetailerUpdater)
package Foursquare::RetailerUpdater;
use strict;
use Muppet::Updater;
our @ISA = qw( Muppet::Updater );

sub update {
    my $self   = shift;
    my $event  = shift;
    my $slate  = shift;
    my $config = shift;
    my $key    = shift;

    # Copy the event's timeslot metadata onto the slate and bump the count
    $slate->{timeslot} = $event->{kosmix}->{timeslot};
    $slate->{interval} = $event->{kosmix}->{interval};
    $slate->{retailer} = $event->{kosmix}->{retailer};
    $slate->{count}   += 1;
}
13. Muppet Processing
• Slates are 1-100 KB in size
• Local cache on Muppet Node
– 85% reads from cache
– Write-through, delayed-flush cache
– ~750K slates in cache per node
• Remote slates read through Muppet API
• Cassandra is the permanent datastore
• Slates tend to be updated and read in batches
– 10-50 at a time
15. Datastore Requirements
• Consistent, low response time
– 10ms or less for slate reads on average
• 1+ billion keys, with future expansion to perhaps 5-10 billion
• The value is in the whole data set
– Losing slates in small amounts is OK
• The datastore sees almost entirely “cold” reads
– The Muppet cache already absorbs ~85% of reads
– The datastore cannot rely on its own cache for performance
16. Why Cassandra?
• Timeframe: Early 2010
– Low latency: a rare feature among NoSQL stores
– Most NoSQL stores favor throughput over response time
– New “Best NoSQL evur!!” every 2 months
• Cassandra:
– Open source, active community, clustering as a core feature
• Simple is good
– Peer networking, Data file format, key distribution
• QUORUM consistency good middle ground
– AP focus in CAP aligns well with our needs
17. Why Cassandra – the Challenges
• Seeks are going to be difficult
– Overwrites mean nightly compactions
– Compactions blow up seek performance
– 90%+ cold reads means lots of seeks
– Head and body reads can produce a lot of seeks
• Slates as an atomic unit means no bulk column slice reads
• Likely to have unfavorable read:write ratio
– Early estimates: 1:3, or even worse
• Oh yeah, spinning disks hate seeks. Uh oh!
18. Frequent Row Overwrites in Cassandra
[Diagram: SSTable data files grow during the day. Reads hitting the tail incur few seeks, the body some seeks, and the head many seeks; a full compaction merges the files back together.]
19. Solution
• Cassandra + SSDs !!
• Expensive in terms of space, cheap in terms of IOPS
• Random seeks “free”
• Good performance during nightly compactions
20. Compaction Effect on System
21. How did Cassandra do?
• Average latency below 10ms, often 5-8ms
• Read:write ratio: 1:2
– Today, 1:1
• Compacting 500GB every night in under 4 hours
• Individual C* nodes handled over 1,500 reads and writes per second
• SSD cost: well worth it
22. Helping Cassandra out
• Muppet absorbs writes in local cache
– Flush on number of updates or on staleness
– Reduces write counts in Cassandra
– More efficient
• Compress all slates on Muppet nodes
– Easier to scale than C* nodes doing compression
– Less disk IO, less network
– CPU on Muppet nodes cheap
• Expire data via TTL
– Muppet apps decide data-keep length
• Java GC tuning flattened out CPU usage and GC pauses
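The “absorb writes in local cache” idea can be sketched as a delayed write-through cache that flushes a slate only after N updates or once the entry has aged past a staleness threshold. A minimal Python sketch; the class, parameters, and thresholds are illustrative assumptions, not the Muppet code:

```python
import time

class DelayedWriteCache:
    """Coalesce per-key updates in memory; flush to the datastore only
    after max_updates mutations or max_age_s seconds of staleness."""

    def __init__(self, flush_fn, max_updates=10, max_age_s=30.0):
        self.flush_fn = flush_fn      # called with (key, slate) on flush
        self.max_updates = max_updates
        self.max_age_s = max_age_s
        self.entries = {}             # key -> (slate, update_count, first_ts)

    def update(self, key, mutate):
        slate, updates, first = self.entries.get(key, ({}, 0, time.time()))
        mutate(slate)
        updates += 1
        if updates >= self.max_updates or time.time() - first >= self.max_age_s:
            self.flush_fn(key, slate)  # one datastore write for N updates
            self.entries.pop(key, None)
        else:
            self.entries[key] = (slate, updates, first)

flushed = []
cache = DelayedWriteCache(lambda k, s: flushed.append((k, s)), max_updates=3)
for _ in range(3):
    cache.update("Walmart.900",
                 lambda s: s.__setitem__("count", s.get("count", 0) + 1))
# three updates are coalesced into a single flush of {"count": 3}
```

This is the shape of the trade-off described above: fewer, larger writes reach Cassandra at the cost of a bounded window of unflushed data.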
23. Recent and Future
• Cassandra 0.8.x
– Faster compaction
– Stability
– Performance
• Cassandra 1.0.x
– Close to deployment @WML
– Leveled compaction (LevelDB-inspired) is very, very interesting
– Cache memory changes make large caches feasible!
– Row[Column] latest-only: very nice
– SSDs no longer needed? Possibly!
• Depends on cold seek requirements
24. Lessons
• Simple is usually faster and cheaper
– Add complexity only where needed
• Best solution can usually be made to work
• Proactive monitoring very important
– Trend graph everything relevant!
• Failing fast is better than succeeding late
• No substitute for understanding your platform
• Spend money when it will save you time and complexity
30. Flexible Categorization & Attribution
• The right kind of categorization and attribution is crucial to making sense of the enormous volume of product data
• Ultimate shopping experience
• Fine-grained analytics & planning
• Standards exist, but are severely limiting
• The product landscape changes dramatically every day
31. Other excerpts from the “shopping list”
• Look up and potentially match products and offerings by any combination of attributes and other dimensional criteria
• Item-item relationships & collections
• Hierarchical
• Graph
• Low latency, high throughput, highly available
• A scalable but unified system of record for all product and offering data
32. Translating to Cassandra
• Modeling options
1. Product as a “wide row” encompassing all offerings
2. Product assembled from several offering “fragment” rows
• Multiple Column Families
• Product fragments
• Custom consistency enabler
• Custom row caching at column family level
• Single keyspace to hold all core data fragments
• Tighter control of replication factor, strategy
• Additional keyspaces only for supporting data
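The two modeling options can be sketched with plain data structures. This is a hedged illustration in Python with made-up row keys; real code would go through a Cassandra client:

```python
# Option 1: one wide row per product, with every offering's attributes
# stored as columns of that single row (column names are illustrative).
wide_row = {
    "global:title":      "Widget",
    "US:Website:price":  "99.00",
    "US:Store:price":    "97.00",
}

# Option 2: per-dimension "fragment" rows, keyed by product id plus a
# dimension intersection, merged into a product view by the application.
fragments = {
    "widget-1#global":     {"title": "Widget"},
    "widget-1#US:Website": {"price": "99.00"},
    "widget-1#US:Store":   {"price": "97.00"},
}

def assemble(product_id, *intersections):
    """Merge the global fragment with the requested intersections,
    later fragments overriding earlier ones."""
    view = dict(fragments.get(f"{product_id}#global", {}))
    for ix in intersections:
        view.update(fragments.get(f"{product_id}#{ix}", {}))
    return view

offer = assemble("widget-1", "US:Website")
# returns {"title": "Widget", "price": "99.00"}
```

The fragment approach trades an application-side merge for smaller, independently writable rows, which is what makes per-fragment consistency and caching controls possible.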
33. Translating to Cassandra (contd.)
• Flexible, selective denormalization
• Secondary indexes for faster attribute-level queries
• Dynamic composites
• define flexible comparators for different column key levels
• capture 1-n levels of dimension intersections
• Column slicing to retrieve the right offerings
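Dynamic composites and column slicing can be illustrated by emulating composite column keys with sorted tuples. A Python sketch; the data and the helper are hypothetical, and a real application would issue slice queries through a client such as Hector or pycassa:

```python
from bisect import bisect_left, bisect_right

# One "row": columns keyed by (geo, channel, attribute) composite keys,
# capturing dimension intersections for a single product.
columns = {
    ("US", "Website", "price"): "99.00",
    ("US", "Website", "title"): "Widget",
    ("US", "Store",   "price"): "97.00",
    ("UK", "Website", "price"): "79.00",
}

def slice_columns(row, prefix):
    """Return columns whose composite key starts with `prefix`, in
    comparator (sorted) order, like a Cassandra column slice query."""
    keys = sorted(row)
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + (chr(0x10FFFF),))  # past-the-end sentinel
    return {k: row[k] for k in keys[lo:hi]}

us_web = slice_columns(columns, ("US", "Website"))
# returns the price and title columns for the US/Website intersection
```

Because the comparator keeps composite keys sorted, retrieving one dimension intersection is a contiguous slice rather than a scan, which is the property the slides rely on.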
34. The “Supporting Cast”
• Solr for additional indexing querying capabilities
• Mainly for attribute values
• Pattern matching
• Non-standard type comparisons
• Range checks
Products are inherently multi-dimensional and mostly multi-variant.
• Dimensions include:
– Business Unit (Walmart, Sam’s Club, ASDA, etc.)
– Geography (US, Canada, UK, etc.)
– Language (en_US, fr_CA, en_UK, etc.)
– Supply Chain (Owned Inventory, Direct Ship, Marketplace, etc.)
– Channel (Website, Retail/Store, Mobile, Facebook, etc.)
• Variants include:
– Size (S, M, L, XL, etc.)
– Color (Red, Green, Blue, etc.)
– Capacity (8 GB, 16 GB, etc.)
A true global product’s content is typically agnostic of any specific dimensions or variants. Items as we know and see them are actually product offerings, representing the content and behavior changes captured at every dimension and variant intersection. What you shop for is different from what you order, which is different from what you actually get!
Notice the need for the concept of dimensions and variants to capture and maintain data at each level. We ingest external catalogs even if we do not plan to sell the items right away, on a scale of hundreds of millions of unique SKUs. Base-variant and pre-configured bundles create order-of-magnitude increases in these estimates.
How do we give our customers access to the largest assortment in the world? As the digital arm of the world’s largest retailer, we need to not only give existing customers access to an endless shelf, but also offer a broad assortment that expands into the consideration set of non-Walmart shoppers; this means millions and millions of items. And we do so in a manner that is scalable and gives the consumer the right product information to make an informed decision about whether or not the product will meet their needs.
• Ultimate shopping experience: the customer finds everything he or she needs intuitively and in the right place, whether browsing or searching.
• Fine-grained analytics & planning: fine-grained analytics helps us put the right kinds of products on our shelves (physical or virtual) at the right levels of availability (inventory) and pricing.
• Standards exist, but are severely limiting: e.g., the GPC hierarchical classification and attribution structure.
• The product landscape changes dramatically every day: e.g., when tablets, a radically new form factor, unleash themselves on the market, we want to be able to adopt and sell them ASAP, not wait on a cumbersome change-control process caused by inflexible categorization and attribution.
• Ability to look up and potentially match products and offerings/items by any combination of attributes and other dimensional criteria.
• Item-item relationships & collections:
– Hierarchical: base-variants (e.g., iPhone 4S 16/32/64 GB)
– Graph: bundles (hard, fixed, inflexible, or configurable); components & ingredients; accessories & replacements; case packs & vendor packs
• Low latency, high throughput, highly available:
– Sellers typically update 40-50% of their offerings at some level each day
– Based on global projections, this may be comparable to the scale of social-media feeds
– Accept, process, search, retrieve, and analyze large volumes of data 24x7
• Multiple column families:
– Product fragments
– Custom consistency enabler: separate the “data” from the “index” or “event log”; use it to separate “work in progress” from the golden copy; implicit versioning and potential archiving/purging requirements; tunable consistency levels per API call (read/write)
– Custom row caching at the column-family level: optimize read-intensive vs. write-intensive column families differently
• Single keyspace to hold all data fragments:
– Tighter control of replication factor (DC + 3 or 5) and strategy (NetworkTopologyStrategy, formerly known as DatacenterShardStrategy)
• Additional keyspaces only for supporting data:
– Lower priority, loosely coupled or completely decoupled (e.g., purgeable audit & history logs)
• Flexible, selective denormalization:
– Bi-directional relationships: capture more than just foreign keys
– Indices
– Merge records to create the product offering in the application/DaaS layer
– Strike the right balance between optimizing the retrieval algorithm and space
• Secondary indexes for faster attribute-level queries, but simple queries only:
– Complex queries may need to be supplemented with other tools, as we will see later
• Dynamic composites:
– Capture 1-n levels of dimension intersections
– Define flexible comparators for different column-key levels
• Column slicing to retrieve the right offerings (i.e., intersections):
– No need to use the order-preserving partitioner
– Categorization and structure are handled completely outside the data store; Cassandra is only used to capture attribute values
• Solr for additional indexing and querying capabilities:
– Mainly at the attribute-value level: pattern matching, non-standard comparisons, and range checks
• HDFS/Hadoop for “extreme” bulk/batch operations:
– Large file/content streaming and parallel processing
– Corresponding response aggregation
– Hadoop “append”