Andrew Santoro Development Manager
Peter Connolly Application Architect
Ivan Ukolov Architect/Implementation Lead April 2015
Agenda
• Background/Problem Statement
• Options
• Migration
• Data Models
• Performance
• Retrospectives
• Future plans
• Video
Background & Problem Statement
About Us
- Real-time RESTful data services application
- Serve product, inventory and store data to website, mobile, store, and partner applications
- Not “Big Data” or Analytics
- Growth in business led us to want:
  - 10x growth in data
  - Move from a read-mostly model to one which could handle near-real-time updates
  - Move into multiple data centers
Scaling Problems
Operational Problems
- Prewarming step was a prerequisite to satisfy performance SLA
- Maintained second data grid for failover scenario
- hard to keep those consistent
- complicated release process
Performance Goals for 10x Data
Options Explored
Options Evaluated
- Denormalized Relational DB: DB2
- Document DB: MongoDB
- Columnar DB: Cassandra, Couchbase, ActiveSpaces
- Graph: Neo4J
- Object: Versant
Options Short List
- MongoDB
  - Feature-rich, JSON document DB
- Cassandra
  - Scalable, true peer-to-peer
- ActiveSpaces
  - TIBCO Proprietary
  - Existing relationship with TIBCO
  - In-memory key/value datagrid
POC Environment
- Amazon EC2
- 5 servers
- Same servers used for each test
- Had vendors assist with setup and execution of benchmarks
- Modeled benchmarks after retail inventory use cases
- C3.2xlarge instances
- Baseline of 148MM inventory records
POC Results - Summary
- Cassandra & ActiveSpaces
  - Very close
- MongoDB
  - Failed tests
YMMV! Your mileage may (will probably) vary
POC Results - Initial Load
Operation: Initial Load, ~148MM records, same datacenter (availability zone)

             | TIBCO ActiveSpaces    | Apache Cassandra     | 10Gen MongoDB
-------------+-----------------------+----------------------+---------------------------
 Throughput  | 72,000 TPS (32 nodes) | 65,000 TPS (5 nodes) | 20,000 TPS (5 nodes)
             | 98,000 TPS (40 nodes) |                      |
 Duration    | 34 min (32 nodes)     | 38 min               | ?? min (did not complete;
             | 25 min (40 nodes)     |                      | processed ~23MM records)
 CPU         | 62.4%                 | 31%                  |
 Memory      | 83.0%                 | 59%                  |
 I/O, Disk 1 | 36%                   | 42%                  |
 I/O, Disk 2 | 35%                   | 14%                  |
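As a sanity check on the initial-load figures, load duration should be roughly the record count divided by sustained throughput. The Python sketch below recomputes the minutes from the reported TPS values. The helper name `load_minutes` and the dictionary layout are illustrative and were not part of the benchmark harness.

```python
# Rough consistency check: duration ~= records / throughput.
RECORDS = 148_000_000  # baseline of ~148MM inventory records


def load_minutes(records: int, tps: int) -> float:
    """Minutes needed to load `records` at a sustained rate of `tps`."""
    return records / tps / 60


# (reported TPS, reported minutes) taken from the slide
reported = {
    "ActiveSpaces (32 nodes)": (72_000, 34),
    "ActiveSpaces (40 nodes)": (98_000, 25),
    "Cassandra (5 nodes)": (65_000, 38),
}

for name, (tps, minutes) in reported.items():
    computed = load_minutes(RECORDS, tps)
    print(f"{name}: computed {computed:.1f} min, reported {minutes} min")
```

The computed values land within a minute of the reported ones, which suggests the TPS figures are averages over the whole load.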
POC Results - Upsert (Writes)
Operation: Upsert (writes); ActiveSpaces sync writes vs. Cassandra async

              | TIBCO ActiveSpaces | Apache Cassandra | 10Gen MongoDB
--------------+--------------------+------------------+--------------------------------
 Throughput   | 4,000 TPS          | 4,000 TPS        | (Did not complete) tests failed
 Avg latency  | 3.57 ms            | 3.2 ms           |
 CPU          | 0.6%               | 3.7%             |
 Memory       | 71%                | 77%              |
 I/O (disk 1) | 18%                | 0.3%             |
 I/O (disk 2) | 17%                | 2.2%             |
POC Results - Read
Operation: Read

             | TIBCO ActiveSpaces | Apache Cassandra | 10Gen MongoDB
-------------+--------------------+------------------+--------------------------------
 Throughput  | 400 TPS            | 400 TPS          | (Did not complete) tests failed
 Avg latency | 2.54 ms            | 3.23 ms          |
 CPU         | 0.06%              | 0.02%            |
 Memory      | 62.4%              | 47%              |
 I/O         | 0%                 | 3.7%             |
Migration Path
Past & Present
Phase 1 → Hybrid with Legacy
Phase 2 → Complete Switchover
Data Models
Data Model Requirements
- Model hierarchical domain objects
- Aggregate data to minimize reads
- No reads before writes
- Readable by 3rd party tools (cqlsh, DevCenter)
Data Model based on Lists
id | name | upcId | upcColor | upcAttrId | upcAttrName | upcAttrValue | upcAttrRefId
----+----------------+-----------+---------------------+-----------------+---------------------------------------------------+---------------------+--------------
11 | Nike Pants | [22, 33] | ['White', 'Red'] | [44, 55, 66] | ['ACTIVE', 'PROMOTION', 'ACTIVE'] | ['Y', 'Y', 'N'] | [22, 22, 33]
Data Model based on Maps
id | name | upcColor | upcAttrName | upcAttrValue | upcAttrRefId
----+----------------+--------------------------------+-------------------------------------------------------------------+--------------------------------+--------------------------
11 | Nike Pants | {22: 'White', 33: 'Red'} | {44: 'ACTIVE', 55: 'PROMOTION', 66: 'ACTIVE'} | {44: 'Y', 55: 'Y', 66: 'N'} | {44: 22, 55: 22, 66: 33}
Collection Size Limits
- Native protocol v2
- collection max size is 64K
- collection element max size is 64K
- Native protocol v3
- the collection size and the length of each element are now 4 bytes long (previously 2 bytes)
- still 64K limit for set element max size and map key max size
https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec
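The 64K limits follow from how the native protocol frames a collection: in v2 the element count and each element length are 2-byte unsigned shorts, while in v3 they are 4-byte ints. The Python sketch below is a simplified illustration for `list<int>` only. The function name `serialize_int_list` is hypothetical, and this is not the driver's actual encoder.

```python
import struct


def serialize_int_list(values, protocol_version):
    """Frame a list<int> the way the native protocol frames collections.

    v2: element count and each element length are 2-byte unsigned shorts.
    v3: element count and each element length are 4-byte ints.
    """
    fmt = ">H" if protocol_version <= 2 else ">i"
    out = struct.pack(fmt, len(values))
    for value in values:
        body = struct.pack(">i", value)  # a CQL int is 4 bytes, big-endian
        out += struct.pack(fmt, len(body)) + body
    return out


print(len(serialize_int_list([1, 2], protocol_version=2)))  # 2 + 2 * (2 + 4)
print(len(serialize_int_list([1, 2], protocol_version=3)))  # 4 + 2 * (4 + 4)

# A 2-byte unsigned short cannot represent a count above 65535.
try:
    struct.pack(">H", 70_000)
except struct.error as exc:
    print("v2 cannot frame 70,000 elements:", exc)
```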
Collections Internal Storage
CREATE TABLE collections (
my_key int PRIMARY KEY,
my_set set<int>,
my_map map<int, int>,
my_list list<int>,
);
INSERT INTO collections
(my_key, my_set, my_map, my_list)
VALUES ( 1, {1, 2}, {1:2, 3:4}, [1, 2]);
SELECT * FROM collections ;
my_key | my_list | my_map | my_set
-------------+----------+---------------+--------
1 | [1, 2] | {1: 2, 3: 4} | {1, 2}
Collections Internal Storage
RowKey: 1
=> (name=, value=, timestamp=1429162693574381)
=> (name=my_list:c646b7d0e3fa11e48d582f4261da0d90, value=00000001, timestamp=1429162693574381)
=> (name=my_list:c646b7d1e3fa11e48d582f4261da0d90, value=00000002, timestamp=1429162693574381)
=> (name=my_map:00000001, value=00000002, timestamp=1429162693574381)
=> (name=my_map:00000003, value=00000004, timestamp=1429162693574381)
=> (name=my_set:00000001, value=, timestamp=1429162693574381)
=> (name=my_set:00000002, value=, timestamp=1429162693574381)
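The listing above shows that every collection element becomes its own cell. The Python sketch below is a toy model of that mapping, not Cassandra's storage engine, and the function name `flatten_row` is hypothetical. List elements are named by a time-based UUID, map keys and set elements become part of the cell name, and set cells carry an empty value.

```python
import uuid


def flatten_row(collections):
    """Return (cell_name, value) pairs for one row, mimicking the listing above."""
    cells = [("", "")]  # row marker cell: empty name and empty value
    for column in sorted(collections):
        data = collections[column]
        if isinstance(data, list):
            # list: one cell per element, named by a time-based UUID
            for item in data:
                cells.append((f"{column}:{uuid.uuid1().hex}", f"{item:08x}"))
        elif isinstance(data, dict):
            # map: the key is part of the cell name, the value is the cell value
            for key in sorted(data):
                cells.append((f"{column}:{key:08x}", f"{data[key]:08x}"))
        elif isinstance(data, (set, frozenset)):
            # set: the element is part of the cell name, the value is empty
            for item in sorted(data):
                cells.append((f"{column}:{item:08x}", ""))
    return cells


cells = flatten_row({"my_set": {1, 2}, "my_map": {1: 2, 3: 4}, "my_list": [1, 2]})
for name, value in cells:
    print(f"(name={name}, value={value})")
```

Because each element carries its own name and timestamp, a large collection of small values is dominated by per-cell metadata.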
Data Model based on Compound Key & JSON
CREATE TABLE product (
id int, -- product id

upcId int, -- 0 means product row,
-- otherwise upc id indicating it is upc row
object text, -- JSON object
review text, -- JSON object
PRIMARY KEY(id, upcId)
);
id | upcId | object | review
----+--------+------------------------------------------------------------------------------------------------------------+--------------------------------------
11 | 0 | {"id" : "11", "name" : "Nike Pants"} | {"avgRating" : "5", "count" : "567"}
11 | 22 | {"color" : "White", "attr" : [{"id": "44", "name": "ACTIVE", "value": "Y"}, ...]} | null
11 | 33 | {"color" : "Red", "attr" : [{"id": "66", "name": "ACTIVE", "value": "N"}]} | null
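With this model, one partition read returns the product row (`upcId` 0) and all of its UPC rows, and the application reassembles the hierarchy. The Python sketch below shows one way that reassembly could look. The function name `assemble_product` and the `upcs` / `review` keys of the result are assumptions for illustration, and the attributes elided with `...` in the table are simply left out.

```python
import json

# Rows as they might come back from a single-partition read such as
#   SELECT upcId, object, review FROM product WHERE id = 11
rows = [
    (0, '{"id": "11", "name": "Nike Pants"}', '{"avgRating": "5", "count": "567"}'),
    (22, '{"color": "White", "attr": [{"id": "44", "name": "ACTIVE", "value": "Y"}]}', None),
    (33, '{"color": "Red", "attr": [{"id": "66", "name": "ACTIVE", "value": "N"}]}', None),
]


def assemble_product(rows):
    """Merge the product row (upcId 0) and its UPC rows into one nested dict."""
    product = {"upcs": {}}
    for upc_id, obj, review in rows:
        doc = json.loads(obj)
        if upc_id == 0:
            # upcId 0 marks the product-level row
            product.update(doc)
            if review is not None:
                product["review"] = json.loads(review)
        else:
            # any other upcId is a UPC row belonging to this product
            product["upcs"][upc_id] = doc
    return product


product = assemble_product(rows)
print(product["name"], sorted(product["upcs"]))
```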
Compound Key Internal Storage
CREATE TABLE compound_keys (
part_key int,
clust_key text,
data1 int,
data2 int,
PRIMARY KEY (part_key, clust_key)
);
INSERT INTO compound_keys
(part_key, clust_key, data1, data2)
VALUES ( 1, 'clust_key value 1', 1, 2);
INSERT INTO compound_keys
(part_key, clust_key)
VALUES ( 1, 'clust_key value 2');
part_key | clust_key | data1 | data2
--------------+-------------------------+---------+-------
1 | clust_key value 1 | 1 | 2
1 | clust_key value 2 | null | null
Compound Key Internal Storage
RowKey: 1
=> (name=clust_key value 1:, value=, timestamp=1429203000986594)
=> (name=clust_key value 1:data1, value=00000001, timestamp=1429203000986594)
=> (name=clust_key value 1:data2, value=00000002, timestamp=1429203000986594)
=> (name=clust_key value 2:, value=, timestamp=1429203000984518)
Data Load Performance Comparison
July 2013
Column Storage Overhead
- regular column size = column name + column value + 15 bytes
- counter / expiring = +8 additional bytes
  name      : 2 bytes (length as short int) + byte[]
  flags     : 1 byte
  timestamp : 8 bytes (long)
  value     : 4 bytes (length as int) + byte[]
http://btoddb-cass-storage.blogspot.ru/2011/07/column-overhead-and-sizing-every-column.html
https://issues.apache.org/jira/browse/CASSANDRA-4175
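The 15 bytes are the sum of the fixed fields listed above (2 + 1 + 8 + 4). The Python sketch below applies that formula and compares the payload-to-total ratio of many small columns with that of a single larger column. The column names and sizes are made up for illustration, and the comparison ignores JSON's own syntax overhead.

```python
COLUMN_OVERHEAD = 2 + 1 + 8 + 4  # name length + flags + timestamp + value length


def column_size(name: bytes, value: bytes, extra: int = 0) -> int:
    """Serialized size of one column; pass extra=8 for counter/expiring columns."""
    return len(name) + len(value) + COLUMN_OVERHEAD + extra


def total_size(columns) -> int:
    return sum(column_size(name, value) for name, value in columns)


def payload_ratio(columns) -> float:
    """Fraction of the stored bytes that is actual column value data."""
    payload = sum(len(value) for _, value in columns)
    return payload / total_size(columns)


# Hypothetical data: ten 4-byte values in separate columns with 6-byte names,
# versus the same 40 bytes of payload held in one column.
ten_ints = [(f"attr{i:02d}".encode(), b"\x00" * 4) for i in range(10)]
one_blob = [(b"object", b"\x00" * 40)]

print(column_size(b"data1", b"\x00\x00\x00\x01"))           # regular column
print(column_size(b"data1", b"\x00\x00\x00\x01", extra=8))  # counter / expiring
print(f"{payload_ratio(ten_ints):.2f} vs {payload_ratio(one_blob):.2f}")
```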
JSON vs primitive column types
- Significantly reduces storage overhead because
of better ratio of payload / storage metadata
- Improves throughput and latency
- Supports complex hierarchical structures
- But partial reads / updates are lost: the whole JSON object must be read or rewritten
- Complicates schema versioning
Data Model for Secondary Indices
CREATE TABLE product_by_upc (
upcId int,
prodId int,
PRIMARY KEY (upcId)
);
Pagination
base_service_url?filter=active&_limit=100&_offset=last_prod_id
CREATE TABLE product_paging (
filter text,
prodId int,
PRIMARY KEY (filter, prodId)
);
SELECT prodId FROM product_paging
WHERE filter = 'active' AND prodId > last_prod_id LIMIT 100;
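The query above pages by key rather than by numeric offset: each request passes the last `prodId` it saw and asks for the next rows with a greater clustering value. The Python sketch below simulates that pattern against an in-memory sorted list standing in for one partition. The names `partition`, `next_page`, and `all_pages` and the sample ids are illustrative only.

```python
from bisect import bisect_right

# Stand-in for one partition of product_paging: prodIds kept in clustering order.
partition = {"active": list(range(1, 1001, 3))}


def next_page(filter_value, last_prod_id, limit=100):
    """Equivalent of: WHERE filter = ? AND prodId > ? LIMIT ?"""
    ids = partition[filter_value]
    start = bisect_right(ids, last_prod_id)  # index of first prodId > last_prod_id
    return ids[start:start + limit]


def all_pages(filter_value, limit=100):
    """Walk the whole partition page by page, as a client would."""
    last_prod_id = 0  # assumption: real prodIds are positive
    while True:
        page = next_page(filter_value, last_prod_id, limit)
        if not page:
            return
        yield page
        last_prod_id = page[-1]


print([len(page) for page in all_pages("active")])
```

Every id for a given filter value lives in a single partition, so the partition grows with the number of matching products.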
Pagination - SSTable Stats
Pagination - SSTable Stats after DELETE
Compaction Issue
##########################################################
##### nodetool compactionstats Status at 03-03-2015 14:00:01 #####
##########################################################
compaction type keyspace table completed total unit progress
Compaction catalog_v9 product_paging 110 196860609 bytes 0.00%
INFO [CompactionExecutor] 2015-04-07 02:36:44,564 CompactionController Compacting large row catalog_v9/product_paging (2030276075 bytes) incrementally
Performance
Production Environment
- DataStax Enterprise 4.0.3 (Cassandra 2.0.7)
  - migrating to DSE 4.6.5 (Cassandra 2.0.14) in May
- 6 DSE Servers
- Server Specs
- Red Hat Enterprise Linux Server 6.4
- 8 Core Xeon HT w/196GB RAM
- 2 x 400GB SSD RAID-1 Mirrored
Cassandra Config
- NetworkTopologyStrategy with RF 3
- Murmur3Partitioner
- PropertyFileSnitch
- Virtual nodes
- Key cache 100MB
- Concurrent reads 32
- Concurrent writes 32
- 8GB Heap, 400MB Young Gen
- SizeTieredCompactionStrategy for bulk load data
- LeveledCompactionStrategy for real time updates
Client App Config
- DataStax Java Driver version 2.1.4
- CQL 3
- Write CL is LOCAL_QUORUM
- Read CL is ONE
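With RF 3, a quorum is two replicas, so a LOCAL_QUORUM write waits for two local acknowledgements while a read at ONE waits for one. The Python sketch below works through that arithmetic. It assumes RF 3 applies within the local data center, and it illustrates the general overlap rule for tunable consistency rather than anything measured in this deployment.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a quorum."""
    return replication_factor // 2 + 1


RF = 3
write_replicas = quorum(RF)  # LOCAL_QUORUM with RF 3 in the local data center
read_replicas = 1            # ONE

# A read is guaranteed to touch at least one replica holding the latest
# acknowledged write only when the read and write replica sets must overlap.
overlap_guaranteed = write_replicas + read_replicas > RF

print(f"write waits for {write_replicas} of {RF} replicas")
print(f"read waits for {read_replicas} of {RF} replicas")
print(f"read-your-writes guaranteed: {overlap_guaranteed}")
```

With these settings the sum equals RF rather than exceeding it, so a read can briefly return an older value. The configuration trades that for lower read latency.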
Read Requests
Write Requests
Garbage Collection
CPU & Disk
Disk Performance
- DataStax provided measurement app
- Hats off to Al Tobey & Patrick McFadin
- Ran it internally
- Compared it against other vendors
- Result:
- SAN latency is significantly higher!
- Don’t be fooled by claims of high IOPS
- Action:
- Use locally-mounted SSD storage!
Disk Performance
http://tobert.org/disk-latency-graphs/
Retrospectives
What Worked Well
- Documentation is good
- CQL3 easy to learn and migrate to from SQL background
- Query tracing is helpful
- Stable: Haven’t lost a node in production
- Able to reduce our reliance on caching in app tier
- Cassandra Cluster Manager (https://github.com/pcmanus/ccm)
Problems encountered with Cassandra
- CQL queries brought down cluster
- Deleting and creating a keyspace ….
- Lengthy compaction
- Need to understand underlying storage
- OpsCenter Performance Charts
Our own problems
- Small cluster: all servers must perform well
- Lack of versioning of JSON schema
- How to handle performant exports
- 5% CPU -> 30%
- Non-primary key access
- Under-allocated test environments
One concern...
Secret Identity???
Patrick McFadin Buddy Pine/”Syndrome”
Future Plans
- Upgrade DSE 4.0.3 → 4.6.5
- Cassandra 2.0.7 → 2.0.14
- DSE Spark Integration for reporting
- Multi-Datacenter
- Using Mutagen for schema management
- JSON schema versioning
- RxJava for asynchronous operations
- Spring Data Cassandra
Stuff we’d like to see in DSE
- DSE Kibana
- Augment Spark’s query capability with an
interactive GUI
- Replication listeners
- Detect when replication is complete
Video
We’re hiring ...
ecommerce.macysjobs.com
3rd & Folsom
Questions?

 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 

Mais de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 

Último

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Último (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Apache Cassandra at Macys

  • 1. @
    Andrew Santoro, Development Manager
    Peter Connolly, Application Architect
    Ivan Ukolov, Architect/Implementation Lead
    April 2015
  • 2. Agenda
    - Background/Problem Statement
    - Options
    - Migration
    - Data Models
    - Performance
    - Retrospectives
    - Future plans
    - Video
  • 4. About Us
    - Real-time RESTful data services application
    - Serve product, inventory and store data to website, mobile, store, and partner applications
    - Not “Big Data” or Analytics
    - Growth in business led us to want:
        - 10x growth in data
        - Move from a read-mostly model to one which could handle near-real-time updates
        - Move into multiple data centers
  • 6. Operational Problems
    - Prewarming step was a prerequisite to satisfy performance SLA
    - Maintained second data grid for failover scenario
        - hard to keep those consistent
        - complicated release process
  • 9. Options Evaluated
    - Denormalized Relational DB: DB2
    - Document DB: MongoDB
    - Columnar DB: Cassandra, Couchbase, ActiveSpaces
    - Graph: Neo4J
    - Object: Versant
  • 10. Options Short List
    - MongoDB
        - Feature-rich, JSON document DB
    - Cassandra
        - Scalable, true peer-to-peer
    - ActiveSpaces
        - TIBCO Proprietary
        - Existing relationship with TIBCO
        - In-memory key/value datagrid
  • 11. POC Environment
    - Amazon EC2
    - 5 servers
    - Same servers used for each test
    - Had vendors assist with setup and execution of benchmarks
    - Modeled benchmarks after retail inventory use cases
    - C3.2xlarge instances
    - Baseline of 148MM inventory records
  • 12. POC Results - Summary
  • 13. POC Results - Summary
    - Cassandra & ActiveSpaces
        - Very close
    - MongoDB
        - Failed tests
    YMMV! Your mileage may (will probably) vary
  • 14. POC Results - Initial Load
    Operation: Initial Load, ~148MM records, same datacenter (availability zone)
    - TIBCO ActiveSpaces
        - 72,000 TPS (32 nodes), 34 min
        - 98,000 TPS (40 nodes), 25 min
        - CPU: 62.4%, Memory: 83.0%, I/O Disk 1: 36%, I/O Disk 2: 35%
    - Apache Cassandra
        - 65,000 TPS (5 nodes), 38 min
        - CPU: 31%, Memory: 59%, I/O Disk 1: 42%, I/O Disk 2: 14%
    - 10Gen MongoDB
        - 20,000 TPS (5 nodes), ?? min (did not complete), processed ~23MM records
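As a quick sanity check on the load figures above, the elapsed time should be roughly the record count divided by the sustained throughput. The sketch below assumes the ~148MM baseline and a constant TPS for the whole run; the function name `load_minutes` is illustrative and does not come from the talk.

```python
def load_minutes(records: int, tps: int) -> float:
    """Minutes needed to load `records` rows at a constant `tps`."""
    return records / tps / 60


BASELINE_RECORDS = 148_000_000  # ~148MM inventory records used in the POC

# (label, reported TPS) pairs copied from the slide
RUNS = [
    ("ActiveSpaces, 32 nodes", 72_000),
    ("ActiveSpaces, 40 nodes", 98_000),
    ("Cassandra, 5 nodes", 65_000),
]

if __name__ == "__main__":
    for label, tps in RUNS:
        print(f"{label}: {load_minutes(BASELINE_RECORDS, tps):.0f} min")
```

The computed times line up with the 34, 25 and 38 minute figures reported on the slide, which suggests the TPS values are averages over the full load.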
  • 15. POC Results - Upsert (Writes)
    Operation: Upsert (Writes); ActiveSpaces sync writes vs. Cassandra async
    - TIBCO ActiveSpaces
        - 4,000 TPS, 3.57 ms avg latency
        - CPU: 0.6%, Memory: 71%, I/O: 18% (disk 1), I/O: 17% (disk 2)
    - Apache Cassandra
        - 4,000 TPS, 3.2 ms avg latency
        - CPU: 3.7%, Memory: 77%, I/O: 0.3%, I/O: 2.2%
    - 10Gen MongoDB
        - (Did not complete) tests failed
  • 16. POC Results - Read
    Operation: Read
    - TIBCO ActiveSpaces
        - 400 TPS, 2.54 ms avg latency
        - CPU: 0.06%, Memory: 62.4%, I/O: 0%
    - Apache Cassandra
        - 400 TPS, 3.23 ms avg latency
        - CPU: 0.02%, Memory: 47%, I/O: 3.7%
    - 10Gen MongoDB
        - (Did not complete) tests failed
  • 18. Past & Present
    - Phase 1 → Hybrid with Legacy
    - Phase 2 → Complete Switchover
  • 20. Data Model Requirements
    - Model hierarchical domain objects
    - Aggregate data to minimize reads
    - No reads before writes
    - Readable by 3rd party tools (cqlsh, Dev Center)
  • 21. Data Model based on Lists
     id | name       | upcId    | upcColor         | upcAttrId    | upcAttrName                       | upcAttrValue    | upcAttrRefId
    ----+------------+----------+------------------+--------------+-----------------------------------+-----------------+--------------
     11 | Nike Pants | [22, 33] | ['White', 'Red'] | [44, 55, 66] | ['ACTIVE', 'PROMOTION', 'ACTIVE'] | ['Y', 'Y', 'N'] | [22, 22, 33]
  • 22. Data Model based on Maps
     id | name       | upcColor                 | upcAttrName                                   | upcAttrValue                | upcAttrRefId
    ----+------------+--------------------------+-----------------------------------------------+-----------------------------+--------------------------
     11 | Nike Pants | {22: 'White', 33: 'Red'} | {44: 'ACTIVE', 55: 'PROMOTION', 66: 'ACTIVE'} | {44: 'Y', 55: 'Y', 66: 'N'} | {44: 22, 55: 22, 66: 33}
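To make the map-based layout concrete, here is a minimal Python sketch of how an application might regroup the four map columns into one object per UPC. The row literal mirrors the sample row above; the helper name `upcs_from_maps` and the output shape are assumptions for illustration, not the presenters' code.

```python
# Sample row from the slide as a plain dict (maps keyed by UPC id or attribute id).
row = {
    "id": 11,
    "name": "Nike Pants",
    "upcColor": {22: "White", 33: "Red"},
    "upcAttrName": {44: "ACTIVE", 55: "PROMOTION", 66: "ACTIVE"},
    "upcAttrValue": {44: "Y", 55: "Y", 66: "N"},
    "upcAttrRefId": {44: 22, 55: 22, 66: 33},
}


def upcs_from_maps(row):
    """Regroup the per-attribute maps into one dict per UPC (hypothetical helper)."""
    upcs = {upc_id: {"color": color, "attrs": []}
            for upc_id, color in row["upcColor"].items()}
    # upcAttrRefId maps each attribute id to the UPC id that owns it.
    for attr_id, upc_id in sorted(row["upcAttrRefId"].items()):
        upcs[upc_id]["attrs"].append({
            "id": attr_id,
            "name": row["upcAttrName"][attr_id],
            "value": row["upcAttrValue"][attr_id],
        })
    return upcs


if __name__ == "__main__":
    for upc_id, upc in upcs_from_maps(row).items():
        print(upc_id, upc)
```

Compared with the list-based layout, the join key is explicit (the attribute id), so the application does not depend on positional alignment between columns.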
  • 23. Collection Size Limits
    - Native protocol v2
        - collection max size is 64K
        - collection element max size is 64K
    - Native protocol v3
        - the collection size and the length of each element are now encoded as 4-byte values
        - still 64K limit for set element max size and map key max size
    https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec
  • 24. Collections Internal Storage
    CREATE TABLE collections (
        my_key int PRIMARY KEY,
        my_set set<int>,
        my_map map<int, int>,
        my_list list<int>,
    );

    INSERT INTO collections (my_key, my_set, my_map, my_list)
    VALUES (1, {1, 2}, {1:2, 3:4}, [1, 2]);

    SELECT * FROM collections;

     my_key | my_list | my_map       | my_set
    --------+---------+--------------+--------
          1 | [1, 2]  | {1: 2, 3: 4} | {1, 2}
  • 25. Collections Internal Storage
    RowKey: 1
    => (name=, value=, timestamp=1429162693574381)
    => (name=my_list:c646b7d0e3fa11e48d582f4261da0d90, value=00000001, timestamp=1429162693574381)
    => (name=my_list:c646b7d1e3fa11e48d582f4261da0d90, value=00000002, timestamp=1429162693574381)
    => (name=my_map:00000001, value=00000002, timestamp=1429162693574381)
    => (name=my_map:00000003, value=00000004, timestamp=1429162693574381)
    => (name=my_set:00000001, value=, timestamp=1429162693574381)
    => (name=my_set:00000002, value=, timestamp=1429162693574381)
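The dump above shows one storage cell per collection element: map keys and set elements end up in the cell name, while list elements are stored under a time-based UUID. The Python sketch below is a simplified model of that layout, assuming plain values instead of the hex-encoded bytes in the dump; `collection_cells` is a hypothetical helper, not a Cassandra API.

```python
import uuid


def collection_cells(column, value):
    """Return (cell name, cell value) pairs for one collection column (simplified model)."""
    if isinstance(value, dict):
        # map: the key goes into the cell name, the value into the cell value
        return [(f"{column}:{k}", v) for k, v in sorted(value.items())]
    if isinstance(value, (set, frozenset)):
        # set: the element goes into the cell name, the cell value stays empty
        return [(f"{column}:{e}", None) for e in sorted(value)]
    if isinstance(value, list):
        # list: a time-based UUID in the cell name, the element in the cell value
        return [(f"{column}:{uuid.uuid1().hex}", e) for e in value]
    raise TypeError(f"unsupported collection type: {type(value).__name__}")


if __name__ == "__main__":
    for name, value in (("my_list", [1, 2]), ("my_map", {1: 2, 3: 4}), ("my_set", {1, 2})):
        for cell in collection_cells(name, value):
            print(cell)
```

Every element carries its own cell name and timestamp, which is why large collections of small values are comparatively expensive to store.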
  • 26. Data Model based on Compound Key & JSON
    CREATE TABLE product (
        id int,        -- product id
        upcId int,     -- 0 means product row,
                       -- otherwise upc id indicating it is upc row
        object text,   -- JSON object
        review text,   -- JSON object
        PRIMARY KEY(id, upcId)
    );

     id | upcId | object                                                                            | review
    ----+-------+-----------------------------------------------------------------------------------+--------------------------------------
     11 |     0 | {"id" : "11", "name" : "Nike Pants"}                                              | {"avgRating" : "5", "count" : "567"}
     11 |    22 | {"color" : "White", "attr" : [{"id": "44", "name": "ACTIVE", "value": "Y"}, ...]} | null
     11 |    33 | {"color" : "Red", "attr" : [{"id": "66", "name": "ACTIVE", "value": "N"}]}        | null
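A single partition read on `id` returns the product row (`upcId = 0`) together with all of its UPC rows, so the application can rebuild the hierarchy in one pass. The sketch below shows one way that could look in Python; `assemble_product` and the output shape are assumptions, and the attribute list for UPC 22 keeps only the first entry because the slide truncates the rest.

```python
import json

# (id, upcId, object, review) tuples shaped like the sample rows above.
rows = [
    (11, 0, '{"id": "11", "name": "Nike Pants"}', '{"avgRating": "5", "count": "567"}'),
    (11, 22, '{"color": "White", "attr": [{"id": "44", "name": "ACTIVE", "value": "Y"}]}', None),
    (11, 33, '{"color": "Red", "attr": [{"id": "66", "name": "ACTIVE", "value": "N"}]}', None),
]


def assemble_product(rows):
    """Merge the product row (upcId == 0) and its UPC rows into one dict (hypothetical helper)."""
    product = {"upcs": {}}
    for _id, upc_id, obj, review in rows:
        doc = json.loads(obj)
        if upc_id == 0:
            # Product-level row: copy its fields and attach the parsed review.
            product.update(doc)
            product["review"] = json.loads(review) if review else None
        else:
            # UPC-level row: keep the parsed JSON under its UPC id.
            product["upcs"][upc_id] = doc
    return product


if __name__ == "__main__":
    print(json.dumps(assemble_product(rows), indent=2))
```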
  • 27. Compound Key Internal Storage
    CREATE TABLE compound_keys (
        part_key int,
        clust_key text,
        data1 int,
        data2 int,
        PRIMARY KEY (part_key, clust_key)
    );

    INSERT INTO compound_keys (part_key, clust_key, data1, data2)
    VALUES (1, 'clust_key value 1', 1, 2);

    INSERT INTO compound_keys (part_key, clust_key)
    VALUES (1, 'clust_key value 2');

     part_key | clust_key         | data1 | data2
    ----------+-------------------+-------+-------
            1 | clust_key value 1 |     1 |     2
            1 | clust_key value 2 |  null |  null
  • 28. Compound Key Internal Storage
    RowKey: 1
    => (name=clust_key value 1:, value=, timestamp=1429203000986594)
    => (name=clust_key value 1:data1, value=00000001, timestamp=1429203000986594)
    => (name=clust_key value 1:data2, value=00000002, timestamp=1429203000986594)
    => (name=clust_key value 2:, value=, timestamp=1429203000984518)
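The dump above can be read as a flattening rule: each CQL row contributes one marker cell named after its clustering key, plus one cell per non-null data column, all inside the same partition. The following Python sketch models that rule under simplifying assumptions (plain values rather than encoded bytes, no timestamps); `to_cells` is a hypothetical helper.

```python
def to_cells(rows, clustering_col, data_cols):
    """Flatten CQL-style rows of one partition into (cell name, value) pairs."""
    cells = []
    # Cells are kept sorted by clustering key inside the partition.
    for row in sorted(rows, key=lambda r: r[clustering_col]):
        ck = row[clustering_col]
        cells.append((f"{ck}:", None))  # row marker cell with an empty value
        for col in data_cols:
            if row.get(col) is not None:  # null columns are simply not stored
                cells.append((f"{ck}:{col}", row[col]))
    return cells


rows = [
    {"part_key": 1, "clust_key": "clust_key value 1", "data1": 1, "data2": 2},
    {"part_key": 1, "clust_key": "clust_key value 2", "data1": None, "data2": None},
]

if __name__ == "__main__":
    for cell in to_cells(rows, "clust_key", ["data1", "data2"]):
        print(cell)
```

The clustering key value is repeated in every cell name, so long clustering keys and many small columns both inflate the stored size.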
  • 29. Data Load Performance Comparison July 2013
  • 30. Column Storage Overhead
    - regular column size = column name + column value + 15 bytes
    - counter / expiring = +8 additional bytes
    name      : 2 bytes (length as short int) + byte[]
    flags     : 1 byte
    timestamp : 8 bytes (long)
    value     : 4 bytes (length as int) + byte[]
    http://btoddb-cass-storage.blogspot.ru/2011/07/column-overhead-and-sizing-every-column.html
    https://issues.apache.org/jira/browse/CASSANDRA-4175
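The sizing formula above is easy to turn into a small calculator. The sketch below applies it to two cases: a 4-byte int stored in its own column, and a single text column holding a JSON blob. The 400-byte blob size and the column names are made-up inputs for illustration, and the function names are hypothetical.

```python
FIXED_OVERHEAD = 2 + 1 + 8 + 4  # name length + flags + timestamp + value length = 15 bytes


def column_size(name: bytes, value: bytes, counter_or_expiring: bool = False) -> int:
    """Stored size of one column according to the formula on the slide."""
    size = len(name) + len(value) + FIXED_OVERHEAD
    if counter_or_expiring:
        size += 8  # counter and expiring columns carry 8 extra bytes
    return size


def payload_ratio(name: bytes, value: bytes) -> float:
    """Fraction of the stored bytes that is actual payload."""
    return len(value) / column_size(name, value)


if __name__ == "__main__":
    int_value = (1).to_bytes(4, "big")   # a CQL int is 4 bytes
    json_value = b"x" * 400              # hypothetical 400-byte JSON document
    print(column_size(b"data1", int_value), round(payload_ratio(b"data1", int_value), 2))
    print(column_size(b"object", json_value), round(payload_ratio(b"object", json_value), 2))
```

Under these assumptions the int column stores 24 bytes for 4 bytes of payload, while the JSON column stores 421 bytes for 400 bytes of payload.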
  • 31. JSON vs primitive column types
    - Significantly reduces storage overhead because of better ratio of payload / storage metadata
    - Improves throughput and latency
    - Supports complex hierarchical structures
    - But it loses in partial reads / updates
    - Complicates schema versioning
  • 32. Data Model for Secondary Indices
    CREATE TABLE product_by_upc (
        upcId int,
        prodId int,
        PRIMARY KEY (upcId)
    );
  • 33. Pagination
    base_service_url?filter=active&_limit=100&_offset=last_prod_id

    CREATE TABLE product_paging (
        filter text,
        prodId int,
        PRIMARY KEY (filter, prodId)
    );

    SELECT prodId FROM product_paging
    WHERE filter='active' AND prodId > last_prod_id LIMIT 100;
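The query above is keyset pagination: the client passes back the last product id it saw, and the next page starts strictly after it. The Python sketch below simulates that behaviour over an in-memory sorted list standing in for one `filter` partition; the function names and the assumption that product ids are non-negative are illustrative only.

```python
from bisect import bisect_right


def fetch_page(sorted_ids, last_prod_id, limit=100):
    """Return up to `limit` ids strictly greater than `last_prod_id`."""
    start = bisect_right(sorted_ids, last_prod_id)
    return sorted_ids[start:start + limit]


def iterate_pages(sorted_ids, limit=100):
    """Yield successive pages until the partition is exhausted."""
    last = -1  # assumes product ids are non-negative
    while True:
        page = fetch_page(sorted_ids, last, limit)
        if not page:
            break
        yield page
        last = page[-1]  # becomes the _offset of the next request


if __name__ == "__main__":
    ids = list(range(1, 251))
    print([len(p) for p in iterate_pages(ids)])
```

Because each request seeks directly to the offset key, the cost of a page does not grow with how deep into the result set the client is.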
  • 35. Pagination - SSTable Stats after DELETE
  • 36. Compaction Issue
    ##########################################################
    ##### nodetool compactionstats Status at 03-03-2015 14:00:01 #####
    ##########################################################
    compaction type   keyspace     table            completed   total       unit    progress
    Compaction        catalog_v9   product_paging   110         196860609   bytes   0.00%

    INFO [CompactionExecutor] 2015-04-07 02:36:44,564 CompactionController Compacting large row catalog_v9/product_paging (2030276075 bytes) incrementally
  • 38. Production Environment
    - DataStax Enterprise 4.0.3 (Cassandra 2.0.7)
        - migrating to DSE 4.6.5 (Cassandra 2.0.14) in May
    - 6 DSE Servers
    - Server Specs
        - Red Hat Enterprise Linux Server 6.4
        - 8 Core Xeon HT w/196GB RAM
        - 2 x 400GB SSD RAID-1 Mirrored
  • 39. Cassandra Config
    - NetworkTopologyStrategy with RF 3
    - Murmur3Partitioner
    - PropertyFileSnitch
    - Virtual nodes
    - Key cache 100MB
    - Concurrent reads 32
    - Concurrent writes 32
    - 8GB Heap, 400MB Young Gen
    - SizeTieredCompactionStrategy for bulk load data
    - LeveledCompactionStrategy for real time updates
  • 40. Client App Config
    - DataStax Java Driver version 2.1.4
    - CQL 3
    - Write CL is LOCAL_QUORUM
    - Read CL is ONE
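The consistency levels above trade read latency against read-your-writes guarantees. As general Cassandra arithmetic (not something the slides state), a read is guaranteed to see the latest acknowledged write only when the write and read replica counts together exceed the replication factor. The sketch below works that out for RF 3 within a single datacenter; the function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read replica set overlaps every write replica set (W + R > RF)."""
    return write_replicas + read_replicas > rf


if __name__ == "__main__":
    RF = 3
    W = quorum(RF)  # a LOCAL_QUORUM write waits for 2 of 3 local replicas
    print("quorum:", W)
    print("read at ONE overlaps:", read_sees_latest_write(RF, W, 1))
    print("read at LOCAL_QUORUM overlaps:", read_sees_latest_write(RF, W, quorum(RF)))
```

With writes at LOCAL_QUORUM and reads at ONE, a read may briefly hit the one replica that has not applied a write yet; that is the price of the cheaper single-replica read.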
  • 46. Disk Performance
    - DataStax provided measurement app
        - Hats off to Al Tobey & Patrick McFadin
    - Ran it internally
    - Compared it against other vendors
    - Result:
        - SAN latency is significantly slower!
        - Don’t be fooled by claims of high IOPS
    - Action:
        - Use locally-mounted SSD storage!
  • 49. What Worked Well
    - Documentation is good
    - CQL3 easy to learn and migrate to from SQL background
    - Query tracing is helpful
    - Stable: Haven’t lost a node in production
    - Able to reduce our reliance on caching in app tier
    - Cassandra Cluster Manager (https://github.com/pcmanus/ccm)
  • 50. Problems encountered with Cassandra
    - CQL queries brought down cluster
    - Deleting and creating a keyspace …
    - Lengthy compaction
    - Need to understand underlying storage
    - OpsCenter Performance Charts
  • 51. Our own problems
    - Small cluster: all servers must perform well
    - Lack of versioning of JSON schema
    - How to handle performant exports
        - 5% CPU -> 30%
    - Non-primary key access
    - Under-allocated test environments
  • 52. One concern...
    Secret Identity???
    Patrick McFadin
    Buddy Pine/“Syndrome”
  • 54. Future Plans
    - Upgrade DSE 4.0.3 → 4.6.5
        - Cassandra 2.0.7 → 2.0.14
    - DSE Spark Integration for reporting
    - Multi-Datacenter
    - Using Mutagen for schema management
    - JSON schema versioning
    - RxJava for asynchronous operations
    - Spring Data Cassandra
  • 55. Stuff we’d like to see in DSE
    - DSE Kibana
        - Augment Spark’s query capability with an interactive GUI
    - Replication listeners
        - Detect when replication is complete
  • 56. Video