Using Cassandra In Building A Reporting Platform
Javed Roshan – Director, Data Services
Mukaram Aziz – Sr. Manager, Data Services
1 Use Case
2 New Data Platform
3 Design Decisions
4 Solution Stack
5 Challenges
© 2015. All Rights Reserved.
Use Case
•  Fast Data requirements in an Operational Space
–  Metrics and Reports for intra-day business decisions
–  Process Monitoring
•  Current Landscape
–  Multiple data sources
–  Traditional batched ETL
–  Multiple data destinations
–  Reporting Tools
•  Opportunity Areas
–  Make reports near real time
–  Achieve 99.99% SLAs
–  Time to market delivery
–  Make enhancements inexpensive
| Layer | Existing |
|---|---|
| Data Sources | RDBMS, Files |
| ETL | File Based |
| Data Distribution | Files |
| Data Destination | RDBMS |
| Reporting Tools | Various |
New Data Platform
•  Platform
–  Data Distribution: Kafka
–  Data Processing: Go / Docker
–  Data Store: Cassandra
–  AWS
•  Design Decisions
–  Move data when available
–  Transform when all data available
•  Cassandra
–  CAP: Emphasis on A & P with tunable C
–  Wide row tables
–  Linear scalability to handle large data sizes
–  Out of the box multi-DC deployment
| Layer | Existing | New |
|---|---|---|
| Data Sources | RDBMS, Files | RDBMS, Files |
| ETL | File based | Go / Docker |
| Data Distribution | Files | Kafka |
| Data Destination | RDBMS | Cassandra |
| Reporting Tools | Various | Streamlined |
Design Decisions
•  Data Modeling
–  Partition Key/Size
–  “Read” Response time
–  Handling Consistency
–  Collection Columns: Sets & Maps
–  Logical separation of raw & processed data
–  All lookup data in a single table
•  Indexes
–  Primary, Inverted, Secondary, DSE Search Indexes
•  DSE Search
–  range-queries
–  regular-expression
–  non-equality
–  faceted
Design Decisions
•  Consistency
–  W Consistency + R Consistency > Replication Factor
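The rule above can be checked mechanically. The following is a minimal Go sketch, assuming consistency levels are expressed as replica counts (ONE = 1, QUORUM = RF/2 + 1, ALL = RF). The function names `quorum` and `overlaps` are illustrative only and are not part of any Cassandra driver.

```go
package main

import "fmt"

// quorum returns the number of replicas that form a majority
// for the given replication factor.
func quorum(rf int) int {
	return rf/2 + 1
}

// overlaps reports whether a write acknowledged by w replicas and a read
// served by r replicas must share at least one replica, i.e. w + r > rf.
func overlaps(w, r, rf int) bool {
	return w+r > rf
}

func main() {
	rf := 3 // replication factor quoted on the Performance slide
	q := quorum(rf)

	fmt.Println("QUORUM write + QUORUM read:", overlaps(q, q, rf)) // true
	fmt.Println("ONE write + ONE read:", overlaps(1, 1, rf))       // false
	fmt.Println("ALL write + ONE read:", overlaps(rf, 1, rf))      // true
}
```

With a replication factor of 3, QUORUM on both sides satisfies the inequality, while ONE on both sides does not.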
•  Indexes
| Data Access Options | Data / Index | Storage | Response Time | Maintained By | Cardinality | Search | Consistency |
|---|---|---|---|---|---|---|---|
| Primary Key | Data | High | Very fast | App | High | Limited | Tunable |
| Duplicate Data (Primary Key) | Data | High | Very fast | App | High | Limited | Tunable |
| Inverted Index | Index | Low | Fast | App | High | Limited | Tunable |
| Secondary Index | Index | Low | Medium | System | Low | Limited | Tunable |
| DSE Search | Index | Medium | Slow (relative) | System | Any | Versatile | One (R) |
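The "Inverted Index" option describes an index that the application maintains itself: a second lookup, keyed by the searched value, that points back to primary keys. The Go sketch below illustrates that idea in memory only. The `Store` type, its fields, and the sample keys are hypothetical; in Cassandra each map would presumably be a separate table that the application writes on every insert.

```go
package main

import (
	"fmt"
	"sort"
)

// Store keeps rows by primary key plus an application-maintained inverted index.
type Store struct {
	rows    map[string]string          // primary key -> indexed attribute value
	byValue map[string]map[string]bool // attribute value -> set of primary keys
}

// NewStore returns an empty Store.
func NewStore() *Store {
	return &Store{
		rows:    map[string]string{},
		byValue: map[string]map[string]bool{},
	}
}

// Put writes the row and updates the index; the application owns both writes.
func (s *Store) Put(key, value string) {
	// Remove the stale index entry if the row already existed.
	if old, ok := s.rows[key]; ok {
		delete(s.byValue[old], key)
		if len(s.byValue[old]) == 0 {
			delete(s.byValue, old)
		}
	}
	s.rows[key] = value
	if s.byValue[value] == nil {
		s.byValue[value] = map[string]bool{}
	}
	s.byValue[value][key] = true
}

// Lookup returns the primary keys whose attribute equals value, sorted.
func (s *Store) Lookup(value string) []string {
	keys := make([]string, 0, len(s.byValue[value]))
	for k := range s.byValue[value] {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	s := NewStore()
	s.Put("acct-1", "open")
	s.Put("acct-2", "open")
	s.Put("acct-1", "closed") // update: index entry for "open" is removed

	fmt.Println(s.Lookup("open"))   // [acct-2]
	fmt.Println(s.Lookup("closed")) // [acct-1]
}
```

The extra bookkeeping in `Put` is the cost the table labels as "App" maintenance: lookups stay fast, but every write has to keep the index consistent.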
Benchmarking: Indexes
•  Response time in seconds, by index type (bar chart)
–  Primary Index: 0.02
–  Inverted Index: 0.04
–  DSE Search: 1.6
–  Secondary Index: 0.6
•  22.6 million rows
•  6 node cluster
Performance
•  Replication factor of 3
•  Write Heavy
–  Increased concurrent writes to 64 (from 32)
–  Decreased concurrent reads to 16 (from 32)
–  Size-tiered compaction strategy
•  Cassandra cluster with DSE Search enabled on all nodes
•  Virtual nodes set to 16
•  All caches disabled except filter cache
•  EC2 Snitch on AWS – 3 AZs
•  DSE Search soft auto-commit max time set to 10s
Solution Stack
[Diagram components: Mesosphere, Marathon, Docker (Go Services: Ingestion), Docker (Go Services: Processor), Kafka, API Server, Consul]
Solution Stack: Plug-In Framework
•  Go Service: Plugins chained in a single process
•  Packaged & deployed in a Docker Container
•  Bootstrapped from a config
•  100% developed in-house
[Diagram: one Go service containing three runners in sequence, each with a plugin, an in channel, and an out channel]
Solution Stack
•  Cassandra
–  Data Storage
•  Go-Based Plugin Framework
–  Go services for data Ingestion & Processing
•  Docker
–  Packaging and deployment
•  Mesosphere
–  Single view of infrastructure
•  Marathon
–  Launch containers
•  Kafka
–  Data transfer and distribution
•  Consul
–  Service discovery and configuration management
•  Jenkins
–  Continuous Integration
Benchmarking: Data Processing
•  Test for a functional group
•  Cassandra: 6 node cluster
•  Kafka: 6 node cluster
•  Go Services: 3
•  Primary Data Source: Oracle
•  Time: 360 minutes
•  Data Size: 1 year
| Description | Measure |
|---|---|
| Total rows processed | 450 million |
| De-normalized rows | 11.8 million |
| Rate of processing (Go Services) | ~300k tps |
| Rate of processing (Platform) | ~21k tps |
| % time waiting on data ingestion | 75% |
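The platform rate is consistent with the totals on this slide if "tps" is read as rows per second averaged over the whole run; the deck does not define the unit, so that reading is an assumption. A short Go check of the arithmetic, with the hypothetical helper `ratePerSecond`:

```go
package main

import "fmt"

// ratePerSecond returns the average number of rows processed per second
// for a run that lasted the given number of minutes.
func ratePerSecond(rows, minutes float64) float64 {
	return rows / (minutes * 60)
}

func main() {
	const totalRows = 450e6 // "Total rows processed: 450 million"
	const runMinutes = 360  // "Time: 360 minutes"

	// 450,000,000 rows / 21,600 s is about 20,833 rows/s, i.e. roughly 21k.
	fmt.Printf("platform rate: %.0f rows/s\n", ratePerSecond(totalRows, runMinutes))
}
```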
Challenges
•  Not all query patterns are known in advance
•  Index rebuilds are costly
•  Business adjusting to near real-time data
•  Operational support adjustments
•  Backup/Restore
•  Finding Talent – We are hiring!
Thank you

Capital One: Using Cassandra In Building A Reporting Platform
