Extreme Data Velocity
Continuous Availability
Operational Simplicity
Michael Shaler
Senior Director, Business Development

©2013 DataStax Confidential. Do not distribute without consent.
What is Big Data’s payoff?
DataStax: CRN’s “10 Coolest Big Data Startups”
Cassandra: InfoWorld’s Technology of the Year

1,000+ production deployments and 300 customers
$84M in funding from industry-leading investors
We are the first viable alternative to
Oracle for modern online
applications.

We seek to be the first and best
choice in databases.
No, Seriously…
Real-world Use Cases
Internet of Things Database Requirements
• “UTC subject predicate”: Time series data and metadata are the lingua franca of
sensors/device data communications
• FAST AND ALWAYS ON: High-velocity ingest rates from geographically dispersed inputs
with variable schemas/data models are the norm, and unless you tell them to do so, sensors
never, ever sleep…
• HOT AND COLD: Real-time data and analytics vs. data reservoir/data factory needs vary.

• DHTs: Wide-row column-oriented distributed hash tables are the optimal home for IoT
operational datastores
• AND: Other key functionality needed includes indexed search, along with both batch and real-time analytics, with data-in-flight and data-at-rest security an emerging need
• SPOILER ALERT: DataStax Enterprise supports all of the above
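
The "UTC subject predicate" and wide-row points above can be pictured with a small in-memory sketch. It is an illustration only, not how Cassandra or DataStax Enterprise is implemented: the node names, the one-row-per-sensor-per-day bucketing, and the modulo placement (a stand-in for a real token ring) are all assumptions.

```python
import bisect
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta, timezone

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def partition_key(subject, ts):
    """One wide row per subject (sensor) per UTC day, so rows stay bounded."""
    return (subject, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


def owner(key):
    """Hash the partition key to a node. Plain modulo placement, for illustration only."""
    digest = hashlib.sha256("|".join(key).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class WideRowStore:
    def __init__(self):
        # partition key -> columns kept sorted by (UTC time, predicate, value)
        self.rows = defaultdict(list)

    def write(self, subject, ts, predicate, value):
        bisect.insort(self.rows[partition_key(subject, ts)], (ts, predicate, value))

    def read_slice(self, subject, start, end):
        """Columns with start <= ts < end; both bounds must fall on the same UTC day."""
        row = self.rows.get(partition_key(subject, start), [])
        return [col for col in row if start <= col[0] < end]


store = WideRowStore()
t0 = datetime(2013, 6, 12, tzinfo=timezone.utc)
store.write("meter-1", t0 + timedelta(minutes=30), "kwh", 0.4)  # arrives out of order
store.write("meter-1", t0, "kwh", 0.3)
store.write("meter-1", t0 + timedelta(days=1), "kwh", 0.5)      # lands in the next day's row

first_hour = store.read_slice("meter-1", t0, t0 + timedelta(hours=1))
print([value for _, _, value in first_hour])
print(owner(partition_key("meter-1", t0)))
```

Keeping each row sorted by timestamp is what makes a time-range read a contiguous slice of one partition rather than a scan.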

Time Series Analytics: 70B readings
Smart Grid Proof of Concept: Analyze 2 years of Smart Meter data for 1M households
Improvements in demand forecasting could yield EBITDA > $100M per GW saved

• $5M CAPEX
• 10 man-months delivery (Deploy, DevOps, Tuning)
• Ongoing OPEX of > $1M

• $450K OPEX
• 2 DevOps running 15 AWS nodes
• Faster performance in 2 weeks
• …All in the cloud
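
As a sanity check on the 70B figure, the arithmetic below reproduces it from the two numbers on the slide. The 15-minute reading interval is an assumption (the slide does not state the cadence); it is simply the interval that makes the totals line up.

```python
HOUSEHOLDS = 1_000_000       # from the slide
YEARS = 2                    # from the slide
READINGS_PER_DAY = 24 * 4    # assumption: one reading every 15 minutes

total_readings = HOUSEHOLDS * YEARS * 365 * READINGS_PER_DAY
print(f"{total_readings / 1e9:.2f}B readings")  # prints "70.08B readings"
```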
Major Changes: The Evolving Data Center

[Diagram labels: LOB apps; Data Warehouse; Oracle, MySQL, SQL Server, Teradata/Exadata]
• “What’s Happening?”: Hyper Velocity, Transactional (NoSQL)
• “What Happened?”: Massive Volume, Bit Bucket (Hadoop)
The Application World *HAS* Changed
Common Use Cases

• Web product searches
• Internal document search (law firms, etc.)
• Real estate/property searches
• Social media matchups
• Web & application log management / analysis
• Big data OLTP and write-intensive systems
• Time series data management
• High velocity device data consumption and analysis
• Healthcare systems input and analysis
• Media streaming (music, movies, etc.)
• Online Web retail (shopping carts, user transactions, etc.)
• Online gaming (real-time messaging, etc.)
• Real time data analytics
• Web click-stream analysis
• Buyer event and behavior analytics
• Fraud detection and analysis
• Risk analysis and management
• Social media input and analysis
• Supply chain analytics
Continuous Availability Commentary
Cassandra: Architecture as Foundation
[Map: data centers in Virginia, Santa Clara, London, and Sydney]
The New DR: Simian Army “Dystopia as a Service”
[Map: data centers in Virginia, London, Santa Clara, and Sydney]
Heterogeneous Workloads: Active Everywhere
[Map: data centers in Virginia, London, Santa Clara, and Sydney serving Read, Write, Search, and Analyze workloads]
Our Product Solution

• DataStax Enterprise
powers the big data apps
that transform business.
• Extreme Data Velocity
• Continuous Availability
• Operational Simplicity
Operational Simplicity

33M streaming customers
2T API calls/year
~1,200 Servers
55 AWS clusters
12 developers
4 operators
0 New data centers

“Our primary operational data store
is now Cassandra, not Oracle.”
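
To put the slide's figures on a per-second and per-person scale, here is a quick back-of-the-envelope calculation. It uses only the numbers shown above and yields yearly averages, not peaks; treating all ~1,200 servers as one pool per operator is a simplifying assumption.

```python
API_CALLS_PER_YEAR = 2_000_000_000_000   # "2T API calls/year" from the slide
SERVERS = 1_200                          # "~1,200 Servers" from the slide
OPERATORS = 4                            # from the slide

SECONDS_PER_YEAR = 365 * 24 * 3600       # 31,536,000

avg_calls_per_second = API_CALLS_PER_YEAR / SECONDS_PER_YEAR
servers_per_operator = SERVERS / OPERATORS

print(round(avg_calls_per_second))       # about 63,420 calls per second on average
print(servers_per_operator)              # 300.0 servers per operator
```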
Performance: NoSQL Leadership

Cassandra vs. HBase:

• 10x more read throughput
• 100x faster read latency
• 8x more write throughput
• 8x faster scan latency
• 4x more scan throughput

Source: “Solving Big Data Challenges for Enterprise Application Performance Management,”
Tilmann Rabl (University of Toronto) et al., VLDB 2012 (August 2012, Istanbul)
Performance: NoSQL Leadership
[Charts: YCSB Load Process; YCSB Read-mostly; YCSB Read-write mix; YCSB Write-mostly]
From STB to the Scalable Cloud Message Bus

Use Case: X1 Sports App
[Chart: API/sec (0 to 18,000) vs. ring size (4 to 24)]
Even in a preproduction environment prior to tuning, achieved near-linear scalability
Enabling a richer active consumer experience across multiple devices, multiple platforms
Instagram Scales Engaged Networks
• Transitioned from Redis (in-memory cache) to
Cassandra in Amazon Web Services EC2
• Doubled cluster—and then doubled again—to support
150MM users on new infrastructure
• Continue to scale in spite of Justin Bieber storms, video
formats, new features, new markets
Cassandra at Instagram
Rick Branson, Infrastructure Engineer
@rbranson

commit acb02daea57dca889c2aa45963754a271fa51566
Author: Rick Branson
Date: Sun Feb 10 20:36:34 2013 -0800
Doubled C* cluster

2013 Cassandra Summit
#cassandra13
June 12, 2013
San Francisco, CA
Our Vision

DataStax is driving
Cassandra to be the first
viable alternative to the
Oracle database for
companies who are
transforming the way they
interact with customers.

Getting ahead of exploding growth
• Sign big, new contracts all the time (ESPN)
• 200M unique users per month
• 40TB of data

Flexible architecture
• “Couldn’t shoehorn RDBMS technology”

Very small operations team
• 3 people
• 20 clusters
• 100s of nodes
Why We Exist

Today’s applications must be
always available and lightning
fast as they scale to previously
unimaginable levels.
Cassandra delivers both with a
beautifully simple and elegant
architecture.

“We need a real-time, massively
scalable architecture, where no
one node is a single point of
failure, that can easily span
multiple data centers and cloud
availability zones, and that’s
Cassandra.”
What We Do Best

Cassandra was designed to do
things that are impossible in
other databases when it comes
to availability and
performance. Forget about
losing a machine here or there: Cassandra delivers a world
where you can lose an entire
datacenter and still perform as
your customers expect.

“We have to be ready for disaster
recovery all the time. It’s really
great that Cassandra allows for
active-active multiple data centers
where we can read and write
anywhere”
Jay Patel
Technical Architect at eBay
(Describing why they switched from legacy
relational architecture)
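
The "lose an entire datacenter" claim can be reasoned about with majority (quorum) arithmetic. The sketch below is a toy model, not Cassandra's code: the replication factor of 3 per data center is an assumption, and the data center names are borrowed from the earlier map slides.

```python
def quorum(replica_count):
    """Majority of replicas: floor(n / 2) + 1."""
    return replica_count // 2 + 1


# Hypothetical layout: replication factor 3 in each data center.
REPLICAS = {"Virginia": 3, "Santa Clara": 3, "London": 3, "Sydney": 3}


def local_quorum_ok(dc, live):
    """Can a request coordinated in `dc` reach a majority of that DC's replicas?"""
    return live.get(dc, 0) >= quorum(REPLICAS[dc])


def global_quorum_ok(live):
    """Can a request reach a majority of all replicas across every data center?"""
    return sum(live.values()) >= quorum(sum(REPLICAS.values()))


# Simulate losing the entire Virginia data center.
live = dict(REPLICAS, Virginia=0)
for dc in REPLICAS:
    print(dc, local_quorum_ok(dc, live))
print("cluster-wide quorum:", global_quorum_ok(live))
```

Under these assumptions the three surviving data centers still satisfy a local majority, and 9 of 12 replicas satisfy a cluster-wide majority of 7, so reads and writes can continue everywhere except the failed site.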
The Modern “Application”
Fraud Detection and Prevention
What It Means In Real Life
Cassandra Summit SF 2013
Real Growth In Production
We are the first viable
alternative to Oracle for
modern online applications.
Thank You

We power the big data apps
that transform business.

DataStax OpsCenter 4.0

Security in Cassandra
Features, each followed by its benefits (+):

Internal Authentication: manages login IDs and passwords inside the database
+ Ensures only authorized users can access a database system using internal validation
+ Simple to implement and easy to understand
+ No learning curve from the relational world

Object Permission Management: controls who has access to what and who can do what in the database
+ Provides granular control over who can add/change/delete/read data
+ Uses familiar GRANT/REVOKE from relational systems
+ No learning curve

Client to Node Encryption: protects data in flight to and from a database cluster
+ Ensures data cannot be captured/stolen en route to a server
+ Data is safe both in flight from/to a database and on the database; complete coverage is ensured
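
The GRANT/REVOKE permission model named on this slide can be sketched as a small statement builder. This is a hedged illustration: the helper names, the `iot` keyspace, the `readings` table, and the `analyst` user are hypothetical. The permission list follows CQL's GRANT syntax as commonly documented and should be checked against the Cassandra version in use. The code only builds statement strings and does not connect to a cluster.

```python
import re

# Permission names assumed from CQL's GRANT syntax; verify against your version.
PERMISSIONS = {"ALL", "ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT"}
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(*names):
    """Reject anything that is not a plain identifier, to avoid statement injection."""
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")


def _resource(keyspace, table=None):
    _check(keyspace)
    if table is None:
        return f"KEYSPACE {keyspace}"
    _check(table)
    return f"TABLE {keyspace}.{table}"


def _statement(verb, link, permission, user, keyspace, table):
    permission = permission.upper()
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    _check(user)
    return f"{verb} {permission} ON {_resource(keyspace, table)} {link} {user};"


def grant(permission, user, keyspace, table=None):
    return _statement("GRANT", "TO", permission, user, keyspace, table)


def revoke(permission, user, keyspace, table=None):
    return _statement("REVOKE", "FROM", permission, user, keyspace, table)


print(grant("select", "analyst", "iot"))
print(revoke("modify", "analyst", "iot", "readings"))
```

Read access here maps to SELECT and add/change/delete to MODIFY, which is how the slide's "who can add/change/delete/read data" would typically be expressed.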
Advanced Security in DataStax Enterprise
Features, each followed by its benefits (+):

External Authentication: uses external security software packages to control security
+ Only authorized users have access to a database system using external validation
+ Uses most trusted external security packages (Kerberos, LDAP), mainstays in government and finance
+ Single sign-on to all data domains

Transparent Data Encryption: encrypts data at rest
+ Protects sensitive data at rest from theft and from being read at the file system level
+ No changes needed at application level
+ Can encrypt both Cassandra and Hadoop data

Data Auditing: provides trail of who did and looked at what/when
+ Supplies admins with an audit trail of all accesses and changes
+ Granular control to audit only what’s needed
+ Uses log4j interface to ensure performance and efficient audit operations
