Using approximate data structures
for small, insightful analytics.
Ben Kornmeier, Engineer
©2016 ProtectWise, Inc. All rights reserved. Proprietary & Confidential.
About ProtectWise
● Cloud security platform that aims to make threats
obvious and actionable.
● Aims to cut down on the “noise” a network can
create and show only the most important details.
● Places a big emphasis on real-time data.
● Ingests and processes terabytes of data a day.
Goals Of Count Sumula
● Quick report generation.
● Support high cardinality data.
● Compute averages, min, and max.
● Easy to add additional aggregations.
Challenge: Daily Data Ingestion
● 2 billion netflow updates.
● Ingests 20TB of raw network traffic.
● Generates 150 million observations.
Challenge: Costs of Processing Data
● Traditional batch processing is accurate, but slow.
○ We want results in seconds, not hours or days.
● Compute resources are very expensive at our scale.
Challenge: Making a Great User Experience
● A user should expect:
○ Hardly any waiting for reports to generate.
○ Up-to-date reports.
○ Meaningful reports that are actionable and concise.
○ Reports that are persisted forever and can be
recombined after the fact to gain additional insights.
Some Use Cases
● Show me a count of all the hosts that had a threat on
them in the past year.
● Show me the hosts with the most threats encountered
over the course of a year.
Use Cases Examined
● Show me a count of all the hosts that had a threat on
them in the past year.
○ IP addresses have very high cardinality: 2^128, or about 340 undecillion, possible values (IPv6).
■ That is: 340,282,366,920,938,463,463,374,607,431,768,211,456 (WOW!)
○ Storage costs could be high.
Use Cases Examined Continued
● Show me the hosts with the most threats encountered
over the course of a year.
○ Once again, high cardinality.
○ Same storage costs as the previous example, but now we also have to sort,
which is expensive: O(n log n).
Considerations For Our Solution
● Be real time.
● Not grow without bound.
● Keep data around for decades or more.
● Answer queries over large time ranges.
● Be actionable and concise.
The Realization
● In general, users can live with an approximate result!
○ Approximate results use less space.
○ Can be computed in memory.
○ The space used can be bounded by trading away some accuracy.
○ Approximate results are fast enough to compute in real time.
○ Meets two of our goals.
Some Approximations We Used
● HyperLogLog
● Count Min Sketch
● Stream Summary
● Bloom Filter
● Layered Bloom Filter
● Compound Approximations
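The list above names Stream Summary without describing it. "Stream-Summary" is the data structure behind the Space-Saving algorithm (Metwally, Agrawal, and El Abbadi) for finding top-k heavy hitters, which fits the "hosts with the most threats" use case without storing and sorting every host. The Python below is a simplified sketch under that assumption, not ProtectWise's implementation: a plain dict replaces the linked bucket structure, and the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class SpaceSaving:
    """Top-k heavy hitters using at most `capacity` counters.

    Simplified Space-Saving sketch: a dict stands in for the linked
    Stream-Summary buckets, so evicting the minimum costs O(capacity).
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts: Dict[str, int] = {}
        self.errors: Dict[str, int] = {}  # maximum overestimate per item

    def offer(self, item: str) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count plus one, and the inherited part is its error bound.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            self.errors.pop(victim)
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top(self, k: int) -> List[Tuple[str, int]]:
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]
```

Memory stays at `capacity` entries no matter how many distinct hosts appear; any host seen more than N / capacity times in a stream of N updates is guaranteed to remain in the summary.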
HyperLogLog
● Only tracks runs of consecutive 0 bits in hashed values,
not the values themselves.
● Uses the longest run of consecutive 0 bits seen, and the probability
of that run occurring, to estimate the number of unique
elements seen.
● Assumes a good hash function (MurmurHash3).
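A minimal Python sketch of the standard HyperLogLog estimator (Flajolet et al.), assuming 64-bit hashes. Python's standard library has no MurmurHash3, so SHA-1 from `hashlib` stands in for it; the class name and the precision parameter `p` are illustrative. Standard HyperLogLog splits each hash: the first `p` bits pick one of 2^p registers, and each register keeps the longest run of leading 0 bits (plus one) seen in the remaining bits. Averaging across registers is what keeps a single lucky hash from skewing the estimate.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not production code).

    Assumption: SHA-1 stands in for MurmurHash3, which is not in the stdlib.
    """

    def __init__(self, p: int = 12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p                      # bits used to choose a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from Flajolet et al. (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item: str) -> int:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                 # top p bits choose the register
        rest = x & ((1 << width) - 1)    # remaining bits
        # Rank = leading 0 bits in `rest` plus one.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

With `p = 12` the sketch uses 4,096 small registers regardless of how many distinct IP addresses are added, and re-adding an address never changes the estimate.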
Example: HyperLogLog
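Assume our hash function returns only 4 bits (16 combinations).
Grouping the patterns by their longest run of consecutive 0 bits:
Longest 0 run   Bit pattern(s)                              Chance of occurrence
4               0000                                        1/16
3               1000, 0001                                  2/16 (1/8)
2               0011, 1001, 1100, 0100, 0010                5/16
1               0111, 1011, 1101, 1110, 1010, 0110, 0101    7/16
0               1111                                        1/16
Long runs are rare, so seeing one suggests that many distinct elements have been hashed.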
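The table can be checked by brute force: enumerate all 16 four-bit patterns and group them by their longest run of consecutive 0 bits. This small Python sketch does exactly that; the function names are illustrative.

```python
from collections import Counter
from itertools import product


def longest_zero_run(bits: str) -> int:
    """Length of the longest run of consecutive '0' characters."""
    best = current = 0
    for b in bits:
        current = current + 1 if b == "0" else 0
        best = max(best, current)
    return best


def zero_run_distribution(width: int = 4) -> Counter:
    """Count how many `width`-bit patterns have each longest zero run."""
    patterns = ("".join(p) for p in product("01", repeat=width))
    return Counter(longest_zero_run(p) for p in patterns)


for run, n in sorted(zero_run_distribution().items(), reverse=True):
    print(f"longest run {run}: {n}/16")
```

The counts come out as 1, 2, 5, 7, and 1 for runs of 4 down to 0, matching the rows of the table.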
CountMinSketch
● Essentially a matrix of counters, one row per hash function.
● Each insert is applied to every row.
● Each row hashes the element with a different hash function.
● Counters can only be incremented.
● Used for frequency estimation.
● Can be used for averages, min, and max as well.
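A minimal Count-Min sketch in Python, following the bullets above: one row of counters per hash function, every insert touches each row, and a lookup takes the minimum. The per-row hashes here are an assumption (SHA-256 of `"<row>:<item>"`), standing in for the independent hash functions a real implementation would use. The `merge` method is included because two sketches with the same dimensions and hashes combine by adding counters, which is what lets stored reports be recombined later.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min sketch: `depth` rows of `width` counters.

    Assumption: row hashes are SHA-256 of "<row>:<item>", for illustration.
    """

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row: int, item: str) -> int:
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        if count < 0:
            raise ValueError("Count-Min counters can only be incremented")
        # The same insert is applied to every row, each with its own hash.
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum across rows is
        # the tightest upper bound on the true count.
        return min(self.table[row][self._column(row, item)]
                   for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        """Add another sketch with identical dimensions (e.g. two time buckets)."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
```

An estimate is never below the true count; it can only be above it when an element collides with others in every row.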
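Example: CountMinSketch
Inserting elements into a 2-row, 5-column sketch (null marks an untouched counter)
After inserting “Ben”:
1    null null null null
null null 1    null null
After inserting “Eric”:
1    null 1    null null
null null 2    null null
“Eric” collides with “Ben” in the second row, so that counter reads 2.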
Example: CountMinSketch Continued
Retrieving the count for “Ben”
1    null 1    null null
null null 2    null null
“Ben” hashes to the first column in row 1 and the third column in row 2.
Compare the values returned and take the minimum: in this case, min(1, 2) = 1.
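The two example slides can be replayed in a few lines of Python. The column positions below are hypothetical, chosen only to reproduce the slides (including the second-row collision); a real sketch derives them from per-row hash functions, and empty counters are 0 rather than null.

```python
# Hypothetical (row 1, row 2) column positions matching the slides.
COLUMNS = {"Ben": (0, 2), "Eric": (2, 2)}
WIDTH = 5

table = [[0] * WIDTH for _ in range(2)]


def insert(item):
    for row, col in enumerate(COLUMNS[item]):
        table[row][col] += 1


def estimate(item):
    # Read the item's counter in every row and keep the smallest.
    return min(table[row][col] for row, col in enumerate(COLUMNS[item]))


insert("Ben")
insert("Eric")
print(table)            # [[1, 0, 1, 0, 0], [0, 0, 2, 0, 0]]
print(estimate("Ben"))  # 1
```

The collision inflates the second-row counter to 2, but taking the minimum recovers the correct count of 1 for both names.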
How Did We Store The Approximations?
● We generate enough approximations that we create
about 1 GB of data each month.
○ Much better than the amount stored for full fidelity data.
● First approach: just use Redis.
● Second approach: Redis and Cassandra.
First Approach: Redis Only
Advantages
● Easy
● Fast
Disadvantages
● A ticking time bomb, since Redis keeps all data in memory.
Second Approach: C* and Redis
Advantages
● C* scales horizontally as data grows.
● Redis can be used when speed is important.
● Not a ticking time bomb.
Disadvantages
● Not as easy as the previous solution.
How We Use Redis With Cassandra
● Elements are placed in Redis and keyed on bucket
name and time.
● Once an element from the next time interval is
encountered, data is moved from Redis to Cassandra.
Incoming Updates
{"bucket": "observation", "time": 1, "value": 1}
{"bucket": "observation", "time": 1, "value": 2}
{"bucket": "observation", "time": 2, "value": 10}
● The first update for time 1 is written to Redis.
● The second update for time 1 is written to Redis.
● Elements with the same bucket and time are summed: Redis now holds
{"bucket": "observation", "time": 1, "value": 3}.
● The update for time 2 is written to Redis.
● The element from time 1 is determined to be expired and written to Cassandra.
Cassandra now holds {"bucket": "observation", "time": 1, "value": 3}
and Redis holds {"bucket": "observation", "time": 2, "value": 10}.
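The roll-over flow described above, where updates accumulate in Redis and an interval is flushed to Cassandra once an update from a later interval arrives, can be sketched in Python. This is a hypothetical model, not the production code: a dict stands in for Redis, a list stands in for the Cassandra table, and values are plain integers summed on arrival (a real system would merge serialized sketches). Late updates for an already flushed interval are not handled here.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (bucket name, time interval)


class RollingAggregator:
    """Hypothetical stand-ins: `hot` plays Redis, `archive` plays Cassandra."""

    def __init__(self) -> None:
        self.hot: Dict[Key, int] = {}
        self.archive: List[dict] = []

    def update(self, bucket: str, time: int, value: int) -> None:
        # An update for interval `time` means earlier intervals of the same
        # bucket are finished, so move them to the archive in final form.
        expired = [k for k in self.hot if k[0] == bucket and k[1] < time]
        for key in expired:
            self.archive.append(
                {"bucket": key[0], "time": key[1], "value": self.hot.pop(key)})
        # Elements sharing a bucket and interval are summed in place.
        self.hot[(bucket, time)] = self.hot.get((bucket, time), 0) + value


agg = RollingAggregator()
agg.update("observation", 1, 1)
agg.update("observation", 1, 2)
agg.update("observation", 2, 10)
print(agg.archive)  # [{'bucket': 'observation', 'time': 1, 'value': 3}]
print(agg.hot)      # {('observation', 2): 10}
```

Because each interval is written to the archive exactly once, in its finished form, the hot tier only ever holds the current interval per bucket.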
Cassandra Schema
CREATE TABLE buckets (
    name text,               // bucket name
    time_bucket timestamp,   // time floored to the next interval up
    time_unit int,           // {1: "minute", 2: "hour", 3: "day"}
    algorithm text,          // HyperLogLog, CountMinSketch, etc.
    time timestamp,          // the actual time
    d blob,                  // serialized data
    PRIMARY KEY ((name, time_bucket, time_unit, algorithm), time)
);
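The `time_bucket` column keeps each partition bounded: rows for one bucket, unit, and algorithm are grouped under a coarser time, and `time` orders them inside the partition. Here is a hypothetical Python helper for computing the partition key. The mapping of each unit to the interval it is floored to (minute to hour, hour to day, day to month) is an assumption for illustration, not something stated in the talk.

```python
from datetime import datetime

# time_unit codes from the schema comments; the coarser interval each one
# is floored to is an ASSUMPTION for this sketch.
MINUTE, HOUR, DAY = 1, 2, 3


def time_bucket(ts: datetime, time_unit: int) -> datetime:
    """Floor `ts` to the next interval up for the given time_unit."""
    if time_unit == MINUTE:   # minute rows grouped by hour
        return ts.replace(minute=0, second=0, microsecond=0)
    if time_unit == HOUR:     # hour rows grouped by day
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if time_unit == DAY:      # day rows grouped by month
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown time_unit {time_unit}")


def partition_key(name: str, ts: datetime, time_unit: int, algorithm: str) -> tuple:
    """Partition key columns in schema order."""
    return (name, time_bucket(ts, time_unit), time_unit, algorithm)
```

A query over a long time range then reads one partition per `time_bucket` value and merges the stored sketches.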
Advantages of using Cassandra and Redis
● Elements are written in their finalized form to Cassandra.
○ Compaction friendly, since each row is written once.
● Updates are very fast, since they go to Redis.
● Redis memory use is no longer unbounded.
Caveats
● Approximations are just that: approximate.
● Takes time to understand how they work.
● Tuning requires up-front knowledge of usage.
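As an example of the up-front tuning involved, the standard sizing formulas can be computed directly. For a Count-Min sketch, width w = ceil(e / ε) and depth d = ceil(ln(1 / δ)) keep the overestimate within ε times the total count with probability at least 1 - δ (Cormode and Muthukrishnan). For HyperLogLog with m registers, the relative standard error is about 1.04 / sqrt(m). The function names below are illustrative.

```python
import math


def cms_dimensions(epsilon: float, delta: float) -> tuple:
    """Count-Min (width, depth): error <= epsilon * N with prob >= 1 - delta."""
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1 / delta))
    return width, depth


def hll_standard_error(p: int) -> float:
    """Approximate relative standard error of HyperLogLog with 2**p registers."""
    return 1.04 / math.sqrt(2 ** p)


print(cms_dimensions(0.001, 0.01))  # (2719, 5)
print(hll_standard_error(14))       # 0.008125
```

Both formulas need the expected usage in advance: the acceptable error for the reports and, for Count-Min, how large the total count N will get over the bucket's lifetime.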
We’re Hiring!
Especially if you’re in Denver!
https://www.protectwise.com/careers.html
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, ProtectWise) | Cassandra Summit 2016
Stefan Krawczyk
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
Omid Vahdaty
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
Nathaniel Braun
 
Malware vs Big Data
Malware vs Big DataMalware vs Big Data
Malware vs Big Data
Frank Denis
 
MongoDB World 2018: MongoDB for High Volume Time Series Data Streams
MongoDB World 2018: MongoDB for High Volume Time Series Data StreamsMongoDB World 2018: MongoDB for High Volume Time Series Data Streams
MongoDB World 2018: MongoDB for High Volume Time Series Data Streams
MongoDB
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
Rob Skillington
 
Big Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveBig Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's Perspective
Aerospike, Inc.
 
Xanadu Big Data Platform Technology Introduction
Xanadu Big Data Platform Technology IntroductionXanadu Big Data Platform Technology Introduction
Xanadu Big Data Platform Technology Introduction
Alex G. Lee, Ph.D. Esq. CLP
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
Spark Summit
 
Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...
Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...
Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...
Ridwan Fadjar
 

Semelhante a Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, ProtectWise) | Cassandra Summit 2016 (20)

Performance is not an Option - gRPC and Cassandra
Performance is not an Option - gRPC and CassandraPerformance is not an Option - gRPC and Cassandra
Performance is not an Option - gRPC and Cassandra
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18Storm at spider.io - London Storm Meetup 2013-06-18
Storm at spider.io - London Storm Meetup 2013-06-18
 
Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Proofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social MediaProofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social Media
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
 
Silicon Valley Workshop: Xanadu introduction
Silicon Valley Workshop: Xanadu introduction Silicon Valley Workshop: Xanadu introduction
Silicon Valley Workshop: Xanadu introduction
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
 
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixData Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
Malware vs Big Data
Malware vs Big DataMalware vs Big Data
Malware vs Big Data
 
MongoDB World 2018: MongoDB for High Volume Time Series Data Streams
MongoDB World 2018: MongoDB for High Volume Time Series Data StreamsMongoDB World 2018: MongoDB for High Volume Time Series Data Streams
MongoDB World 2018: MongoDB for High Volume Time Series Data Streams
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Big Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveBig Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's Perspective
 
Xanadu Big Data Platform Technology Introduction
Xanadu Big Data Platform Technology IntroductionXanadu Big Data Platform Technology Introduction
Xanadu Big Data Platform Technology Introduction
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...
Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...
Ridwan Fadjar Septian PyCon ID 2021 Regular Talk - django application monitor...
 

Mais de DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
DataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
DataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
DataStax
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
DataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
DataStax
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
DataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
DataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
DataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
DataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
DataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
DataStax
 

Mais de DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Último

Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
dakas1
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
ToXSL Technologies
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
Karya Keeper
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
kalichargn70th171
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 

Último (20)

Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
一比一原版(UMN毕业证)明尼苏达大学毕业证如何办理
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 

Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, ProtectWise) | Cassandra Summit 2016

Using approximate data structures for small, insightful analytics.
Ben Kornmeier, Engineer
About ProtectWise
● A cloud security platform that aims to make threats actionable and obvious.
● Aims to cut down on the “noise” a network can create and show only the most important details.
● Places a strong emphasis on real-time data.
● Ingests and processes terabytes of data a day.
Goals of Count Sumula
● Quick report generation.
● Support for high-cardinality data.
● Compute averages, min, and max.
● Make it easy to add additional aggregations.
Challenge: Daily Data Ingestion
● 2 billion netflow updates.
● Ingests 20 TB of raw network traffic.
● Generates 150 million observations.
Challenge: Costs of Processing Data
● Traditional batch processing is accurate, but slow.
○ We want results in seconds, not hours or days.
● Compute resources are very expensive at our scale.
Challenge: Making a Great User Experience
● A user should expect:
○ Hardly any waiting for report generation.
○ Up-to-date reports.
○ Meaningful reports that are actionable and concise.
○ Reports that are persisted forever and can be recombined after the fact to gain additional insights.
Some Use Cases
● Show me a count of all the hosts that had a threat on them in the past year.
● Show me the hosts that encountered the most threats over the course of a year.
Use Cases Examined
● Show me a count of all the hosts that had a threat on them in the past year.
○ IP addresses have very high cardinality: IPv6 alone has 2^128, about 340 undecillion, possible addresses.
■ Or: 340,282,366,920,938,463,463,374,607,431,768,211,456 (WOW!)
○ Storage costs could be high.
Use Cases Examined Continued
● Show me the hosts with the most threats encountered over the course of a year.
○ Once again, high cardinality.
○ Same storage costs as the previous example, but now we also have to sort, which is expensive: O(n log n).
Considerations for Our Solution
● Be real time.
● Must not grow without bound.
● Data must be kept for decades or more.
● Be able to answer queries over large time ranges.
● Be actionable and concise.
The Realization
● In general, users can live with an approximate result!
○ Approximate results use less space.
○ They can be computed in memory.
○ Their error can be bounded by trading accuracy for space.
○ They are fast enough to compute in real time.
○ This meets two of our goals.
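How tightly the error can be bounded is well documented for HyperLogLog and Count-Min Sketch. The bounds below are the standard published ones, not parameters taken from this talk; N is the total of all counts inserted into the Count-Min Sketch, f(x) is the true count of x, and \hat f(x) is the estimate.

```latex
\text{HyperLogLog with } m \text{ registers:}\qquad
\mathrm{SE} \approx \frac{1.04}{\sqrt{m}},\qquad
m = 2^{14} \;\Rightarrow\; \mathrm{SE} \approx \frac{1.04}{128} \approx 0.81\%

\text{Count-Min Sketch with } w = \lceil e/\varepsilon \rceil \text{ columns and } d = \lceil \ln(1/\delta) \rceil \text{ rows:}\qquad
f(x) \;\le\; \hat f(x) \;\le\; f(x) + \varepsilon N \quad \text{with probability at least } 1 - \delta
```

For example, ε = 0.001 and δ = 0.01 give w = 2719 columns and d = 5 rows, a fixed size regardless of how many distinct elements are inserted.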
  • 12. Some Approximations We Used ● HyperLogLog ● Count Min Sketch ● Stream Summary ● Bloom Filter ● Layered Bloom Filter ● Compound Approximations
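Stream Summary is the data structure behind the Space-Saving algorithm, which answers the "hosts with the most threats" question with a fixed number of counters. The Python below is a simplified sketch of Space-Saving, not the implementation used here: it keeps counters in a plain dict and finds the smallest one with a linear scan, whereas a real Stream Summary groups counters into linked buckets so eviction is O(1). Any item whose true count exceeds N / capacity is guaranteed to stay in the summary.

```python
class SpaceSaving:
    """Simplified Space-Saving top-k counter that never holds more than `capacity` items."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}  # item -> estimated count (never an underestimate)
        self.errors = {}  # item -> maximum possible overestimate for that item

    def add(self, item, count: int = 1) -> None:
        if item in self.counts:
            self.counts[item] += count
        elif len(self.counts) < self.capacity:
            self.counts[item] = count
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits that count,
            # which is also the most its own estimate can overshoot by.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + count
            self.errors[item] = floor

    def top(self, k: int):
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Memory stays at `capacity` counters no matter how many distinct hosts appear, at the cost of inflated counts for items that were admitted late.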
  • 13. HyperLogLog ● Only looks at runs of consecutive 0 bits in each hashed value (specifically, the run of leading 0 bits). ● Uses the longest run seen, and the probability of that run occurring by chance (a run of at least k leading 0 bits has probability 1/2^k), to estimate the number of unique elements seen. ● Assumes a good hash function (MurmurHash3).
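  • 14. Example: HyperLogLog Assuming our hash function only returns 4 bits (16 combinations), the longest run of consecutive 0 bits has these odds:
    Longest run of 0s | Bit pattern(s) | Chance of occurrence
    4 | 0000 | 1/16
    3 | 1000, 0001 | 2/16 (1/8)
    2 | 0011, 1001, 1100, 0100, 0010 | 5/16
    1 | 0111, 1011, 1101, 1110, 1010, 0110, 0101 | 7/16
    0 | 1111 | 1/16
    The table groups patterns by their longest run of 0 bits anywhere in the hash; HyperLogLog itself counts only the leading 0s, but the intuition is the same: long runs are rare, so seeing one suggests many distinct values have been hashed.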
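A minimal HyperLogLog in Python, to show how the leading-zero idea becomes a cardinality estimate. Assumptions: BLAKE2b from `hashlib` stands in for MurmurHash3 (which is not in the standard library), p = 12 gives 4,096 registers, and the estimator uses the original paper's bias constant and small-range correction. The `merge` method shows why HyperLogLogs suit reports that are recombined after the fact: register-wise maxima give the sketch of the union.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (a sketch, not production code)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item: str) -> int:
        # Stand-in for MurmurHash3: a 64-bit BLAKE2b digest from the standard library.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)                 # first p bits choose a register
        rest = x & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = number of leading 0 bits in `rest`, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        if self.p != other.p:
            raise ValueError("can only merge sketches with the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        # Harmonic mean of 2**register across all registers, scaled by alpha * m**2.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        empty = self.registers.count(0)
        if estimate <= 2.5 * self.m and empty:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / empty)
        return estimate


hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"host-{i}")
print(round(hll.count()))  # close to 10,000; the exact value depends on the hash
```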
  • 15. CountMinSketch ● Essentially a matrix of counters. ● Each insert updates one counter in every row. ● Each row hashes the element with a different hash function. ● Counters can only be incremented, never decremented. ● Used for frequency estimation. ● Can also be used for averages, min, and max.
  • 16. Example: CountMinSketch Inserting an element (2 rows × 5 columns; empty cells shown as 0)
    After inserting "Ben":
      row 1: 1 0 0 0 0
      row 2: 0 0 1 0 0
    After inserting "Eric" (which lands in the same column as "Ben" in row 2):
      row 1: 1 0 1 0 0
      row 2: 0 0 2 0 0
  • 17. Example: CountMinSketch Continued Retrieving the count for "Ben" (its cell in each row is bracketed):
      row 1: [1] 0 1 0 0
      row 2: 0 0 [2] 0 0
    Compare the values returned and take the min, in this case 1. Row 2 reads 2 only because "Eric" collided with "Ben" there, which is why the minimum is used.
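The insert and lookup steps above, as a minimal Python sketch. Assumptions: each row's hash is BLAKE2b salted with the row number (a stand-in for whatever hash family was actually used), and counts are plain integers. Because collisions can only add to a counter, `estimate` never undercounts; taking the minimum across rows limits how much it overcounts.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row: int, item: str) -> int:
        # A different hash per row: prefix the input with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Every insert touches one counter in each row; counters only grow.
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the smallest one is the best guess.
        return min(self.table[row][self._column(row, item)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        # Sketches with the same shape and hashes combine by adding cell-wise.
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
```

With `width=1` every item collides in every row, so the estimate degrades to the total count; that is the worst case the minimum guards against.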
  • 18. How Did We Store The Approximations? ● We generate enough approximations to create about 1 GB of data each month. ○ Much less than we would store for full-fidelity data. ● First approach: just use Redis. ● Second approach: Redis and Cassandra.
  • 19. First Approach: Redis Only Advantages ● Easy ● Fast Disadvantages ● A ticking time bomb, since Redis keeps the whole dataset in memory and the data keeps growing.
  • 20. Second Approach: C* (Cassandra) and Redis Advantages ● C* scales horizontally by adding nodes. ● Redis can be used when speed is important. ● Not a ticking time bomb. Disadvantages ● Not as easy as the previous solution.
  • 21. How We Use Redis With Cassandra ● Elements are placed in Redis, keyed on bucket name and time. ● Once an element from the next time interval is encountered, the data for the previous interval is moved from Redis to Cassandra.
  • 22. Incoming updates: {"bucket": "observation", "time": 1, "value": 1}, {"bucket": "observation", "time": 1, "value": 2}, {"bucket": "observation", "time": 2, "value": 10} ● Redis: (empty) ● Cassandra: (empty)
  • 23. Incoming updates: {"bucket": "observation", "time": 1, "value": 2}, {"bucket": "observation", "time": 2, "value": 10} ● Redis: {"bucket": "observation", "time": 1, "value": 1} ● Cassandra: (empty)
  • 24. Incoming updates: {"bucket": "observation", "time": 2, "value": 10} ● Redis: {"bucket": "observation", "time": 1, "value": 1}, {"bucket": "observation", "time": 1, "value": 2} ● Cassandra: (empty)
  • 25. Incoming updates: {"bucket": "observation", "time": 2, "value": 10} ● Redis: {"bucket": "observation", "time": 1, "value": 1}, {"bucket": "observation", "time": 1, "value": 2} ● Cassandra: (empty) Elements with the same bucket and time are summed.
  • 26. Incoming updates: {"bucket": "observation", "time": 2, "value": 10} ● Redis: {"bucket": "observation", "time": 1, "value": 3} ● Cassandra: (empty)
  • 27. Incoming updates: (none) ● Redis: {"bucket": "observation", "time": 2, "value": 10}, {"bucket": "observation", "time": 1, "value": 3} ● Cassandra: (empty)
  • 28. Incoming updates: (none) ● Redis: {"bucket": "observation", "time": 2, "value": 10} ● Cassandra: {"bucket": "observation", "time": 1, "value": 3} The element from time 1 is determined to be expired and is written to Cassandra.
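The walkthrough above can be sketched as a toy Python model. The dict and list are hypothetical stand-ins for Redis and Cassandra (no client libraries are involved), and the rule "an update for a later time flushes earlier intervals of the same bucket" is one reading of slide 21, not the production code.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierStore:
    """Toy model: `redis` buffers open intervals, `cassandra` holds finalized rows."""

    redis: dict = field(default_factory=dict)      # (bucket, time) -> running value
    cassandra: list = field(default_factory=list)  # finalized (bucket, time, value) rows

    def update(self, bucket: str, time: int, value: int) -> None:
        # An update from a later interval expires every earlier interval of this bucket.
        expired = [key for key in self.redis if key[0] == bucket and key[1] < time]
        for key in sorted(expired, key=lambda k: k[1]):
            self.cassandra.append((key[0], key[1], self.redis.pop(key)))
        # Combine with whatever is already buffered; values are summed, as in the slides.
        key = (bucket, time)
        self.redis[key] = self.redis.get(key, 0) + value


store = TwoTierStore()
for update in [
    {"bucket": "observation", "time": 1, "value": 1},
    {"bucket": "observation", "time": 1, "value": 2},
    {"bucket": "observation", "time": 2, "value": 10},
]:
    store.update(**update)
print(store.cassandra)  # [('observation', 1, 3)]
print(store.redis)      # {('observation', 2): 10}
```

With real sketches in place of integers, the summing step would become the sketch's own merge (register-wise max for HyperLogLog, cell-wise sum for a Count-Min Sketch), and each row would reach Cassandra once, in its final form.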
  • 29. Cassandra Schema
    CREATE TABLE buckets (
      name text,              // bucket name
      time_bucket timestamp,  // time floored to the next interval up
      time_unit int,          // {1: "minute", 2: "hour", 3: "day"}
      algorithm text,         // [HyperLogLog, CountMinSketch, etc.]
      time timestamp,         // the actual time
      d blob,                 // serialized data
      PRIMARY KEY ((name, time_bucket, time_unit, algorithm), time)
    );
    The partition key is (name, time_bucket, time_unit, algorithm); rows within a partition are clustered by time.
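A sketch of how a row's `time_bucket` and `time` might be derived. Assumptions, not from the slides: `time` holds the start of the row's own interval, and the "next interval up" is hour for minute rows, day for hour rows, and month for day rows (the slides do not say what day rows are bucketed by). The function name `bucket_times` is hypothetical.

```python
from datetime import datetime

# time_unit codes from the schema comment.
TIME_UNITS = {1: "minute", 2: "hour", 3: "day"}


def bucket_times(ts: datetime, time_unit: int) -> tuple[datetime, datetime]:
    """Return (time_bucket, time): `time` is ts truncated to its own unit, and
    `time_bucket` is that value floored to the next interval up (assumed mapping)."""
    unit = TIME_UNITS[time_unit]
    if unit == "minute":
        time = ts.replace(second=0, microsecond=0)
        time_bucket = time.replace(minute=0)          # assumed: minute rows bucketed by hour
    elif unit == "hour":
        time = ts.replace(minute=0, second=0, microsecond=0)
        time_bucket = time.replace(hour=0)            # assumed: hour rows bucketed by day
    else:  # "day"
        time = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        time_bucket = time.replace(day=1)             # assumed: day rows bucketed by month
    return time_bucket, time


print(bucket_times(datetime(2016, 5, 17, 13, 42, 7), 1))
```

Under this layout, all minute-level sketches for one bucket, algorithm, and hour would share a partition, so an hour's worth of them could be read with one range query on `time` and then merged.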
  • 32. Advantages of Using Cassandra and Redis ● Elements are written to Cassandra in their finalized form. ○ Compactor friendly: each row is written once rather than repeatedly overwritten. ● Updates are fast, since they happen in Redis. ● Redis no longer consumes memory without bound.
  • 33. Caveats ● Approximations are just that: approximate. ● It takes time to understand how they work. ● Tuning requires up-front knowledge of how the data will be used.