SlideShare uma empresa Scribd logo
1 de 19
Baixar para ler offline
Build Low Latency, Windowless
Event Processing Pipelines
with Quine and ScyllaDB
Matthew Cullum, Director of Engineering, thatDot
sponsored by
Matthew
■ Director of Engineering @thatDot, makers of Quine
■ 18+ years experience in enterprise software
■ Industrial automation and distributed systems
■ Senior engineer, senior architect, CTO roles
@mattcullum
@brackishman
■ Complex Event Processing
■ Graph Data Structure
■ Design an architecture
■ Quine + ScyllaDB real world performance
■ Questions
Presentation Agenda
Build Low Latency, Windowless Event
Processing Pipelines with Quine and
ScyllaDB
Our Goal:
■ Fits within existing high volume streams
■ Scales linearly to meet any enterprise scale with < 1ms latency
■ Maintains a single stateful graph data structure
■ Performs complex multi-node queries in real-time
■ No time-windows
Complex event processing
with graph data structure
■ High volume of events in one or more streams of data
■ Kafka, Kinesis, Pulsar, etc.
■ Infer complex relationships between the events
■ Use Cases:
■ Fraud Detection
■ Network Management
■ Advanced Persistent Threat Detection
■ XDR/EDR
■ Monitoring State Change (CDC)
Complex Event Processing
Graph Data Structure
■ Nodes with properties, connected by edges
■ Categorical data (a lot of rich information gets encoded or discarded)
■ No costly joins
■ Current graph databases can’t
■ Process at event streaming scale (>1M+ events/second) while….
■ Completing multi-node traversals (complex queries) < 1ms
■ What we want to accomplish
■ Event processing performance - 1M+ events/sec ingest
■ Match a relatively rare (2%) 4-node complex pattern in real-time
■ Resilient in face of infrastructure/network failures
■ Infrastructure cost effective
Scaling Graph for Production Use Cases
■ Start with a database that already has similar characteristics
■ Design a graph data structure over a key-value store
■ Translate graph queries into per-node queries
What-If Architecture
MATCH (attempt1:login)-[:NEXT]->(attempt2:login)-[:NEXT]->(attempt3:login) WHERE
attempt1.result=”FAILURE”
AND attempt2.result=”FAILURE”
AND attempt3.result=”SUCCESS”
■ Find node of type login, WHERE .result=”FAILURE”, follow NEXT edges
■ Looking for neighbor of type login WHERE .result=”FAILURE”, follow NEXT
■ Edges looking for neighbor of type login WHERE .result=”SUCCESS”
Processing Event Streams
Quine ingests data → builds a graph→ persists to pluggable storage →runs
live computation on graph to compute results → and then streams them out.
In Operation
+
Quine Guard Band Test Example
■ A script is used to generate events
■ Quine and DB host failures manually
triggered.
■ Pre-loaded Kafka with enough events to
sustain one million events/second for
two hours.
■ Github repo for reproducible test
Component
# of
hosts
Typical Host types
Quine Cluster 140 ● c2-standard-30 (30 vCPUs, 120GB RAM)
● Max heap for JVM set to 12GB
● 1 hot spare
DB Cluster 66 ● n1-highmem-32 (32 vCPU, 208GB RAM)
● x 375 GB local SSD each
● r1 x 375 GB local SSD each
Kafka 3 ● n2-standard-4 (4 vCPU, 16 GB RAM)
● 420 partitions
1 Million Events/Second With Failures
#1 initial peak of 1.25M events/sec
#2 Quine settles into a steady ingest rate > 1M events/sec
#3 Quine recovers nicely after killing single node
#4 DB maintenance event exactly 1 hour into test
#5 Quine has no problem with two-node failure events.
#6 Stopped and resumed a Quine host for about 1 minute
to inject high latency
#7 Stop and resume a persistor host for about 1 minute to
inject high latency
#8 Single DB host killed and quickly recovered
#9 DB maintenance event exactly 2 hours into test
#10 Consumes remaining data from Kafka
21,000 Standing Query Results/Second
MATCH (p0)-[:parent]->(p1)-[:parent]->(p2)-[:parent]->(p3)
WHERE
EXISTS(p0.customer_id) AND
EXISTS(p0.sensor_id) AND
EXISTS(p0.process_id) AND
EXISTS(p0.filename) AND
EXISTS(p0.command_line) AND
EXISTS(p0.user_id) AND
EXISTS(p0.timestamp_unix_utc) AND
EXISTS(p0.sha256)
RETURN
id(p0) AS id
62% Cost Savings!
Component
Original Hosts @
$130/hr
New Hosts @
$50/hr
% Difference
Quine
Cluster
140 x
c2-standard-30
120 x
n2d-standard-16
54%
DB Cluster 66 x
n1-highmem-32
40 x
n2d-highmem-16
70%
Same Results, Significantly Reduced Cost
Quine + ScyllaDB Scale ~Linearly Together
ScyllaDB
Cluster Size
Quine
Cluster Size
Ingest Rate
Cluster-
Wide (e/s)
AVG Ingest
Rate Per-
Host (e/s)
2 8 77,500 9688
20 40 372,500 9300
20 100 882,600 8826
25 100 892,900 8929
Kafka Cluster
■ 10 brokers on n1-standard-16’s,
■ 4200 partitions exist in topic
ScyllaDB Cluster
■ Cluster Size: See table
■ Machine Type: n2d-custom-16-131072
■ Local SSDs: 1x375 GB per host
Quine Cluster
■ Cluster Size: See table
■ Machine Type: n2d-standard-16
Complex Event Processing Is No Longer Hard
■ Quine + ScyllaDB scales up linearly
■ Easily achieves 1M events/sec
and 21000 query results/sec
■ Resilient to infrastructure failures
with back pressure
■ Should scale ~linearly to 10+M events/sec
Thank You
Try Quine:
docs.quine.io/getting-started
matt@thatdot.com
@thatdot
that.re/quine-slack

Mais conteúdo relacionado

Semelhante a Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB

Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Big Data Spain
 
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...Daniel Cohen
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesSwami Sundaramurthy
 
How to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALABHow to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALABHostedbyConfluent
 
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.Konstantinos Kloudas
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafkaconfluent
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationYi Pan
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitDon Marti
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkDemi Ben-Ari
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTim Ysewyn
 
Resource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native EnvironmentsResource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native EnvironmentsSharma Podila
 
Real-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data WarehouseReal-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data WarehousePrecisely
 
Cloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsCloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsVMware Tanzu
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streamingdatamantra
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextPrateek Maheshwari
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkDemi Ben-Ari
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 

Semelhante a Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB (20)

Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
 
How to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALABHow to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALAB
 
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration Summit
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using spark
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
Resource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native EnvironmentsResource Scheduling using Apache Mesos in Cloud Native Environments
Resource Scheduling using Apache Mesos in Cloud Native Environments
 
Real-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data WarehouseReal-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
Real-Time Streaming: Move IMS Data to Your Cloud Data Warehouse
 
Cloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsCloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive Applications
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache spark
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 

Mais de ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

Mais de ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Último

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Último (20)

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB

  • 1. Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB Matthew Cullum, Director of Engineering, thatDot sponsored by
  • 2. Matthew ■ Director of Engineering @thatDot, makers of Quine ■ 18+ years experience in enterprise software ■ Industrial automation and distributed systems ■ Senior engineer, senior architect, CTO roles @mattcullum @brackishman
  • 3. ■ Complex Event Processing ■ Graph Data Structure ■ Design an architecture ■ Quine + ScyllaDB real world performance ■ Questions Presentation Agenda
  • 4. Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB Our Goal: ■ Fits within existing high volume streams ■ Scales linearly to meet any enterprise scale with < 1ms latency ■ Maintains a single stateful graph data structure ■ Performs complex multi-node queries in real-time ■ No time-windows
  • 5. Complex event processing with graph data structure
  • 6. ■ High volume of events in one or more streams of data ■ Kafka, Kinesis, Pulsar, etc. ■ Infer complex relationships between the events ■ Use Cases: ■ Fraud Detection ■ Network Management ■ Advanced Persistent Threat Detection ■ XDR/EDR ■ Monitoring State Change (CDC) Complex Event Processing
  • 7. Graph Data Structure ■ Nodes with properties, connected by edges ■ Categorical data (a lot of rich information gets encoded or discarded) ■ No costly joins
  • 8. ■ Current graph databases can’t ■ Process at event streaming scale (>1M+ events/second) while…. ■ Completing multi-node traversals (complex queries) < 1ms ■ What we want to accomplish ■ Event processing performance - 1M+ events/sec ingest ■ Match a relatively rare (2%) 4-node complex pattern in real-time ■ Resilient in face of infrastructure/network failures ■ Infrastructure cost effective Scaling Graph for Production Use Cases
  • 9. ■ Start with a database that already has similar characteristics ■ Design a graph data structure over a key-value store ■ Translate graph queries into per-node queries What-If Architecture MATCH (attempt1:login)-[:NEXT]->(attempt2:login)-[:NEXT]->(attempt3:login) WHERE attempt1.result=”FAILURE” AND attempt2.result=”FAILURE” AND attempt3.result=”SUCCESS” ■ Find node of type login, WHERE .result=”FAILURE”, follow NEXT edges ■ Looking for neighbor of type login WHERE .result=”FAILURE”, follow NEXT ■ Edges looking for neighbor of type login WHERE .result=”SUCCESS”
  • 10. Processing Event Streams Quine ingests data → builds a graph→ persists to pluggable storage →runs live computation on graph to compute results → and then streams them out.
  • 12. Quine Guard Band Test Example ■ A script is used to generate events ■ Quine and DB host failures manually triggered. ■ Pre-loaded Kafka with enough events to sustain one million events/second for two hours. ■ Github repo for reproducible test Component # of hosts Typical Host types Quine Cluster 140 ● c2-standard-30 (30 vCPUs, 120GB RAM) ● Max heap for JVM set to 12GB ● 1 hot spare DB Cluster 66 ● n1-highmem-32 (32 vCPU, 208GB RAM) ● x 375 GB local SSD each ● r1 x 375 GB local SSD each Kafka 3 ● n2-standard-4 (4 vCPU, 16 GB RAM) ● 420 partitions
  • 13. 1 Million Events/Second With Failures #1 initial peak of 1.25M events/sec #2 Quine settles into a steady ingest rate > 1M events/sec #3 Quine recovers nicely after killing single node #4 DB maintenance event exactly 1 hour into test #5 Quine has no problem with two-node failure events. #6 Stopped and resumed a Quine host for about 1 minute to inject high latency #7 Stop and resume a persistor host for about 1 minute to inject high latency #8 Single DB host killed and quickly recovered #9 DB maintenance event exactly 2 hours into test #10 Consumes remaining data from Kafka
  • 14. 21,000 Standing Query Results/Second MATCH (p0)-[:parent]->(p1)-[:parent]->(p2)-[:parent]->(p3) WHERE EXISTS(p0.customer_id) AND EXISTS(p0.sensor_id) AND EXISTS(p0.process_id) AND EXISTS(p0.filename) AND EXISTS(p0.command_line) AND EXISTS(p0.user_id) AND EXISTS(p0.timestamp_unix_utc) AND EXISTS(p0.sha256) RETURN id(p0) AS id
  • 15. 62% Cost Savings! Component Original Hosts @ $130/hr New Hosts @ $50/hr % Difference Quine Cluster 140 x c2-standard-30 120 x n2d-standard-16 54% DB Cluster 66 x n1-highmem-32 40 x n2d-highmem-16 70%
  • 17. Quine + ScyllaDB Scale ~Linearly Together ScyllaDB Cluster Size Quine Cluster Size Ingest Rate Cluster- Wide (e/s) AVG Ingest Rate Per- Host (e/s) 2 8 77,500 9688 20 40 372,500 9300 20 100 882,600 8826 25 100 892,900 8929 Kafka Cluster ■ 10 brokers on n1-standard-16’s, ■ 4200 partitions exist in topic ScyllaDB Cluster ■ Cluster Size: See table ■ Machine Type: n2d-custom-16-131072 ■ Local SSDs: 1x375 GB per host Quine Cluster ■ Cluster Size: See table ■ Machine Type: n2d-standard-16
  • 18. Complex Event Processing Is No Longer Hard ■ Quine + ScyllaDB scales up linearly ■ Easily achieves 1M events/sec and 21000 query results/sec ■ Resilient to infrastructure failures with back pressure ■ Should scale ~linearly to 10+M events/sec