Build an event processing pipeline that scales to millions of events/second with sub-millisecond latencies, all while ingesting multiple streams and remaining resilient in the face of host failures. We present a production-ready reference architecture that ingests from multiple Kafka streams and produces results to downstream Kafka topics. Common use cases include fraud detection, customer 360, and data enrichment.
Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB
1. Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB
Matthew Cullum, Director of Engineering, thatDot
sponsored by
2. Matthew
■ Director of Engineering @thatDot, makers of Quine
■ 18+ years experience in enterprise software
■ Industrial automation and distributed systems
■ Senior engineer, senior architect, CTO roles
@mattcullum
@brackishman
3. Presentation Agenda
■ Complex Event Processing
■ Graph Data Structure
■ Design an architecture
■ Quine + ScyllaDB real-world performance
■ Questions
4. Build Low Latency, Windowless Event Processing Pipelines with Quine and ScyllaDB
Our Goal:
■ Fits within existing high-volume streams
■ Scales linearly to meet any enterprise scale with < 1 ms latency
■ Maintains a single stateful graph data structure
■ Performs complex multi-node queries in real time
■ No time windows
6. Complex Event Processing
■ High volume of events in one or more streams of data
■ Kafka, Kinesis, Pulsar, etc.
■ Infer complex relationships between the events
■ Use Cases:
■ Fraud Detection
■ Network Management
■ Advanced Persistent Threat Detection
■ XDR/EDR
■ Monitoring State Change (CDC)
7. Graph Data Structure
■ Nodes with properties, connected by edges
■ Categorical data (rich information that would otherwise be awkwardly encoded or discarded)
■ No costly joins
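To make the "no costly joins" point concrete, here is a minimal sketch of a property graph (all class and method names are hypothetical, not Quine's API): each node stores its properties and adjacency locally, so following an edge is a direct lookup rather than a join over an edge table.

```python
class Node:
    """A node with a label set, a property map, and local adjacency."""
    def __init__(self, node_id, labels=None, properties=None):
        self.id = node_id
        self.labels = set(labels or [])
        self.properties = dict(properties or {})
        self.edges = {}  # edge label -> set of neighbor node ids

class Graph:
    def __init__(self):
        self.nodes = {}

    def add_node(self, node_id, labels=None, properties=None):
        self.nodes[node_id] = Node(node_id, labels, properties)
        return self.nodes[node_id]

    def add_edge(self, src, label, dst):
        self.nodes[src].edges.setdefault(label, set()).add(dst)

    def neighbors(self, node_id, label):
        # One hop = one dictionary lookup per neighbor; no join needed.
        return [self.nodes[n] for n in self.nodes[node_id].edges.get(label, ())]

g = Graph()
g.add_node("a1", ["login"], {"result": "FAILURE"})
g.add_node("a2", ["login"], {"result": "SUCCESS"})
g.add_edge("a1", "NEXT", "a2")
```

In a relational store the same hop would be a join between a node table and an edge table; here the edge list travels with the node.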
8. Scaling Graph for Production Use Cases
■ Current graph databases can't:
■ Process at event streaming scale (1M+ events/second) while…
■ Completing multi-node traversals (complex queries) in < 1 ms
■ What we want to accomplish:
■ Event processing performance: 1M+ events/sec ingest
■ Match a relatively rare (2%) 4-node complex pattern in real time
■ Resilient in the face of infrastructure/network failures
■ Infrastructure cost-effective
9. What-If Architecture
■ Start with a database that already has similar characteristics
■ Design a graph data structure over a key-value store
■ Translate graph queries into per-node queries

MATCH (attempt1:login)-[:NEXT]->(attempt2:login)-[:NEXT]->(attempt3:login)
WHERE attempt1.result = "FAILURE"
  AND attempt2.result = "FAILURE"
  AND attempt3.result = "SUCCESS"

■ Find a node of type login WHERE .result = "FAILURE", follow NEXT edges
■ Look for a neighbor of type login WHERE .result = "FAILURE", follow NEXT edges
■ Look for a neighbor of type login WHERE .result = "SUCCESS"
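The translation from a graph query to per-node queries can be sketched as follows. This is an illustrative model, not Quine's implementation: the key-value store is a plain dict keyed by node id, each value holds the node's type, properties, and outgoing NEXT edges, and the three-login Cypher pattern above becomes three chained point lookups.

```python
# Hypothetical key-value layout: node id -> record with type, properties,
# and outgoing NEXT edge targets. Names and shapes are illustrative.
kv = {
    "n1": {"type": "login", "result": "FAILURE", "NEXT": ["n2"]},
    "n2": {"type": "login", "result": "FAILURE", "NEXT": ["n3"]},
    "n3": {"type": "login", "result": "SUCCESS", "NEXT": []},
}

def match_failed_login_pattern(store, start_id):
    """Per-node translation of FAILURE -[:NEXT]-> FAILURE -[:NEXT]-> SUCCESS.

    Returns the matching (attempt1, attempt2, attempt3) ids, or None.
    """
    a1 = store.get(start_id)
    if not a1 or a1["type"] != "login" or a1["result"] != "FAILURE":
        return None
    for id2 in a1["NEXT"]:                      # follow NEXT edges
        a2 = store[id2]
        if a2["type"] == "login" and a2["result"] == "FAILURE":
            for id3 in a2["NEXT"]:              # follow NEXT edges again
                a3 = store[id3]
                if a3["type"] == "login" and a3["result"] == "SUCCESS":
                    return (start_id, id2, id3)
    return None
```

Each step of the match touches exactly one key, which is what lets the pattern run against a key-value store that already scales and replicates well.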
10. Processing Event Streams
Quine ingests data → builds a graph → persists to pluggable storage → runs
live computation on the graph to compute results → streams them out.
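The five stages above can be sketched as composable generator stages. Stage names, event shapes, and the `flag` predicate are all hypothetical stand-ins, not Quine's API; the point is only the shape of the flow: ingest → build graph → persist → compute → stream out.

```python
def ingest(events):
    """Stage 1: pull events from an upstream source (here, a plain list)."""
    for e in events:
        yield e

def build_graph(events, graph):
    """Stage 2: upsert a node per event into the in-memory graph."""
    for e in events:
        graph[e["id"]] = e
        yield e

def persist(events, store):
    """Stage 3: write-through each touched node to pluggable storage."""
    for e in events:
        store[e["id"]] = dict(e)
        yield e

def compute(events, graph):
    """Stage 4/5: run a live check on each update, emit results downstream.
    The `flag` check stands in for a real graph pattern query."""
    for e in events:
        if e.get("flag"):
            yield {"match": e["id"]}

graph, store = {}, {}
out = list(compute(persist(build_graph(ingest(
    [{"id": 1}, {"id": 2, "flag": True}]), graph), store), graph))
```

Because each stage is a generator, an event flows through all five stages before the next is pulled, which is also a simple model of the backpressure behavior mentioned later in the deck.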
12. Quine Guard Band Test Example
■ A script is used to generate events.
■ Quine and DB host failures are manually triggered.
■ Kafka is pre-loaded with enough events to sustain one million events/second for two hours.
■ GitHub repo available for a reproducible test.
Component | # of hosts | Typical host types
Quine Cluster | 140 | c2-standard-30 (30 vCPUs, 120 GB RAM); JVM max heap set to 12 GB; 1 hot spare
DB Cluster | 66 | n1-highmem-32 (32 vCPU, 208 GB RAM); x 375 GB local SSD each; r1 x 375 GB local SSD each
Kafka | 3 | n2-standard-4 (4 vCPU, 16 GB RAM); 420 partitions
13. 1 Million Events/Second With Failures
#1 Initial peak of 1.25M events/sec
#2 Quine settles into a steady ingest rate > 1M events/sec
#3 Quine recovers nicely after a single node is killed
#4 DB maintenance event exactly 1 hour into the test
#5 Quine has no problem with two-node failure events
#6 Stopped and resumed a Quine host for about 1 minute to inject high latency
#7 Stopped and resumed a persistor host for about 1 minute to inject high latency
#8 Single DB host killed and quickly recovered
#9 DB maintenance event exactly 2 hours into the test
#10 Remaining data consumed from Kafka
14. 21,000 Standing Query Results/Second
MATCH (p0)-[:parent]->(p1)-[:parent]->(p2)-[:parent]->(p3)
WHERE
EXISTS(p0.customer_id) AND
EXISTS(p0.sensor_id) AND
EXISTS(p0.process_id) AND
EXISTS(p0.filename) AND
EXISTS(p0.command_line) AND
EXISTS(p0.user_id) AND
EXISTS(p0.timestamp_unix_utc) AND
EXISTS(p0.sha256)
RETURN
id(p0) AS id
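A standing query like the one above is evaluated incrementally: each incoming event updates one node, and only that node's part of the pattern is re-checked, so no time window is needed. The sketch below illustrates that idea for the 4-node parent chain; every name and structure here is hypothetical (real Quine standing queries are declared in Cypher as shown above), and re-checking descendants when an ancestor changes, plus de-duplication of results, is omitted for brevity.

```python
# Properties the matched root node (p0) must carry, per the query above.
REQUIRED = {"customer_id", "sensor_id", "process_id", "filename",
            "command_line", "user_id", "timestamp_unix_utc", "sha256"}

graph = {}    # node id -> {"props": {...}, "parent": parent id or None}
matches = []  # streamed standing-query results (id(p0) values)

def has_chain(node_id, depth=4):
    """True if node_id starts a parent chain of `depth` nodes."""
    for _ in range(depth - 1):
        node = graph.get(node_id)
        if node is None or node["parent"] is None:
            return False
        node_id = node["parent"]
    return node_id in graph

def on_event(node_id, props, parent=None):
    """Apply one event, then re-check the pattern only from this node."""
    node = graph.setdefault(node_id, {"props": {}, "parent": None})
    node["props"].update(props)
    if parent is not None:
        node["parent"] = parent
        graph.setdefault(parent, {"props": {}, "parent": None})
    if REQUIRED <= node["props"].keys() and has_chain(node_id):
        matches.append(node_id)   # emit id(p0) downstream

# Feed a p0 -> p1 -> p2 -> p3 chain, root properties arriving last.
on_event("p3", {})
on_event("p2", {}, parent="p3")
on_event("p1", {}, parent="p2")
on_event("p0", {k: "x" for k in REQUIRED}, parent="p1")
```

The result is emitted the moment the last piece of the pattern arrives, however far apart in time the four events were, which is what "windowless" means in practice.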
15. 62% Cost Savings!
Component | Original hosts @ $130/hr | New hosts @ $50/hr | % Difference
Quine Cluster | 140 x c2-standard-30 | 120 x n2d-standard-16 | 54%
DB Cluster | 66 x n1-highmem-32 | 40 x n2d-highmem-16 | 70%
18. Complex Event Processing Is No Longer Hard
■ Quine + ScyllaDB scales up linearly
■ Easily achieves 1M events/sec and 21,000 query results/sec
■ Resilient to infrastructure failures, with backpressure
■ Should scale ~linearly to 10M+ events/sec