Confidential
Capital One Delivers Risk Insights in Real Time with Stream Processing
Jeff Sharpe and Ravi Dubey
Capital One Retail Bank
Confluent Online Talk
May 30, 2018
Ravi is a senior manager working for Capital One in Virginia. Ravi
has over 25 years of software development and management
experience across a range of products in support of government
and commercial industries. His most recent experience includes
full stack development of web apps, cloud-based enterprise-facing
support applications and a high-throughput, low-latency,
distributed cloud-hosted data processing platform.
Ravi Dubey
Senior Manager, Software Engineering, Capital One
Jeff is a senior software engineer working for Capital One in
Virginia. He’s been an engineer for almost 18 years, with major
projects spanning five different languages. Though he began his
work on kernel drivers and web applications, he’s been repeatedly
drawn into high-volume, high-throughput data processing
projects.
Jeff Sharpe
Senior Software Engineer, Capital One
Housekeeping Items
● This session will last about an hour.
● This session will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10-15 minutes will consist of Q&A.
● The slides and recording will be available after the talk.
Thanks…
• Bobby Calderwood
– @bobbycalderwood
– https://www.confluent.io/blog/author/bobby/
• Keith Gasser
– Keith.Gasser@capitalone.com
Real Time Decisioning Platform - Introduction
• Decisioning with ML models and rules, using low-latency processing
• Streamed, batched, or micro-batched messages
[Figure: Streaming “Window”]
RT Decisioning Platform - Introduction
• High Speed Durable Message Bus – Apache Kafka
• Enterprise Data Sources – Streams, Databases, and
Warehouses
• ETL – Apache NiFi, Kafka Connect, Confluent Schema Registry
• Distributed Processing – Apache Flink and others
• Feature Caching – Apache Flink, Redis, Kafka Compacted
Topics
• Prometheus, Grafana – Metrics, Alert Management
• Supplemented with Cloud compute, RDBMS, and Caching
services
• Containerization – Docker and Kubernetes
RT Decisioning Platform - Kafka Messaging
• Durable, fast, and clustered Kafka topics act as data streams for decisioning input and decision-scoring output
• DataStream window intervals correlate to Kafka topic retention (retention.ms per topic, with log.retention.ms as the broker default), typically between 30 and 180+ days
• DataStream objects are aggregated into cached features, such as the average daily balance for a specific account holder
• Ten brokers in total per AWS region, dozens of topics
• Producers include NiFi, Kafka Connect, and external streams
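Because topic retention is expressed in milliseconds, a 30- to 180-day window becomes a large `retention.ms` value. A minimal Python sketch of the conversion; the helper name `retention_ms` is hypothetical:

```python
from datetime import timedelta


def retention_ms(days: int) -> int:
    """Convert a retention window in days into a Kafka retention.ms value."""
    # Dividing one timedelta by another yields the window length in milliseconds.
    return int(timedelta(days=days) / timedelta(milliseconds=1))


# Per-topic overrides for the 30- and 180-day windows mentioned above.
for days in (30, 180):
    print(f"retention.ms={retention_ms(days)}  # {days} days")
```

Setting `retention.ms` per topic lets each DataStream window keep its own horizon instead of inheriting the broker-wide `log.retention.ms` default.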
[Figure: a producer (data source) publishes messages to a Kafka topic, each carrying a producer-maintained transaction ID plus payload; IDs can arrive out of order (e.g. 1, 2, 3, 4, 6, 7, 5, 8, 10, 9, 11, …)]
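In Flink, out-of-order arrival would normally be handled with event-time windows and watermarks. As a simpler, framework-free illustration (not Flink's API), the sketch below buffers messages in a min-heap and releases them only once the next expected transaction ID has arrived. It assumes IDs are dense integers starting at 1, as the diagram suggests; real producer IDs may not guarantee that. The function name `reorder` is hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Message = Tuple[int, str]  # (producer-maintained transaction ID, payload)


def reorder(messages: Iterable[Message], first_id: int = 1) -> Iterator[Message]:
    """Yield messages in transaction-ID order, buffering any that arrive early."""
    buffer: List[Message] = []  # min-heap ordered by transaction ID
    next_id = first_id
    for msg in messages:
        heapq.heappush(buffer, msg)
        # Release everything that is now contiguous with what was already emitted.
        while buffer and buffer[0][0] == next_id:
            yield heapq.heappop(buffer)
            next_id += 1
    # Messages still buffered here are waiting on an ID that never arrived.


arrivals = [1, 2, 3, 4, 6, 7, 5, 8, 10, 9, 11, 12]
print([i for i, _ in reorder((i, f"payload-{i}") for i in arrivals)])
```

A gap that never fills stalls the output, so a production version would need a timeout or watermark to give up on missing IDs.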
[Figure: Apache Flink reads the topic into a DataStream structure (sorts, aggregates, etc.) and writes the results to Kafka compacted topics]
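A compacted topic eventually keeps at least the most recent record for each key, and a record with a null value (a tombstone) marks its key for deletion. The sketch below models only that end state for cached features keyed by account ID; the account keys and values are made up, and real compaction runs asynchronously in the background, so consumers can still see older records until the log cleaner catches up.

```python
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[str, Optional[float]]  # (key, value); None models a tombstone


def compact(log: Iterable[Record]) -> Dict[str, float]:
    """Return what a fully compacted topic converges to: the latest value per key."""
    latest: Dict[str, float] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone: the key disappears after compaction
        else:
            latest[key] = value    # a later write supersedes earlier ones
    return latest


feature_log = [
    ("acct-1", 100.0),  # hypothetical average daily balance updates
    ("acct-2", 50.0),
    ("acct-1", 120.0),
    ("acct-2", None),   # tombstone for acct-2
]
print(compact(feature_log))
```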
RT Decisioning Platform - Kafka Messaging
• Independent and interdependent decisioning patterns: Kafka decouples models and rules
[Figure: independent model and rules consumers each read the source topic (messages keyed by a producer-defined ID) and publish payload + model score and payload + rules score to downstream topics; the downstream topics support dependent scoring, where a dependent model consumer works from payload + rules score + model score]
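Dependent scoring needs both independent outputs for the same message before it can run. A minimal sketch of that join, keyed on the producer-defined ID; the event tuple shape, the `Scored` class, and `join_scores` are assumptions for illustration, not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (source, producer-defined ID, payload, score); source is "model" or "rules"
Event = Tuple[str, str, dict, float]


@dataclass
class Scored:
    msg_id: str
    payload: dict
    model_score: Optional[float] = None
    rules_score: Optional[float] = None


def join_scores(events: Iterable[Event]) -> Iterator[Scored]:
    """Emit payload + model score + rules score once both scores for an ID arrive."""
    pending: Dict[str, Scored] = {}
    for source, msg_id, payload, score in events:
        rec = pending.setdefault(msg_id, Scored(msg_id, payload))
        if source == "model":
            rec.model_score = score
        elif source == "rules":
            rec.rules_score = score
        else:
            raise ValueError(f"unknown source: {source}")
        if rec.model_score is not None and rec.rules_score is not None:
            # Both independent consumers have reported; hand off to the dependent model.
            yield pending.pop(msg_id)


events = [
    ("model", "5", {"amount": 20}, 0.91),
    ("rules", "3", {"amount": 75}, 1.0),
    ("rules", "5", {"amount": 20}, 0.0),
    ("model", "3", {"amount": 75}, 0.42),
]
for joined in join_scores(events):
    print(joined.msg_id, joined.model_score, joined.rules_score)
```

The `pending` map never expires entries here; a real consumer would need a TTL for IDs whose second score never shows up.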
Enterprise Compliance: Image Rehydration
• Cloud VM machine images require periodic updates
• The RT Platform stack has 100+ distinct containers, so underlying image rehydration is best handled with an abstraction layer
• Simple blue-green approaches can work for stateless components, BUT…
• Network storage and other disk volumes add complexity for stateful components such as Kafka brokers
• Kafka clustering provides fault tolerance and failover during rehydration, though we needed a solution to manage Kafka logs mounted on cloud storage
[Figure: storage mount points broken during instance recreation]
Kubernetes
• Kubernetes (k8s) is open-source software that manages container lifecycle, addressing, and networking, among other things
• The scheduler “moves” both Pods and the storage volumes defined in their StatefulSets between VM nodes in a coordinated way, enabling clean rolling rehydration of Kafka brokers
• Services let all platform components reach Kafka brokers and Kafka Connect by a logical service name
• Software-defined networking enables a single TLS solution across all components, common DNS, and integrated cloud load balancing
• For external access to Kafka on the RT Platform, we refresh the external DNS mapping between broker IPs and the common name at a configurable interval (20 sec)
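StatefulSet pods get stable, ordinal-based names, which is what lets a rehydrated broker come back with the same identity and reattach to its volume. The sketch below computes the per-pod DNS names Kubernetes assigns behind a headless Service and the order the default `RollingUpdate` strategy recreates pods (highest ordinal first). The StatefulSet name `kafka`, Service `kafka-headless`, namespace `rt-platform`, and port 9092 are hypothetical, and the default `cluster.local` domain is assumed.

```python
from typing import List


def broker_addresses(statefulset: str, headless_service: str, namespace: str,
                     replicas: int, port: int = 9092) -> List[str]:
    """Stable DNS names for StatefulSet pods behind a headless Service.

    Pods are named <statefulset>-<ordinal> and resolve at
    <pod>.<service>.<namespace>.svc.cluster.local (default cluster domain).
    """
    return [
        f"{statefulset}-{i}.{headless_service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]


def rolling_update_order(statefulset: str, replicas: int) -> List[str]:
    """Pod order for the default RollingUpdate strategy: highest ordinal first, one at a time."""
    return [f"{statefulset}-{i}" for i in reversed(range(replicas))]


servers = broker_addresses("kafka", "kafka-headless", "rt-platform", replicas=3)
print("bootstrap.servers=" + ",".join(servers))
print(rolling_update_order("kafka", 3))
```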
Kafka Considerations – Cluster 1
• The RT Platform hosts all containers on instance types… 150GB RAM, 40 cores, 10 Gbps network performance. Good for most stack components
– Node affinity is set so each instance runs at most one Kafka broker and at most one ZK node
– The ZooKeeper cluster is shared with other RT Platform components
– In AWS, st1 (throughput-optimized HDD) EBS volumes are optimized for write throughput, which suits Kafka
• Brokers increase demand on instance and platform shared resources
– Platform ZooKeeper state
– Instance OS open files
– Instance RAM
– Instance network access
– Instance storage IO
• Kafka brokers use RAM for both the Java heap and the OS page cache; page-cache demand grows with the size of topics
• A replication factor of 3 means three times the disk space consumed
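The disk cost of replication is simple arithmetic: every partition is stored once per replica. A back-of-envelope sketch; the 1 MB/s ingest rate is a made-up example, and the function ignores compression, index files, and segment-rollover slack.

```python
def retained_disk_bytes(ingest_bytes_per_sec: int, retention_days: int,
                        replication_factor: int) -> int:
    """Approximate cluster-wide disk needed to retain one topic's data.

    Usage scales linearly with the replication factor (RF 3 -> 3 copies).
    """
    seconds = retention_days * 24 * 60 * 60
    return ingest_bytes_per_sec * seconds * replication_factor


# Hypothetical workload: 1 MB/s into a topic kept for 30 days at RF 3.
total = retained_disk_bytes(1_000_000, 30, 3)
print(f"{total / 1e12:.2f} TB")
```

Deeper topics scale the same way: doubling retention doubles the result, and the page-cache RAM needed to keep hot segments in memory grows with it.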
Kafka Considerations – Cluster 1
Deeper topics = more disk space and more page-cache RAM
[Figure: Kubernetes pod memory usage and EC2 node memory usage charts]
Kafka Considerations – Cluster 1
[Figure: large shared nodes, each hosting a Kafka broker, a ZooKeeper node (Z), and many other platform containers (C); TLS and IO annotations]
Larger instance/machine types (m4.10xlarge, n1-standard-32, n1-highmem-32): faster network speeds, 100+ GB of RAM, and 30+ cores, but noisier neighbors competing for RAM and network IO, and a larger “blast radius”
Kafka Considerations – Cluster 1
Smaller instance/machine types (m4.2xlarge, n1-highmem-4, standard-8) with dedicated ZK, single-broker node affinity, and Connect and/or Schema Registry. Tradeoff: risk, predictability, and simplicity vs. faster networking and high-end CPU
[Figure: smaller nodes, each running one Kafka broker alongside ZooKeeper (Z), Kafka Connect (KC), and/or Schema Registry (SR), separate from nodes running other platform containers (C)]
Kafka Considerations – Cluster 2
Kafka Real-Time Upgrades
• The RT Platform supports multiple active tenants, so platform-wide downtime during version upgrades is not usually an option.
• Rolling upgrades can pose compatibility risks between Kafka versions.
Kafka Real-Time Upgrades
1- Green cluster provisioned and topic offsets captured
[Figure: the producer writes to the original cluster through the Kafka1Svc service; each topic's offset is captured]
Kafka Real-Time Upgrades
2- Tooling backfills new topics
• Depending on the desired window size, tooling may be used to backfill data for topics on the new cluster, preserving the original timestamps so the retention policy stays consistent.
• Mirroring is a possible candidate process
[Figure: backfill tooling (possibly mirroring) copies topic data to the new cluster while the producer keeps writing through Kafka1Svc]
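Preserving timestamps matters because Kafka's time-based retention is driven by record timestamps: records stamped with the copy time would outlive their originals. In the Java client a producer can pass an explicit timestamp in `ProducerRecord`, and a topic with the default `message.timestamp.type=CreateTime` keeps it. The sketch below only models the selection and bookkeeping in plain Python; the `Record` type, `backfill` function, and numbers are hypothetical. It also records an old-to-new offset map, since offsets on the new cluster will not match the originals.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    offset: int
    timestamp_ms: int  # original record timestamp, kept as-is on copy
    value: str


def backfill(source: Iterable[Record], now_ms: int,
             window_ms: int) -> Tuple[List[Record], Dict[int, int]]:
    """Copy in-window records to a new topic, preserving their timestamps.

    Returns the new log plus a map from old offsets to new offsets.
    """
    cutoff = now_ms - window_ms
    new_log: List[Record] = []
    offset_map: Dict[int, int] = {}
    for rec in source:
        if rec.timestamp_ms < cutoff:
            continue  # outside the window; retention would already drop it
        new_offset = len(new_log)
        new_log.append(Record(new_offset, rec.timestamp_ms, rec.value))
        offset_map[rec.offset] = new_offset
    return new_log, offset_map


old = [Record(100 + i, 1_000 * i, f"txn-{i}") for i in range(5)]
new_log, offsets = backfill(old, now_ms=5_000, window_ms=3_000)
print(offsets)
```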
Kafka Real-Time Upgrades
3- Producer flows switched to the second Kafka cluster as required
• Producers reference the newly upgraded Kafka cluster by its new k8s service name and move to it independently
[Figure: the producer now writes through Kafka2Svc; the original cluster remains behind Kafka1Svc]
Kafka Real-Time Upgrades – Consequences
• Overlap between steps 2 and 3 is likely to create duplicates (better than gaps)
• If downstream state is based on the original cluster, or the original offsets are not preserved, all messages in the window may need to be replayed to recover
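Because the producer-maintained transaction IDs travel with each message, duplicates from the cutover overlap can be dropped downstream. A sketch of a bounded de-duplication filter; the window size `max_tracked` and the function name are assumptions, and the window must be large enough to cover the overlap period or late duplicates will slip through.

```python
from collections import OrderedDict
from typing import Iterable, Iterator, Tuple


def dedupe(messages: Iterable[Tuple[int, str]],
           max_tracked: int = 100_000) -> Iterator[Tuple[int, str]]:
    """Drop messages whose transaction ID was already seen recently."""
    seen: "OrderedDict[int, None]" = OrderedDict()
    for msg_id, payload in messages:
        if msg_id in seen:
            seen.move_to_end(msg_id)  # refresh: we are still inside the overlap
            continue
        seen[msg_id] = None
        if len(seen) > max_tracked:
            seen.popitem(last=False)  # forget the least recently seen ID
        yield msg_id, payload


# Old-cluster tail followed by new-cluster head, in the arrival order shown earlier.
old_tail = [6, 7, 5, 8, 10, 9, 11, 12, 13, 14]
new_head = [7, 5, 8, 10, 9, 11, 12, 13, 14, 14]
print([i for i, _ in dedupe((i, "p") for i in old_tail + new_head)])
```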
Kafka Across Regions
• Regional Clusters
• Why Do This?
– Partitioned Strategy
• Active-Active
• Latency or Partition Routed, Increased
Performance and Efficiency
– Disaster Recovery
• Active-Passive, Active-Active
• Redundantly Constructed and Routed,
Increased Reliability
• Issues
– Syncing Data
– Latency
• Inefficient Operation Across Great Distances
• Stretching a Single Kafka Cluster's Replication Across Regions Not Recommended
Kafka Across Regions – Data Syncing Options
• Duplicate Common Upstream Sources
• Producer-Driven Replication
• Mirroring
• Mirroring + Consolidation
Kafka Across Regions – Data Syncing Options
Common Upstream
• Local Producers use Common Source
• 2 Topics Represent 1 Logical Topic
• Pros
• Fewest Number of Topics
• Consumer behavior minimally impacted
• Cons
• Each Local Producer needs to know about Each Regional
Deployment
Kafka Across Regions – Data Syncing Options
[Figure: Common Upstream. In Region A and Region B, a local producer pulls (ETL) from the common upstream source into its regional topic, read by local consumers; 2 topics represent 1 logical set of messages]
Kafka Across Regions – Data Syncing Options
Producer-Driven Replication
• Producers maintain Topic consistency across multiple
regions
• 2 Topics (in 2 Clusters) Represent 1 Logical Topic
• Pros
• Fewest Number of Topics
• Consumer behavior minimally impacted
• Cons
• Each Producer needs to know about Each Regional
Deployment
• Failure Strategy, Reliability Tracking, SLA, etc. must be implemented by each Producer, likely using shadow topics
Kafka Across Regions – Data Syncing Options
[Figure: Producer-Driven Replication. The producer in each region writes to both regional topics (Topic AB and Topic BA, holding A-routed and B-routed data), with a shadow topic per region; 2 topics represent 1 logical set of messages, read by local consumers]
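Producer-driven replication puts delivery guarantees on the producer. One way to sketch that, assuming each regional write is a callable that raises on failure and modeling the per-region shadow topic as an in-memory list, is to park failed writes and drain them once the region recovers. All names here are hypothetical; a real implementation would use the Kafka producer's delivery callbacks and a durable shadow topic.

```python
from typing import Callable, Dict, List, Tuple

Send = Callable[[str, bytes], None]          # raises on delivery failure
Shadow = Dict[str, List[Tuple[str, bytes]]]  # region -> parked messages


def replicate(msg_id: str, payload: bytes, regions: Dict[str, Send],
              shadow: Shadow) -> List[str]:
    """Write one message to every region; park failures in that region's shadow topic."""
    failed: List[str] = []
    for region, send in regions.items():
        try:
            send(msg_id, payload)
        except Exception:
            shadow.setdefault(region, []).append((msg_id, payload))
            failed.append(region)  # surface for reliability tracking / SLA alerts
    return failed


def drain_shadow(region: str, send: Send, shadow: Shadow) -> int:
    """Retry parked messages in order; a failure leaves it and later ones parked."""
    parked = shadow.get(region, [])
    delivered = 0
    while parked:
        msg_id, payload = parked[0]
        send(msg_id, payload)  # raises on failure, so this message is not popped
        parked.pop(0)
        delivered += 1
    return delivered


# Usage: region B is unreachable for the first write.
topic_a: List[Tuple[str, bytes]] = []
topic_b: List[Tuple[str, bytes]] = []
b_available = False


def send_a(msg_id: str, payload: bytes) -> None:
    topic_a.append((msg_id, payload))


def send_b(msg_id: str, payload: bytes) -> None:
    if not b_available:
        raise ConnectionError("region B unreachable")
    topic_b.append((msg_id, payload))


shadow: Shadow = {}
print(replicate("m1", b"payload", {"A": send_a, "B": send_b}, shadow))
b_available = True
print(drain_shadow("B", send_b, shadow))
```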
Kafka Across Regions – Data Syncing Options
Mirroring
• Tooling Automatically Replicates Topics
• Confluent Replicator (Licensed)
• Mirror Maker, uReplicator (OSS)
• 4 Topics Represent 1 Logical Topic
• Pros
• Producer behavior minimally impacted
• Cons
• Each Consumer needs to know about Each Replicated
Topic
• Complexity: More Topics
Kafka Across Regions – Data Syncing Options
[Figure: Mirroring. Each region's producer writes to its local topic (Topic A, Topic B); a mirror copies each into the other region (Topic A’, Topic B’); 4 topics represent 1 logical set of messages, read by local consumers]
Kafka Across Regions – Data Syncing Options
Mirroring + Consolidation
• Tooling Automatically Replicates Topics
• Additional Tooling merges Topics for Consumers
• ETL Tooling, NiFi, etc.
• Kafka Connect
• 6 Topics Represent 1 Logical Topic
• Pros
• Producer behavior minimally impacted
• Consumer behavior minimally impacted
• Cons
• Custom tooling must implement failure strategy, reliability
tracking, etc.
• Complexity: Lots More Topics, flow logic, and associated resource consumption
Kafka Across Regions – Data Syncing Options
[Figure: Mirroring + Consolidation. Topics A and B are mirrored across regions (Topic A’, Topic B’), then ETL merges each local topic with its mirrored counterpart into consolidated topics (Topic AB, Topic BA) for consumers; 6 topics represent 1 logical set of messages]
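The consolidation step is essentially a merge of two streams. Assuming both the local topic and the mirrored topic are already in timestamp order (Kafka does not enforce this, so it is an assumption), a minimal in-memory sketch with `heapq.merge` looks like the following. Topic contents are made up; in practice this logic would live in the ETL tooling (NiFi, Kafka Connect, etc.) together with its own failure handling.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Rec = Tuple[int, str, str]  # (timestamp_ms, message ID, payload)


def consolidate(local: Iterable[Rec], mirrored: Iterable[Rec]) -> Iterator[Rec]:
    """Merge a local topic with its mirrored counterpart into one time-ordered stream.

    Assumes each input is already sorted by timestamp.
    """
    return heapq.merge(local, mirrored, key=lambda rec: rec[0])


topic_a = [(1, "a1", "x"), (4, "a2", "y")]         # produced in Region A
topic_b_mirror = [(2, "b1", "z"), (3, "b2", "w")]  # B’ mirrored into Region A
print([msg_id for _, msg_id, _ in consolidate(topic_a, topic_b_mirror)])
```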
Kafka Across Regions – Data Syncing Options
So What Do We Use?
• Multiple Tenant Use Cases and Risk Tolerances
• Combination of Solutions
– Common Upstream
– Confluent Replicator
Kafka – Moving Forward
• Exactly Once Semantics/Transactionality
• Hyper Partitioning
• Alternate Backends to Support Indefinite
Retention (S3, etc.)
Kafka for Real Time Bank Decisions
Handling Private Information
Real-Time Request and Response
Handling PII (not) on Kafka
Goal:
Remove the possibility of exposing PII
Encrypted Volume: Simple & Effective
[Figure: the producer tokenizes PII before publishing (Library Card# 8675309 becomes Library Card# TOK:113581321); the Kafka topic's persistence sits on encrypted storage; consumers read tokenized data from the topic]
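Tokenization replaces the sensitive value before it reaches Kafka, so the topic only ever holds a token such as `TOK:113581321`. A toy sketch of the idea; the `TokenVault` class is hypothetical, and a real token vault would be a separate, access-controlled service rather than an in-process dictionary.

```python
import secrets
from typing import Dict


class TokenVault:
    """Toy tokenizer: swaps sensitive values for opaque, random tokens."""

    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}
        self._by_value: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same value always maps to the same token.
        if value in self._by_value:
            return self._by_value[value]
        while True:
            token = f"TOK:{secrets.randbelow(10**9)}"
            if token not in self._by_token:  # avoid (unlikely) collisions
                break
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._by_token[token]


vault = TokenVault()
event = {"library_card": "8675309", "amount": 12.5}
# Only the tokenized copy would be handed to the Kafka producer.
safe_event = {**event, "library_card": vault.tokenize(event["library_card"])}
print(safe_event["library_card"].startswith("TOK:"))
```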
Encrypted Volume:
Following the Path of Least Resistance
Good
• Highly durable across Kafka
restarts
• Simple disaster recovery
planning
• Follows recommended
Kafka configuration
practices
Not So Good
• Information privacy
regulations require extra
levels of protection
• Durability is based on
additional storage volumes
being managed with the
Kafka service
Volatile Storage: Performance & Privacy
[Figure: the producer tokenizes PII before publishing (Library Card# 8675309 becomes Library Card# TOK:113581321); Kafka's topic persistence lives on tmpfs storage, with initial state copied from persistent storage on startup; consumers read from the topic]
Volatile Storage: Strange Trade-offs
Improvements
• Noticeably better
performance
• Data is always “in flight”, so
extra encryption shouldn’t
be needed
• Effectively stateless images
Complications
• Needs scripting to bootstrap
• Topic contents are cleared
on host reboot
• Zookeeper won’t be able to
manage offsets between
reboots
Volatile Storage: Why We Aren’t Using It
• We need long-term storage of data and RAM is already a
precious resource.
• Our recovery strategy is built on Kafka as our state storage
mechanism. Losing that state complicates recovery efforts.
• Host disk caching gives us most of the benefit of volatile
storage.
Request-Response Pattern
/rəˈkwest rəˈspans ˈpadərn/
noun
1. A pattern of interaction with a remote service where the
local task submits a request for remote work and
expects a response before continuing work.
2. A specialized use of Kafka using dedicated topic pairs
to communicate with a shared service
Request Response Basics
1. Initialize Consumer
2. Initialize Producer
3. Prepare Data
4. Assign a unique ID (e.g. ID: 14159-26535)
5. Put request on request topic
(Service does work, and builds a response with the Request ID)
6. Read Response topic until Request ID is seen (the response topic also carries other requests' IDs, e.g. 14159-26531, 14159-26532, 14159-26533, 14159-26536)
[Figure: the application's producer writes to the Request Topic; the service reads it and writes to the Response Topic, which the application's consumer reads]
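Steps 4 and 6 can be sketched without a broker by treating `poll` as a stand-in for a consumer poll that returns a batch of `(request ID, result)` pairs. This is an illustrative model, not the Kafka client API; the function names and timeout values are hypothetical. The deadline matters: without one, a response that never arrives leaves the caller waiting indefinitely.

```python
import time
import uuid
from typing import Callable, Iterable, Optional, Tuple

Response = Tuple[str, str]  # (request ID, result)


def new_request_id() -> str:
    """Step 4: a unique ID that the service copies into its response."""
    return str(uuid.uuid4())


def await_response(request_id: str,
                   poll: Callable[[], Iterable[Response]],
                   timeout_s: float,
                   clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Step 6: read the shared response topic until our request ID appears.

    Responses for other requests are skipped. Returns None once the deadline
    passes, so a missing response becomes an explicit failure.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        for resp_id, result in poll():
            if resp_id == request_id:
                return result
    return None


# Canned batches standing in for successive consumer polls.
batches = iter([
    [("14159-26532", "decline"), ("14159-26531", "approve")],
    [("14159-26535", "approve")],
])
print(await_response("14159-26535", lambda: next(batches, []), timeout_s=1.0))
```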
How Request-Response Feels
[Figure: the application sends data to the service and gets a response back, as if calling it directly]
How Request-Response Actually Works
[Figure: the application's data is interleaved with other traffic on a shared request topic and a shared response topic]
When Failures Occur
[Figure: some requests on the request topic never produce a message on the response topic (“Missing Responses”)]
The Slow Failure Problem
[Figure: Failures]
The Request-Response Pattern
This is actually the
“Background Job” pattern:
1. Submit Job
2. Get assigned a Job ID
3. Poll the service until the Job ID is marked as complete
4. Retrieve the results of the job
Request Response: Serverless Considerations
• Try to reuse Producers and Consumers
• Explicitly assign Consumer partitions
• Attempt to read from the Consumer before
submitting to the Producer
• Remember to commit offsets before sending
responses
[Figure labels: REST, gRPC, etc.]
Slightly Better: The Real-Time Tap Pattern
[Figure: a processing service consumes the input topic and delivers precomputed values; the application sends a request to a real-time service, which reads the request, processes it against the precomputed values, and sends the response back]
Real-Time Tap Pattern
• Real-time request is handled by a session-based
protocol
• Resilient data processing is handled by Kafka
• Failures are reported when they happen via
real-time protocol
• Kafka interactions can be optimized by the handler
service, rather than relying on clients
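A minimal sketch of the handler side of the tap, assuming the processing service pushes precomputed values into it (here via a hypothetical `apply_update` method) and a REST or gRPC endpoint calls `handle` per request. What it illustrates is that a missing value fails immediately on the caller's session instead of turning into a missing response on a topic; the class and key names are hypothetical.

```python
from typing import Dict


class RealTimeTap:
    """Answers real-time requests from values precomputed off Kafka."""

    def __init__(self) -> None:
        self._precomputed: Dict[str, float] = {}

    def apply_update(self, key: str, value: float) -> None:
        # Called by the Kafka-fed processing service as new results arrive.
        self._precomputed[key] = value

    def handle(self, key: str) -> float:
        # Fail fast: the caller learns about the problem on its own session.
        if key not in self._precomputed:
            raise KeyError(f"no precomputed value for {key!r}")
        return self._precomputed[key]


tap = RealTimeTap()
tap.apply_update("acct-1", 0.87)
print(tap.handle("acct-1"))
```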
Questions?
Thank you for joining us!

Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

Capital One Delivers Risk Insights in Real Time with Stream Processing

  • 1. Confidential Capital One Delivers Risk Insights in Real Time with Stream Processing Jeff Sharpe and Ravi Dubey Capital One Retail Bank Confluent Online Talk May 30, 2018
  • 2. Ravi is a senior manager working for Capital One in Virginia. Ravi has over 25 years of software development and management experience across a range of products in support of government and commercial industries. His most recent experience includes full stack development of web apps, cloud-based enterprise-facing support applications and a high-throughput, low-latency, distributed cloud-hosted data processing platform. Ravi Dubey, Senior Manager, Software Engineering, Capital One. Jeff is a senior software engineer working for Capital One in Virginia. He’s been an engineer for almost 18 years, with major projects spanning five different languages. Though he began his work on kernel drivers and web applications, he’s been repeatedly drawn into high volume, high throughput data processing projects. Jeff Sharpe, Senior Software Engineer, Capital One.
  • 3. Housekeeping Items ● This session will last about an hour. ● This session will be recorded. ● You can submit your questions by entering them into the GoToWebinar panel. ● The last 10-15 minutes will consist of Q&A. ● The slides and recording will be available after the talk.
  • 4. Thanks… • Bobby Calderwood – @bobbycalderwood – https://www.confluent.io/blog/author/bobby/ • Keith Gasser – Keith.Gasser@capitalone.com
  • 5. Real Time Decisioning Platform - Introduction • Decisioning using ML models and rules with low-latency processing • Streamed, batched, or micro-batched messages
  • 6. Real Time Decisioning Platform - Introduction Streaming “Window”
  • 7. RT Decisioning Platform - Introduction • High Speed Durable Message Bus – Apache Kafka • Enterprise Data Sources – Streams, Databases, and Warehouses • ETL – Apache NiFi, Kafka Connect, Confluent Schema Registry • Distributed Processing – Apache Flink and others • Feature Caching – Apache Flink, Redis, Kafka Compacted Topics • Prometheus, Grafana – Metrics, Alert Management • Supplemented with Cloud compute, RDBMS, and Caching services • Containerization – Docker and Kubernetes
  • 8. RT Decisioning Platform - Kafka Messaging • Durable, fast, and clustered Kafka topics act as data streams for decisioning input and decision-scoring output • DataStream window intervals correlate to Kafka topic retention (the topic-level retention.ms, or the broker default log.retention.ms), typically between 30 and 180+ days • DataStream objects are aggregated into cached features, such as the average daily balance for a specific account holder • Ten brokers in total per AWS region, dozens of topics • Producers include NiFi, Kafka Connect, and external streams [Diagram: a producer (data source) writes messages carrying producer-maintained transaction IDs, which can arrive out of order, plus a payload to a Kafka topic]
  • 9. RT Decisioning Platform - Kafka Messaging [Diagram: the producer (data source) writes messages with producer-maintained transaction IDs, which can arrive out of order, to a Kafka topic; Apache Flink consumes them into a DataStream structure (sorts, aggregates, etc.) and writes the results to Kafka compacted topics]
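Slides 8 and 9 describe producer-maintained IDs that can arrive out of order, with Flink sorting and aggregating them into features kept in compacted topics. The plain-Python sketch below illustrates only that idea and is not Flink code. It assumes each message carries a transaction ID, an account key, and a balance, and it models a compacted topic as a dict that keeps only the latest value per key. `Txn`, `compact_features`, and the feature names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    txn_id: int     # producer-maintained ID; may arrive out of order
    account: str    # key for the (hypothetical) compacted feature topic
    balance: float


def compact_features(arrivals):
    """Sort messages back into producer order, then aggregate per account.

    The returned dict stands in for a compacted topic: each new value for a
    key replaces the previous one, so only the latest feature per key survives.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    compacted = {}
    for txn in sorted(arrivals, key=lambda t: t.txn_id):
        totals[txn.account] += txn.balance
        counts[txn.account] += 1
        compacted[txn.account] = {
            "avg_balance": totals[txn.account] / counts[txn.account],
            "last_balance": txn.balance,  # correct only because we re-sorted
        }
    return compacted


arrivals = [Txn(2, "acct-1", 300.0), Txn(1, "acct-1", 100.0), Txn(3, "acct-2", 50.0)]
print(compact_features(arrivals))  # acct-1: avg 200.0, last 300.0
```

Without the sort, `last_balance` for `acct-1` would be taken from transaction 1 simply because it arrived last, which is the kind of error the reordering step guards against.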
  • 10. RT Decisioning Platform - Kafka Messaging • Independent and interdependent decisioning patterns: Kafka decouples models and rules [Diagram: messages with producer-defined IDs and payloads flow from a source topic to an independent model consumer and a rules consumer, which publish payload + model score and payload + rules score to downstream topics; a dependent model consumer reads the rules-scored topic and emits payload + rules score + model score. Downstream topics support dependent scoring.]
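The independent and dependent scoring patterns on slide 10 can be sketched with lists standing in for topics. This is an illustration under stated assumptions, not the platform's code: the consumer functions, the `amount` field, and the score thresholds are all hypothetical placeholders. What it does show is the topology: two consumers read the source topic independently, while the dependent model reads the rules consumer's downstream topic instead of the source.

```python
def model_consumer(source_topic):
    """Independent model: scores each source message on its own."""
    return [{**msg, "model_score": 0.8 if msg["amount"] > 1000 else 0.1}
            for msg in source_topic]


def rules_consumer(source_topic):
    """Independent rules engine: flags each source message on its own."""
    return [{**msg, "rules_score": 1 if msg["amount"] > 1000 else 0}
            for msg in source_topic]


def dependent_model_consumer(rules_topic):
    """Dependent model: reads the rules consumer's downstream topic, so it
    sees payload + rules score and appends its own model score."""
    return [{**msg, "model_score": 0.9 if msg["rules_score"] else 0.2}
            for msg in rules_topic]


# The producer-defined ID travels with every message, so any consumer's
# output can be correlated back to the original payload.
source = [{"id": 1, "amount": 50}, {"id": 2, "amount": 5000}]
model_topic = model_consumer(source)
rules_topic = rules_consumer(source)
dependent_topic = dependent_model_consumer(rules_topic)
print(dependent_topic)
```

Because each stage only reads a topic and writes a new one, a model or rule can be added, replaced, or scaled without changing the producers upstream of it.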
  • 11. Enterprise Compliance: Image Rehydration • Cloud VM Machine Images require periodic updates • RT Platform stack has 100+ distinct containers, so underlying image rehydration is best handled with an abstraction layer • Simple Blue-Green approaches can work for stateless components, BUT… • Network Storage and other Disk Volumes add complexity for stateful components such as Kafka Brokers • Kafka Clustering provides fault tolerance and failover during rehydration, but we needed a solution to manage Kafka logs mounted on Cloud Storage [Diagram: storage mount points broken during instance recreation]
  • 12. Kubernetes • Kubernetes (k8s) is OSS that manages container lifecycle, addressing, and networking, among other things • The scheduler “moves” both Pods and the associated storage volumes defined in StatefulSets between VM nodes in coordination, enabling clean rolling rehydration of Kafka Brokers • Services let all platform components reach Kafka Brokers and Kafka Connect by a logical service name • Software networking enables a single TLS solution between all components, common DNS, and integrated cloud load balancing • For external access to Kafka on the RT Platform, we recycle the external DNS mapping of IP to common name at a configurable interval (20 sec)
  • 13. Kafka Considerations – Cluster 1 • RT Platform hosts all containers on instance types with 150 GB RAM, 40 cores, and 10 Gigabit network performance, which is good for most stack components – Instance node affinity set so each node runs at most one Kafka broker and at most one ZK node – Shared ZooKeeper cluster with other RT Platform components – In AWS, st1 EBS volumes (throughput-optimized HDD) are a good fit for Kafka's sequential writes • Brokers increase demand on instance and platform shared resources – Platform ZooKeeper state – Instance OS open files – Instance RAM – Instance Network Access – Instance Storage IO
  • 14. Kafka Considerations – Cluster 1 • Kafka Brokers use RAM for both the Java heap and the OS page cache, and page-cache demand grows with the size of topics • A Replication Factor of 3 means three times the disk space consumed (one leader copy plus two follower copies) • Deeper Topics = More Disk Space and More Page Cache RAM
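A rough sizing sketch makes the replication cost on slide 14 concrete. It assumes uniform ingest, replicas spread evenly across brokers, and no compression or index overhead; the 50 GB/day ingest figure is hypothetical, while the 180-day retention and ten brokers echo slide 8. The function names are illustrative.

```python
def cluster_disk_gb(ingest_gb_per_day, retention_days, replication_factor):
    """Log storage across the whole cluster: every byte is stored once per replica."""
    return ingest_gb_per_day * retention_days * replication_factor


def per_broker_disk_gb(total_gb, brokers):
    """Assumes partition replicas are spread evenly across brokers."""
    return total_gb / brokers


# Hypothetical ingest of 50 GB/day, 180-day retention, replication factor 3.
total = cluster_disk_gb(50, 180, 3)
print(total)                          # 27000 (GB)
print(per_broker_disk_gb(total, 10))  # 2700.0 (GB per broker, ten brokers)
```

Tripling retention or replication triples the disk estimate linearly, and since hot topic data is also served from the page cache, deeper topics push RAM demand up as well.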
  • 15. Kafka Considerations – Cluster 1 [Charts: Kubernetes pod memory usage; EC2 node memory usage]
  • 16. Kafka Considerations – Cluster 1 • Larger instance/machine types (m4.10xlarge, n1-standard-32, n1-highmem-32): faster network speeds, 100+ GB of RAM, 30+ cores, but noisier neighbors competing for RAM and Network IO, and a larger “Blast Radius” [Diagram: large nodes each hosting a Kafka broker, a ZooKeeper node (Z), and many other containers (C), with TLS and IO shared on each node]
  • 17. Kafka Considerations – Cluster 2 • Smaller instance/machine types (m4.2xlarge, n1-highmem-4, standard-8), dedicated ZK, single broker node affinity, Kafka Connect and/or Schema Registry • Tradeoff: risk, predictability, and simplicity vs. faster networking and high-end CPU [Diagram: smaller nodes, each running one Kafka broker with ZooKeeper (Z), Kafka Connect (KC), and/or Schema Registry (SR), kept apart from the nodes running other containers (C)]
  • 18. Kafka Real-Time Upgrades • RT Platform supports multiple active tenants, so uniform downtime during version upgrades is not usually an option. • Rolling upgrades potentially pose compatibility risks between Kafka versions.
  • 19. Kafka Real-Time Upgrades: Step 1, Green Cluster provisioned and Topic Offsets captured [Diagram: the producer keeps writing through Kafka1Svc to the existing cluster while the current offset of each topic is captured]
  • 20. Kafka Real-Time Upgrades: Step 2, Tooling Backfills new Topics • Depending on the desired window size, tooling may be used to backfill data into topics on the new cluster, respecting timestamps for a consistent retention policy • Possible candidate process: mirroring [Diagram: backfill tooling, possibly mirroring, copies existing messages to the new cluster while the producer continues writing through Kafka1Svc]
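The backfill step on slide 20 hinges on keeping original timestamps. In Kafka a producer can set a record's timestamp explicitly, and with the default `CreateTime` timestamp type, time-based retention follows record timestamps, so replaying records with their original timestamps keeps expiry on the new cluster in line with the old one. The sketch below covers only the selection logic, in plain Python with no Kafka client; `Record`, `backfill`, and the sample data are hypothetical.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Record:
    offset: int        # offset on the old cluster; not reusable on the new one
    timestamp_ms: int  # original record timestamp
    value: str


def backfill(old_records, now_ms, retention_ms):
    """Return (timestamp_ms, value) pairs to produce to the new cluster.

    Records are replayed in their original order with their original
    timestamps, and anything already older than the retention window is
    skipped. The new cluster assigns fresh offsets, so old offsets are dropped.
    """
    cutoff = now_ms - retention_ms
    return [
        (r.timestamp_ms, r.value)
        for r in sorted(old_records, key=lambda r: r.offset)
        if r.timestamp_ms >= cutoff
    ]


old = [Record(0, 1 * DAY_MS, "a"), Record(1, 5 * DAY_MS, "b"), Record(2, 9 * DAY_MS, "c")]
print(backfill(old, now_ms=10 * DAY_MS, retention_ms=7 * DAY_MS))
```

Dropping old offsets is deliberate: consumers that stored offsets against the original cluster cannot reuse them, which is why the captured offsets from step 1 matter.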
  • 21. Kafka Real-Time Upgrades: Step 3, Producer flows set to load the second Kafka cluster as required • Producers reference the newly upgraded Kafka cluster by its new k8s service name and move to the new cluster independently [Diagram: the producer switches from Kafka1Svc to Kafka2Svc; recent messages appear on both clusters]
  • 22. Kafka Real-Time Upgrades: Consequences • Overlap between steps 2 and 3 is likely to create duplicates (better than gaps) • If downstream state depends on the original cluster, or original offsets are not preserved, all messages in the window may need to be replayed to recover [Diagram: the same message IDs appear on both clusters during the overlap]
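Since the overlap on slide 22 yields duplicates rather than gaps, consumers need to be idempotent. One common approach, sketched here as an assumption rather than the team's stated design, is to drop any message whose producer-defined ID has already been processed. `dedupe` is a hypothetical helper, and the in-memory `seen` set stands in for what would need to be a durable, bounded store.

```python
def dedupe(messages, seen=None):
    """Yield each producer-defined ID once, dropping repeats.

    `seen` must be durable (and bounded, e.g. by the retention window) in a
    real consumer; an in-memory set is used here only for illustration.
    """
    seen = set() if seen is None else seen
    for msg_id, payload in messages:
        if msg_id not in seen:
            seen.add(msg_id)
            yield msg_id, payload


# Tail of the old cluster overlaps with the head of the new cluster (ID 14).
old_cluster = [(13, "a"), (14, "b")]
new_cluster = [(14, "b"), (15, "c")]
print(list(dedupe(old_cluster + new_cluster)))  # [(13, 'a'), (14, 'b'), (15, 'c')]
```

Sharing one `seen` store across both clusters lets a consumer drain the old cluster and then switch to the new one without reprocessing the overlap.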
  • 23. Kafka Across Regions • Regional Clusters • Why Do This? – Partitioned Strategy • Active-Active • Latency or Partition Routed, Increased Performance and Efficiency – Disaster Recovery • Active-Passive, Active-Active • Redundantly Constructed and Routed, Increased Reliability • Issues – Syncing Data – Latency • Inefficient Operation Across Great Distance • Kafka Cluster Replication not recommended
  • 24. Kafka Across Regions – Data Syncing Options • Duplicate Common Upstream Sources • Producer-Driven Replication • Mirroring • Mirroring + Consolidation
  • 25. Kafka Across Regions – Data Syncing Options Common Upstream • Local Producers use Common Source • 2 Topics Represent 1 Logical Topic • Pros • Fewest Number of Topics • Consumer behavior minimally impacted • Cons • Each Local Producer needs to know about Each Regional Deployment
  • 26. Kafka Across Regions – Data Syncing Options: Common Upstream [Diagram: in Region A and Region B, a local producer pulls from the common upstream via ETL and writes to a local topic read by local consumers; 2 topics represent 1 logical set of messages]
  • 27. Kafka Across Regions – Data Syncing Options: Producer-Driven Replication • Producers maintain Topic consistency across multiple regions • 2 Topics (one per regional cluster) Represent 1 Logical Topic • Pros • Fewest Number of Topics • Consumer behavior minimally impacted • Cons • Each Producer needs to know about Each Regional Deployment • Failure strategy, Reliability Tracking, SLA, etc. must be Implemented by each Producer, likely using shadow topics
  • 28. Kafka Across Regions – Data Syncing Options: Producer-Driven Replication [Diagram: producers in Region A and Region B write A-routed and B-routed data into Topic AB and Topic BA, each region tracking delivery with a shadow topic; local consumers read the local topic; 2 topics represent 1 logical set of messages]
  • 29. Kafka Across Regions – Data Syncing Options: Mirroring • Tooling Automatically Replicates Topics • Confluent Replicator (Licensed) • Mirror Maker, uReplicator (OSS) • 4 Topics Represent 1 Logical Topic • Pros • Producer behavior minimally impacted • Cons • Each Consumer needs to know about Each Replicated Topic • Complexity: More Topics
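  • 30. Kafka Across Regions – Data Syncing Options: Mirroring [Diagram: producers write to Topic A in Region A and Topic B in Region B; a mirror copies them to Topic A′ and Topic B′ in the opposite regions; consumers in each region read both the local and the mirrored topic; 4 topics represent 1 logical set of messages]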
  • 31. Kafka Across Regions – Data Syncing Options: Mirroring + Consolidation • Tooling Automatically Replicates Topics • Additional Tooling merges Topics for Consumers • ETL Tooling, NiFi, etc. • Kafka Connect • 6 Topics Represent 1 Logical Topic • Pros • Producer behavior minimally impacted • Consumer behavior minimally impacted • Cons • Custom tooling must implement failure strategy, reliability tracking, etc. • Complexity: Lots More Topics, flow logic, and associated resource consumption
  • 32. Kafka Across Regions – Data Syncing Options: Mirroring + Consolidation [Diagram: as with mirroring, Topic A and Topic B are mirrored to Topic A′ and Topic B′; ETL in each region merges the local and mirrored topics into Topic AB and Topic BA, which consumers read; 6 topics represent 1 logical set of messages]
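The consolidation step on slide 32 merges a region's local topic with the mirror of the other region's topic so that consumers read a single stream. The sketch below shows one possible merge rule, a timestamp-ordered merge with deduplication by record ID. It is not how NiFi or Kafka Connect implement consolidation; the tuple layout and `consolidate` are hypothetical, and it assumes each input is already sorted by timestamp.

```python
import heapq


def consolidate(local_topic, mirrored_topic):
    """Merge a region's local topic with the mirror of the other region's topic.

    Records are (timestamp_ms, record_id, value) tuples, and each input is
    assumed to be sorted by timestamp already. Records that reached both
    regions (same record_id) are emitted only once.
    """
    seen = set()
    merged = []
    for ts, record_id, value in heapq.merge(local_topic, mirrored_topic):
        if record_id in seen:
            continue
        seen.add(record_id)
        merged.append((ts, record_id, value))
    return merged


topic_a = [(1, "a1", "x"), (4, "a2", "y")]         # Region A's local topic
topic_b_mirror = [(2, "b1", "z"), (3, "b2", "w")]  # Topic B' mirrored into Region A
topic_ab = consolidate(topic_a, topic_b_mirror)
print([record_id for _, record_id, _ in topic_ab])  # ['a1', 'b1', 'b2', 'a2']
```

This is the custom logic the slide's cons refer to: the merge rule, deduplication state, and failure handling all live in tooling the team has to own.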
  • 33. Kafka Across Regions – Data Syncing Options: So What Do We Use? • Multiple Tenant Use Cases and Risk Tolerances • Combination of Solutions – Common Upstream – Confluent Replication
  • 34. Kafka – Moving Forward • Exactly Once Semantics/Transactionality • Hyper Partitioning • Alternate Backends to Support Indefinite Retention (S3, etc.)
  • 35. Kafka for Real Time Bank Decisions: Handling Private Information; Real-Time Request and Response
  • 36. Handling PII (not) on Kafka Goal: Remove the possibility of exposing PII
  • 37. Encrypted Volume: Simple & Effective [Diagram: tokenization replaces Library Card# 8675309 with Library Card# TOK:113581321 before the producer writes to a Kafka topic; topic persistence uses storage encryption; consumers read only the tokenized value]
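Slide 37 shows a card number being swapped for a token before it ever reaches Kafka. The slides do not say which tokenization scheme is used, so the following is a minimal sketch under the assumption of a vault that issues random tokens and keeps the mapping. `TokenVault` is hypothetical, and a real vault would be a durable, access-controlled service rather than two dicts.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault.

    Only the token, never the raw value, is meant to be produced to Kafka;
    mapping a token back requires access to the vault.
    """

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = f"TOK:{secrets.randbelow(10**9)}"
        while token in self._value_by_token:  # retry on the rare collision
            token = f"TOK:{secrets.randbelow(10**9)}"
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        return self._value_by_token[token]


vault = TokenVault()
message = {"library_card": vault.tokenize("8675309")}  # what the producer sends
print(message)
```

Random tokens carry no information about the original value, so a leaked topic or disk exposes nothing without the vault; the cost is an extra lookup service on the request path.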
  • 38. Encrypted Volume: Following the Path of Least Resistance Good • Highly durable across Kafka restarts • Simple disaster recovery planning • Follows recommended Kafka configuration practices Not So Good • Information privacy regulations require extra levels of protection • Durability is based on additional storage volumes being managed with the Kafka service
  • 39. Volatile Storage: Performance & Privacy [Diagram: tokenization replaces Library Card# 8675309 with Library Card# TOK:113581321 before the producer writes to Kafka; topic persistence is on tmpfs storage, with initial state copied from storage on startup; consumers read only the tokenized value]
  • 40. Volatile Storage: Strange Trade-offs Improvements • Noticeably better performance • Data is always “in flight”, so extra encryption shouldn’t be needed • Effectively stateless images Complications • Needs scripting to bootstrap • Topic contents are cleared on host reboot • Zookeeper won’t be able to manage offsets between reboots
  • 41. Volatile Storage: Why We Aren’t Using It • We need long-term storage of data and RAM is already a precious resource. • Our recovery strategy is built on Kafka as our state storage mechanism. Losing that state complicates recovery efforts. • Host disk caching gives us most of the benefit of volatile storage.
  • 42. Request-Response Pattern /rəˈkwest rəˈspans ˈpadərn/ noun 1. A pattern of interaction with a remote service where the local task submits a request for remote work and expects a response before continuing work. 2. A specialized use of Kafka using dedicated topic pairs to communicate with a shared service
  • 43. Request Response Basics 1. Initialize Consumer 2. Initialize Producer 3. Prepare Data 4. Assign a unique ID 5. Put request on request topic (the service does its work and builds a response carrying the Request ID) 6. Read Response topic until Request ID is seen [Diagram: the application's producer puts request ID:14159-26535 on the request topic; its consumer reads past other responses (ID:14159-26532, ID:14159-26531, ID:14159-26533) on the response topic until ID:14159-26535 appears]
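The six steps on slide 43 can be traced with an in-memory simulation. It is a sketch, not Kafka client code: `Topic` is a hypothetical single-partition stand-in, and `UppercaseService` is a placeholder for the shared service. The ordering is the point: the consumer's position is fixed before the request is sent, so even an instant response cannot be missed.

```python
import uuid


class Topic:
    """Minimal in-memory stand-in for a single-partition Kafka topic."""

    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(record)

    def end_offset(self):
        return len(self.log)

    def read_from(self, offset):
        return self.log[offset:]


class UppercaseService:
    """Hypothetical shared service: consumes requests, publishes responses
    that carry the request ID."""

    def __init__(self, requests, responses):
        self.requests, self.responses, self.offset = requests, responses, 0

    def poll(self):
        for request in self.requests.read_from(self.offset):
            self.responses.append({"id": request["id"], "result": request["data"].upper()})
            self.offset += 1


def request_response(data, requests, responses, service):
    # 1. Initialize the consumer first by noting the response topic's end
    #    offset, so a fast response cannot slip past unread.
    position = responses.end_offset()
    # 2-4. Initialize the producer, prepare the data, and assign a unique ID.
    request_id = str(uuid.uuid4())
    # 5. Put the request on the request topic.
    requests.append({"id": request_id, "data": data})
    service.poll()  # in production the service runs independently
    # 6. Read the response topic until the request ID is seen.
    for response in responses.read_from(position):
        if response["id"] == request_id:
            return response["result"]
    raise TimeoutError(f"no response for request {request_id}")


requests, responses = Topic(), Topic()
service = UppercaseService(requests, responses)
print(request_response("hello", requests, responses, service))  # HELLO
```

In a real deployment, step 6 would poll with a deadline, and the `TimeoutError` branch is where the missing responses on the following slides show up.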
  • 45. How Request-Response Actually Works [Diagram: the application's request is one of many messages on the shared request topic, and its response is one of many on the response topic]
  • 46. When Failures Occur [Diagram: some requests on the request topic never produce a response, leaving missing responses on the response topic]
  • 47. The Slow Failure Problem Failures
  • 48. The Request-Response Pattern This is actually the “Background Job” pattern: 1. Submit Job 2. Get assigned a Job ID 3. Poll the service until the Job ID is marked as complete 4. Retrieve the results of the job
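The four steps of the “Background Job” pattern on slide 48 map onto a short polling loop. In this sketch, `JobService` is a hypothetical in-memory service whose jobs complete after a fixed number of status checks, and reversing a string stands in for real work. The explicit poll limit turns a silent missing result into a reported timeout.

```python
import itertools
import time


class JobService:
    """Hypothetical job service: a job finishes after a fixed number of status checks."""

    def __init__(self, checks_until_done=2):
        self._ids = itertools.count(1)
        self._jobs = {}
        self._checks_until_done = checks_until_done

    def submit(self, data):
        job_id = next(self._ids)
        self._jobs[job_id] = {"data": data, "checks": 0, "result": None}
        return job_id

    def is_complete(self, job_id):
        job = self._jobs[job_id]
        job["checks"] += 1
        if job["checks"] >= self._checks_until_done:
            job["result"] = job["data"][::-1]  # stand-in for the real work
        return job["result"] is not None

    def result(self, job_id):
        return self._jobs[job_id]["result"]


def run_job(service, data, max_polls=10, interval_s=0.0):
    job_id = service.submit(data)           # 1-2. submit and get a job ID
    for _ in range(max_polls):              # 3. poll until marked complete
        if service.is_complete(job_id):
            return service.result(job_id)   # 4. retrieve the results
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")


print(run_job(JobService(), "abc"))  # cba
```

Seen this way, reading a response topic until an ID appears is just polling with an unbounded, implicit deadline, which is why slow failures are hard to detect.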
  • 49. Request Response: Serverless Considerations • Try to reuse Producers and Consumers • Explicitly assign Consumer partitions • Attempt to read from the Consumer before submitting to the Producer • Remember to commit offsets before sending responses
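  • 50. Slightly Better: The Real-Time Tap Pattern [Diagram: the application sends a request over REST, gRPC, etc. to a real-time service; a processing service consumes the input topic and maintains precomputed values; the real-time service reads the request, processes it against the precomputed values, and sends the response back to the application]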
  • 51. Real-Time Tap Pattern • Real-time request is handled by a session-based protocol • Resilient data processing is handled by Kafka • Failures are reported when they happen via real-time protocol • Kafka interactions can be optimized by the handler service, rather than relying on clients
  • 53. Thank you for joining us!