Confluent and Couchbase – Event Streaming Platform + NoSQL combined. This slide deck introduces Apache Kafka as an event streaming platform and shows how to leverage Kafka Connect to integrate it with Couchbase.
Sample Best Fit Use Cases
Services requiring a low-latency, highly available, and scalable data ingestion or presentation tier with onward transport of data.
Serving data with high availability and deterministic low latency to a very large number of readers (up to millions).
Services that wish to transform streaming data and quickly store intermediate state for further processing.
Services storing or processing a high cardinality of entities, or with rapid schema evolution.
Services with operational data storage requirements of up to tens of terabytes.
Examples of typical applications requiring these functionalities:
Recommendation engines, predictive analytics engines, fraud detection frameworks, risk analytics engines, trader toolkits, real-time trade blotters.
Apache Kafka and Couchbase => Event Streaming Platform + NoSQL
Introduction to Apache Kafka as an Event-Driven Open Source Streaming Platform
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
… and its integration with Couchbase
Vision of an event streaming enterprise
(Diagram: a central Streaming Platform connecting Search, Sensors / IoT, RDBMS, Monitoring, NoSQL, Real-time Analytics, Data Warehouse, Apps, Microservices, and Big Data.)
Business Digitalization Trends Are Driving the Need to Process Events at a Whole New Scale, Speed, and Efficiency
The World Has Changed: Mobile, Cloud, Microservices, Internet of Things, Machine Learning
Before: many ad hoc pipelines
(Diagram: many separate point-to-point pipelines, built from their own databases, storage, and interfaces, linking Search, Security, Fraud Detection Application, User Tracking, Operational Logs, Operational Metrics, Big Data, App Data, Data Warehouse, Mainframes, NoSQL, Relational DB, and Monitoring App.)
After: streaming platform with Kafka
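(Diagram: the same systems, including Search, Security, Fraud Detection Application, User Tracking, Operational Logs, Operational Metrics, Mainframes, Relational DB, Big Data, App Data, Data Warehouse, Monitoring App, and NoSQL, now all connected through one central Streaming Platform.)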
A Streaming Platform Is the Underpinning of an Event-Driven Architecture
• Ubiquitous connectivity: globally scalable platform for all event producers and consumers
• Immediate data access: data accessible to all consumers in real time
• Single system of record: persistent storage to enable reprocessing of past events
• Continuous queries: stream processing capabilities for in-line data transformation
(Diagram: streams of real-time events flow from producers to consumers through stream processing apps. Labels: Microservices, DBs, SaaS apps, Mobile; Database change, Microservices events, SaaS data; Customer 360, Real-time fraud detection, Data warehouse, Customer experiences.)
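A streaming platform can act as a single system of record because events are appended to a durable log and retained, so every consumer reads at its own pace and can go back to reprocess history. The following is a minimal in-memory sketch of that idea, assuming a single partition and folding the offset commit into the read for brevity (Kafka consumers commit offsets as a separate step). `ToyEventLog` and its methods are hypothetical and are not the Kafka client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of ONE partition of an event log (not Kafka's API).
// Records are only ever appended; each consumer group tracks its own offset.
class ToyEventLog {
    private final List<String> records = new ArrayList<>();
    private final Map<String, Integer> committedOffsets = new HashMap<>();

    // Producer side: append an event and return the offset it was stored at.
    int append(String event) {
        records.add(event);
        return records.size() - 1;
    }

    // Consumer side: read up to maxRecords events from the group's current offset,
    // then advance that offset (simplification: read and commit in one step).
    List<String> poll(String groupId, int maxRecords) {
        int from = committedOffsets.getOrDefault(groupId, 0);
        int to = Math.min(records.size(), from + maxRecords);
        List<String> batch = new ArrayList<>(records.subList(from, to));
        committedOffsets.put(groupId, to);
        return batch;
    }

    // Reprocessing: because events are retained, a group can rewind and replay.
    void seek(String groupId, int offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        committedOffsets.put(groupId, offset);
    }
}
```

A fraud-detection service and a data-warehouse loader would each poll with their own group id without affecting each other; `seek(groupId, 0)` replays the full history for that one group only.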
Apache Kafka: The De Facto Standard for Real-Time Event Streaming
● Global-scale
● Real-time
● Persistent Storage
● Stream Processing
(Diagram: Apache Kafka connecting Edge, Cloud, Data Lake, Databases, Datacenter, IoT, SaaS Apps, Mobile, Microservices, and Machine Learning.)
Apache Kafka at Scale at Tech Giants
• > 4.5 trillion messages / day
• > 6 petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
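Reading the two daily figures as lower-bound averages, and assuming a day of 86,400 seconds and decimal units (1 PB = 10^15 bytes), they correspond to at least the following sustained rates; actual peaks are higher than these averages:

```latex
\frac{4.5 \times 10^{12}\ \text{messages}}{86\,400\ \text{s}} \approx 5.2 \times 10^{7}\ \text{messages/s}
\qquad
\frac{6 \times 10^{15}\ \text{bytes}}{86\,400\ \text{s}} \approx 6.9 \times 10^{10}\ \text{bytes/s} \approx 69\ \text{GB/s}
```

The slide does not say whether both figures come from the same deployment, so dividing one by the other to get an average message size would not be justified.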
Confluent’s Business Value per Use Case
Business Value: Increase Revenue (make money) $↑ | Decrease Costs (save money) $↓ | Mitigate Risk (protect money) $↔
Key Drivers and Strategic Objectives (sample):
• Improve Customer Experience (CX)
• Core Business Platform
• Increase Operational Efficiency
• Migrate to Cloud
• Regulatory
• Digital Transformation
• Real-time app updates
• Customer 360
• Website / Core Operations (Central Nervous System)
• Fraud Detection
• Online Fraud Detection
• Online Security (syslog, log aggregation, Splunk replacement)
• IoT sensor ingestion
• Digital replatforming / Mainframe Offload
• Faster transactional processing / analysis, incl. Machine Learning / AI
• Microservices Architecture
• Middleware replacement
Example Use Cases:
• Connected Car: Navigation & improved in-car experience: Audi
• Simplifying Omni-channel Retail at Scale: Target
• Mainframe Offload: RBC
• Application Modernization: Multiple Examples
• The [Silicon Valley] Digital Natives: LinkedIn, Netflix, Uber, Yelp...
• Predictive Maintenance: Audi
• Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix
• Real Time Streaming Platform for Communications and Beyond: Capital One
• Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle
• Detect Fraud & Prevent Fraud in Real Time: PayPal
• Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple
Kafka Connect is an integration framework on top of Kafka’s core
Kafka’s Streams API: Build real-time applications for your core business
• To build real-time applications for your core business
• Easiest way to process data in Apache Kafka
• Apps are standard Java applications that run on client machines
• Powerful yet easy-to-use library, part of Apache Kafka
• https://github.com/apache/kafka/tree/trunk/streams
(Diagram: Your App, containing the Streams API, connected to the Kafka Cluster.)
Example: Complete App, Ready for Production at Large Scale
(Code listing: a Word Count app in three annotated parts: app configuration, definition of the processing (here: WordCount), and start of processing.)
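In the Kafka Streams DSL, a WordCount topology is commonly expressed with `StreamsBuilder` as `flatMapValues` (split each line into words), `groupBy` (re-key by word), and `count()` (a `KTable` of running counts), and is started with `KafkaStreams.start()`. The sketch below is not that code: it is a dependency-free Java simulation of the same processing logic, under the assumption that splitting on non-word characters is an acceptable tokenizer, showing how each incoming line produces updated counts, much like a `KTable` changelog. `WordCountSimulation` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, dependency-free simulation of the WordCount logic (not the Kafka
// Streams API). The map plays the role of the state store behind a KTable.
class WordCountSimulation {
    private final Map<String, Long> counts = new HashMap<>();

    // Process one input record (a line of text) and return the updates it produces,
    // one "word=count" entry per word occurrence, in order, like a changelog.
    List<String> process(String line) {
        List<String> updates = new ArrayList<>();
        // Equivalent of flatMapValues: lowercase the line and split it into words.
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            // Equivalent of groupBy + count: bump the running count for this word.
            long updated = counts.merge(word, 1L, Long::sum);
            updates.add(word + "=" + updated);
        }
        return updates;
    }

    long countOf(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

Feeding "Hello Kafka, hello Couchbase" yields the updates `hello=1`, `kafka=1`, `hello=2`, `couchbase=1`. In a real Kafka Streams app, record caching can coalesce successive updates to the same key before they are forwarded, so the output topic need not contain every intermediate count.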
Confluent Delivers a Mission-Critical Event Streaming Platform
• Apache Kafka®: Core | Connect API | Streams API
• Development & Connectivity: Clients | Connectors | REST Proxy | KSQL
• Data Compatibility: Schema Registry
• Management & Monitoring: Control Center | Security
• Enterprise Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Kubernetes Operator
• Event sources: Database Changes, Log Events, IoT Data, Web Events, other events
• Data Integration: Hadoop, Database, Data Warehouse, CRM, other
• Real-Time Applications: Transformations, Custom Apps, Analytics, Monitoring, other
• Deployment: Confluent Platform in the Datacenter or Public Cloud (customer self-managed), or Confluent Cloud (Confluent fully managed)
(The platform combines community features and commercial features.)
KSQL – A Streaming SQL Engine for Apache Kafka
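KSQL expresses continuous queries in SQL, for example counting events per key over fixed, non-overlapping time windows with a `WINDOW TUMBLING (SIZE ...)` clause. As an illustration of what such a windowed aggregation computes, and not of how KSQL executes it, the sketch below assigns each event to the window starting at `timestamp - (timestamp mod windowSize)` and counts per key and window. `TumblingWindowCounter` is hypothetical, and the one-minute window and card-id keys in the usage are arbitrary.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts events per (key, tumbling window), i.e. the result a
// windowed streaming SQL aggregation would produce. Not KSQL's implementation.
class TumblingWindowCounter {
    private final long windowSizeMs;
    // Composite key "key@windowStart"; TreeMap keeps entries ordered for readability.
    private final Map<String, Long> counts = new TreeMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        this.windowSizeMs = windowSizeMs;
    }

    // Each timestamp falls into exactly one window: [start, start + windowSizeMs).
    long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Update the count for the event's key in its window; returns the new count.
    long add(String key, long timestampMs) {
        String slot = key + "@" + windowStart(timestampMs);
        return counts.merge(slot, 1L, Long::sum);
    }

    long count(String key, long windowStartMs) {
        return counts.getOrDefault(key + "@" + windowStartMs, 0L);
    }
}
```

With a 60,000 ms window, events for `card-42` at 0 ms and 59,999 ms land in the window starting at 0, while an event at 60,000 ms starts a new window; a fraud rule could then flag any card whose per-window count exceeds a threshold.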
Confluent Control Center (C3)
Monitors all pipelines end-to-end
• Lost Messages?
• Duplicates?
• Latency Issues?
• What is the problem?
• Where is the problem?
• Etc.
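Control Center answers these questions from metrics it collects across producers and consumers; the sketch below does not reproduce its mechanism. It only illustrates how lost messages, duplicates, and latency can be detected on the consuming side, under the assumption (hypothetical, not part of Kafka or Control Center) that every message carries a per-producer sequence number starting at 0 and the time it was sent. `PipelineAudit` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical end-to-end check for ONE producer. Assumes each message carries a
// sequence number (0, 1, 2, ...) and its send timestamp in milliseconds.
class PipelineAudit {
    private final Set<Long> seen = new HashSet<>();
    private long duplicateCount = 0;
    private long maxSeq = -1;
    private long maxLatencyMs = 0;

    // Record a delivered message; returns true if it is a duplicate.
    boolean onReceive(long sequence, long sentAtMs, long receivedAtMs) {
        maxLatencyMs = Math.max(maxLatencyMs, receivedAtMs - sentAtMs);
        if (!seen.add(sequence)) {
            duplicateCount++;
            return true;
        }
        maxSeq = Math.max(maxSeq, sequence);
        return false;
    }

    // Sequence numbers below the highest one seen that have not arrived
    // (lost, or possibly still in flight).
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        for (long s = 0; s < maxSeq; s++) {
            if (!seen.contains(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }

    long duplicates() {
        return duplicateCount;
    }

    long maxLatencyMs() {
        return maxLatencyMs;
    }
}
```

A gap can also mean a message is merely late, so a practical check would only report missing sequence numbers after a grace period.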
Best-of-breed Platforms, Partners and Services for Multi-cloud Streams
• Private Cloud: Deploy on bare metal, VMs, containers, or Kubernetes in your datacenter with Confluent Platform and Confluent Operator.
• Public Cloud: Run self-managed in the public cloud, or adopt a fully managed service with Confluent Cloud.
• Hybrid Cloud: Build a persistent bridge between datacenter and cloud with Confluent Replicator.
(Diagram: the deployment options range from self-managed to fully managed.)
Kafka Connect Couchbase Connector
https://github.com/couchbase/kafka-connect-couchbase
https://www.confluent.io/connector/couchbase-db-connector/
Open source, developed by Couchbase, certified by Confluent
Kafka Connect Couchbase Connector
(Diagram: Couchbase cluster ↔ Kafka Connect (connectors to extract and load data) ↔ Kafka cluster)
• Stream, filter, and transform events to and from Couchbase with Source and Sink connectors.
• Fast, reliable, and fault-tolerant: Based on DCP (Couchbase replication protocol).
• Efficient: Only load new or modified documents.
• Real-time: Every mutation to Couchbase generates an event which is published to a Kafka topic.
• End-to-End monitoring: Integrated with Confluent Control Center:
• Kafka is the de facto standard for data movement
• Unified control, monitoring, and metrics
• “Config-only”
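Couchbase’s DCP stream delivers mutations per vBucket (Couchbase’s unit of data partitioning), and each mutation carries a sequence number that increases within its vBucket. A source connector that remembers the last sequence number it published for each vBucket can resume after a restart without re-sending unchanged documents, which is the idea behind “only load new or modified documents”. The sketch below models only that bookkeeping; `ResumableChangeFeed`, its nested `Mutation` class, and the in-memory offset map are hypothetical stand-ins for DCP and Kafka Connect’s offset storage, not the connector’s code, and it assumes sequence numbers start at 1 and arrive in order within each vBucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of resumable change streaming, loosely shaped like DCP:
// each mutation belongs to a vBucket and carries a per-vBucket sequence number.
// Not the Couchbase connector's implementation.
class ResumableChangeFeed {
    static final class Mutation {
        final int vbucket;
        final long seqno;
        final String docId;

        Mutation(int vbucket, long seqno, String docId) {
            this.vbucket = vbucket;
            this.seqno = seqno;
            this.docId = docId;
        }
    }

    // Last published seqno per vBucket (0 = nothing published yet).
    // Stands in for the offsets a source connector persists via Kafka Connect.
    private final Map<Integer, Long> committed = new HashMap<>();

    // Publish only mutations newer than their vBucket's committed offset, advancing
    // the offset as we go. Returns the ids of the documents published.
    List<String> publishNew(List<Mutation> history) {
        List<String> published = new ArrayList<>();
        for (Mutation m : history) {
            long last = committed.getOrDefault(m.vbucket, 0L);
            if (m.seqno > last) {
                published.add(m.docId);
                committed.put(m.vbucket, m.seqno);
            }
        }
        return published;
    }
}
```

Because offsets are tracked per vBucket, each vBucket resumes independently; a modified document shows up again only because its new mutation has a higher sequence number.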
Confluent and Couchbase - Synergies
• Distributed and fault tolerant
• Horizontally scalable
• Geographically replicated
• Low latency
• Open source
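Both systems scale horizontally by spreading data over partitions by key: Kafka assigns keyed records to topic partitions, and Couchbase assigns documents to vBuckets. Their exact hash functions differ (Kafka’s default partitioner hashes keys with murmur2; Couchbase uses a CRC32-based mapping), so the sketch below uses the JDK’s plain `CRC32` only to illustrate the principle: the same key always maps to the same partition, which keeps per-key ordering while spreading load. `KeyPartitioner` is a hypothetical helper and matches neither product’s algorithm exactly.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical, illustrative key-based partitioning. Not Kafka's murmur2-based
// default partitioner and not Couchbase's exact vBucket mapping.
final class KeyPartitioner {
    private KeyPartitioner() {}

    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() returns an unsigned 32-bit checksum in a long, so this is non-negative.
        return (int) (crc.getValue() % numPartitions);
    }
}
```

Changing the number of partitions changes the mapping for most keys, which is one reason Kafka topics are usually created with enough partitions up front.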
Event Streaming with Apache Kafka and Couchbase
(Diagram: Kafka, Kafka Connect, KSQL, Kafka Streams, and Couchbase at the center of an event streaming architecture connecting Splunk, Security, Fraud Detection Application, User Tracking, Operational Logs, Operational Metrics, Mainframes, Oracle DB, Hadoop, Business App, Monitoring App, and AWS Redshift.)
Confluent’s Streaming Maturity Model: Where Are You?
(Chart: value rising with maturity (investment & time) across five stages; further labels: Pub + Sub, Store, Process; Projects, Platform.)
1. Developer Interest: Pre-Streaming
2. Enterprise Streaming Pilot / Early Production
3. SLA Ready, Integrated Streaming
4. Global Streaming
5. Central Nervous System
This is just the beginning of a new era… Confluent’s Vision:
Global
• Automated disaster recovery
• Global applications with geo-awareness
Infinite
• Efficient and infinite data with tiered storage
• Unlimited horizontal scalability for single clusters
Elastic
• Easy container-based orchestration and management
• Faster elastic scaling when adding brokers and partitions
• Cloud-native Apache Kafka for on-premises, hybrid, and multi-cloud