Presenter: Guru Sattanathan, Systems Engineer, Confluent
Event-driven architectures have been around for many years, much like Apache Kafka®, which was first open sourced in 2011. The reality is that the true potential of Kafka is only being realised now. Kafka is becoming the central nervous system of many of today’s enterprises. It is bringing a profound paradigm shift to the way we think about enterprise IT. What has changed in Kafka to enable this paradigm shift? Is it more than just a message broker, and how are enterprises using it today? This session will explore these key questions.
Sydney: https://content.deloitte.com.au/20200221-tel-event-tech-community-syd-registration
Melbourne: https://content.deloitte.com.au/20200221-tel-event-tech-community-mel-registration
9. TWO WORLDS OF ENTERPRISE IT
THINGS THAT ARE OPERATIONALLY CRITICAL
THINGS THAT ARE NOT
10. TWO WORLDS OF ENTERPRISE IT
THINGS THAT HELP RUN THE BUSINESS
THINGS THAT HELP MAKE MORE $$$
11. TWO WORLDS OF ENTERPRISE IT
World of Enterprise Apps
World of Data & Analytics
12. TWO WORLDS OF ENTERPRISE IT
Microservices, Enterprise Integration, Databases, Kubernetes, CI/CD, APIs, Websites, etc.
Data warehouse, Analytics, Machine Learning, Neural Nets, BI Dashboards, SQL Nerds, Hadoop, Spark, etc.
13. TWO WORLDS OF ENTERPRISE IT
Heavy focus on Uptime
Heavy focus on Data computation
14. TWO WORLDS OF ENTERPRISE IT
Data Acquisition Data Processing
Data storage wall
15. TWO WORLDS OF ENTERPRISE IT
You were able to process data only after storing it in a database or a data lake
OR
You had to wait for data to accumulate before you could start processing it
Data storage wall
30. A Streaming Platform is the Underpinning of an Event-driven Architecture
● Ubiquitous connectivity: Globally scalable platform for all event producers and consumers
● Immediate data access: Data accessible to all consumers in real time
● Single system of record: Persistent storage to enable reprocessing of past events
● Continuous queries: Stream processing capabilities for in-line data transformation
Producers: Microservices, DBs, SaaS apps, Mobile
Consumers: Customer 360, Real-time fraud detection, Data warehouse
Streams of real-time events: Database change, Microservices events, SaaS data, Customer experiences
Stream processing apps
Apps from both the worlds
31. Events as a backbone
[Diagram: apps and {faas} functions in Payments, Department 2, Department 3 and Department 4, connected by events as a backbone]
32. Request/Response: 1 input, 1 output. Low latency, poor throughput.
Batch: All inputs, all outputs. Poor latency, high throughput.
Stream Processing: Some inputs, some outputs. Tunable latency & throughput.
34. The log is a simple idea
Messages are added at the end of the log
Old -> New
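The log idea on this slide can be sketched in a few lines of Java. This is a minimal in-memory illustration, not Kafka's implementation; the class name SimpleLog and its methods are hypothetical. It shows the two properties the slide relies on: messages are only ever appended at the end, and each reader keeps its own offset, so reading never removes anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory log for illustration only; not how Kafka stores data.
class SimpleLog {
    private final List<String> messages = new ArrayList<>();

    // Append a message at the end of the log and return the offset it was given.
    public long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Read up to maxMessages starting at fromOffset, oldest first.
    public List<String> read(long fromOffset, int maxMessages) {
        int start = (int) Math.min(Math.max(fromOffset, 0), messages.size());
        int end = (int) Math.min((long) start + maxMessages, messages.size());
        return new ArrayList<>(messages.subList(start, end));
    }

    // Offset that the next appended message will receive.
    public long endOffset() {
        return messages.size();
    }

    public static void main(String[] args) {
        SimpleLog log = new SimpleLog();
        log.append("order-created");
        log.append("order-paid");
        log.append("order-shipped");
        // Two readers at different positions; neither affects the other.
        System.out.println(log.read(0, 10));
        System.out.println(log.read(2, 10));
    }
}
```

Because a reader is just an offset into the log, replaying past events amounts to reading again from an earlier offset.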
35. Shard data to get scalability
Producer (1), Producer (2), Producer (3)
Messages are sent to different partitions
Cluster of machines
Partitions live on different machines
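One common way to decide which partition a message goes to is to hash its key. The sketch below assumes that approach; the class KeyPartitioner is hypothetical and uses a plain Java array hash as a stand-in (Kafka's own default partitioner uses a murmur2 hash of the key bytes, so real partition numbers will differ). The point it illustrates is that the same key always lands on the same partition, while different keys spread across the cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical key-hash partitioner; a stand-in for illustration, not Kafka's algorithm.
class KeyPartitioner {
    // Map a key to a partition index in the range [0, numPartitions).
    public static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        for (String key : new String[] {"customer-1", "customer-2", "customer-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Keeping a key on one partition preserves the order of that key's messages, which is why the choice of key matters when sharding.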
38. Stream processing approach comparison
Kafka producer/consumer:
// consumer, stateStore, RetryUtils and MAX_RETRIES are assumed to be defined elsewhere
ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    // Sum the values seen for each key in this batch
    counts.merge(record.key(), record.value(), Integer::sum);
}
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            // Read-modify-write against the external state store, retried on failure
            int stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}
Kafka Streams:
builder
    .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy((key, value) -> value)
    .count()
    .toStream()
    .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
KSQL:
SELECT x, count(*) FROM stream GROUP BY x
EMIT CHANGES;
39. KSQL Example Use cases
● Data exploration
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
40. Example: Retail
Stream of shipments that arrive
Stream of purchases from online and physical stores
KSQL joins the two streams in real-time
Inventory on hand
41. Example: CDC from DB via Kafka to Elastic
Customers
Kafka Connect streams data in
KSQL processes table changes in real-time
Kafka Connect streams data out
42. Example: IoT, Automotive, Connected Cars
Cars send telemetry data via Kafka API
Customers
Kafka Connect streams data in
KSQL joins the two streams in real-time
Kafka Streams application to notify customers
43. KSQL for Real-Time Monitoring
CREATE STREAM syslog_invalid_users AS
  SELECT host, message
  FROM syslog
  WHERE message LIKE '%Invalid user%';
● Log data monitoring
● Tracking and alerting
● Syslog data
● Sensor / IoT data
● Application metrics
http://cnfl.io/syslogs-filtering
http://cnfl.io/syslog-alerting
44. KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
  SELECT card_number, COUNT(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING COUNT(*) > 3;
● Identify patterns or anomalies in real-time data, surfaced in milliseconds
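To make the windowing in the query above concrete, here is a rough Java sketch of what a 5-second tumbling-window count per card amounts to. It is an illustration under simplifying assumptions (everything in memory, no late or out-of-order handling, windows never expire), not how KSQL executes the query; the class TumblingWindowCounter and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory tumbling-window counter; illustrative only.
class TumblingWindowCounter {
    private final long windowSizeMs;
    // Key is "cardNumber@windowStart"; value is the number of attempts in that window.
    private final Map<String, Integer> counts = new HashMap<>();

    public TumblingWindowCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Tumbling windows do not overlap: each timestamp belongs to exactly one
    // window, whose start is the timestamp rounded down to a multiple of the size.
    public long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Record one authorization attempt and return the updated count for its window.
    public int record(String cardNumber, long timestampMs) {
        return counts.merge(cardNumber + "@" + windowStart(timestampMs), 1, Integer::sum);
    }

    // Mirrors HAVING COUNT(*) > 3 for the window containing timestampMs.
    public boolean isPossibleFraud(String cardNumber, long timestampMs) {
        return counts.getOrDefault(cardNumber + "@" + windowStart(timestampMs), 0) > 3;
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(5_000);
        for (long ts : new long[] {0, 1_000, 2_000, 4_999}) {
            counter.record("1234", ts);
        }
        System.out.println(counter.isPossibleFraud("1234", 4_999));
    }
}
```

An attempt at 5,000 ms would start a fresh window with a count of 1, which is the difference between a tumbling window and a running total.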
45. KSQL for Streaming ETL
CREATE STREAM vip_actions AS
  SELECT user_id, page, action
  FROM clickstream c
  LEFT JOIN users u
  ON c.user_id = u.user_id
  WHERE u.level = 'Platinum';
● Joining, filtering, and aggregating streams of event data
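The join-then-filter pattern in the query above can be sketched in plain Java as a lookup of each click against a table of users. This is only an illustration of the logic, assuming the users table fits in a map keyed by user_id; the class VipActions and its nested Click type are hypothetical, and a real stream processor would keep the table continuously updated from a changelog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stream-table join with a filter; illustrative only.
class VipActions {
    static final class Click {
        final String userId;
        final String page;
        final String action;

        Click(String userId, String page, String action) {
            this.userId = userId;
            this.page = page;
            this.action = action;
        }
    }

    // For each click, look up the user's level (null when there is no match,
    // as in a LEFT JOIN) and keep only clicks from 'Platinum' users.
    public static List<Click> vipActions(List<Click> clickstream, Map<String, String> userLevels) {
        List<Click> result = new ArrayList<>();
        for (Click click : clickstream) {
            String level = userLevels.get(click.userId);
            if ("Platinum".equals(level)) {
                result.add(click);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("u1", "Platinum", "u2", "Gold");
        List<Click> clicks = List.of(
                new Click("u1", "/home", "view"),
                new Click("u2", "/cart", "add"),
                new Click("u3", "/home", "view"));
        for (Click c : vipActions(clicks, users)) {
            System.out.println(c.userId + " " + c.page + " " + c.action);
        }
    }
}
```

Note that the filter on the user's level drops clicks with no matching user, so in this particular query the LEFT JOIN behaves like an inner join.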
46. Events as a backbone
[Diagram: apps and {faas} functions in Payments, Department 2, Department 3 and Department 4, connected by events as a backbone]
49. Event Streaming Maturity Model
Data Streaming; typical maturity stages
Axes: Value; Maturity (Investment & time)
Pre-Streaming: Legacy systems. Batch processes. Complex / Slow!
Stage 1, Developer Interest: Developer downloads Kafka & experiments (15 mins on laptop).
Stage 2, Enterprise Streaming Pilot / Early Production: LOB Pilot; small teams experimenting with pub/sub / integration. -> 1-3 use cases quickly moved into Production. Fragmented.
Stage 3, SLA Ready, Integrated Streaming: Multiple mission critical use cases in production, with scale, DR & SLAs. Streaming clearly delivering business value, with C-suite visibility.
Stage 4, Global Streaming: Streaming Platform managing majority of mission critical data processes, globally, with multi-datacenter replication across on-prem and hybrid clouds.
Stage 5, Central Nervous System: All data in the organization managed through a single logical streaming platform. -> Digital natives / digital pure players - probably using Machine Learning & AI (Relational databases - redundant)
Pub + Sub, Store, Process
Projects -> Platform