Architecture of Big Data Solutions
1. Architecture of Big Data Solutions
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Guido Schmutz (guido.schmutz@trivadis.com)
@gschmutz
2. Guido Schmutz
Working for Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Co-author of several books
Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
3. Agenda
1. Introduction
2. Big Data Reference Architectures
• Traditional Big Data
• Event / Stream-Processing
• Lambda Architecture
• Kappa Architecture
• Unified Architecture
• Microservices Architecture
3. Big Data Ecosystem – many choices sorted!
5. Big Data Definition (4 Vs)
+ Time to action? Big Data + Real-Time = Stream Processing
Characteristics of Big Data: its Volume, Velocity and Variety in combination
Reliable Data Ingestion in Big Data/IoT
6. How to do Big Data? Why is structure / architecture important?
7. Why talk about Big Data Architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still a rather young field and therefore a “moving target”
No standard architectures that have been in use for years are available yet
In recent years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
Having a reference architecture in place helps in choosing the right/matching technologies
8. Important Properties for Choosing a (Big) Data Architecture
Latency
Keep raw and uninterpreted data “forever”?
Volume, Velocity, Variety, Veracity
Ad-hoc query capabilities needed?
Robustness & Fault Tolerance
Scalability
…
9. Big Data Reference Architectures - Traditional Big Data
10. “Traditional Architecture” for Big Data
[Diagram labels: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine); Data Ingestion (Pushing Ingestion, Channel, Pulling Ingestion); (Analytical) Data Processing (Raw Data (Reservoir), Batch compute, Result Store, Computed Information, Query Engine); Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
11. “Traditional Architecture” for Big Data – Hadoop Technology Mapping
[Diagram: the components of the traditional architecture (as on slide 10) mapped to Hadoop technologies; the product logos are not available in this transcript.]
12. “Traditional Architecture” for Big Data – Spark Technology Mapping
[Diagram: the components of the traditional architecture (as on slide 10) mapped to Spark technologies; the product logos are not available in this transcript.]
13. “Traditional Architecture” for Big Data – Feeding in High-Volume Event Streams
[Diagram: the components of the traditional architecture (as on slide 10), with two “?” markers added for the high-volume event streams.]
14. Traditional Architecture for Big Data
• Batch Processing - “Data at Rest”
• Not for low-latency use cases
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data
• Spark Core is a faster alternative to Hadoop MapReduce, but still batch processing
• The Spark ecosystem offers many additional advanced analytic capabilities (machine learning, graph processing, …)
15. Big Data Reference Architectures – Event/Stream Processing
16. Event / Stream Processing – “Data in Motion”
“Data in motion”
Events are analyzed and processed in real time as they arrive
Decisions are timely, contextual and based on fresh data
Decision latency is eliminated
17. Event / Stream Processing Architecture
[Diagram labels: Data Sources (Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine); Data Ingestion; Channel; Batch compute; (Analytical) Real-Time Data Processing (Stream/Event Processing, Messaging, Result Store); Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
18. Challenges for Ingesting Data
Multitude of sensors
Real-Time Streaming
Multiple Firmware versions
Bad Data from damaged sensors
Regulatory Constraints
Data Quality
19. Continuous Data Ingestion
[Diagram labels: sources (DB Source, File Source, IoT Sensor, Social, Log); gateways and connectors (IoT GW, CDC GW, Dataflow GW, Message GW, Connect, CDC, Log CDC, Native, REST, Dataflow); Event Hub (Topics, Queue); consumers (Big Data, Stream Processing).]
20. Continuous Data Ingestion
[Diagram: same labels as the Continuous Data Ingestion diagram on slide 19.]
21. Event / Stream Processing Architecture – Open Source Technology Mapping
[Diagram: the components of the event / stream processing architecture (as on slide 17) mapped to open source technologies; the product logos are not available in this transcript.]
22. Event / Stream Processing Architecture – Oracle Technology Mapping
[Diagram: the components of the event / stream processing architecture (as on slide 17) mapped to Oracle technologies; the product logos are not available in this transcript.]
23. Event / Stream Processing Architecture
The solution for low latency use cases
Process each event separately => low latency
Process events in micro-batches => increases latency but offers better reliability
Previously known as “Complex Event Processing”
Keep the data moving / Data in Motion instead of Data at Rest => raw events are not stored
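The difference between the two processing styles can be sketched in a few lines of Python. This is a minimal illustration that is not tied to any product: the event shape, the function names `process_per_event` and `micro_batches`, and the batch size are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, float]  # hypothetical event shape, e.g. {"value": 21.5}


def process_per_event(events: Iterable[Event],
                      handle: Callable[[Event], None]) -> None:
    """Handle every event individually, as soon as it arrives."""
    for event in events:
        handle(event)


def micro_batches(events: Iterable[Event],
                  batch_size: int) -> Iterator[List[Event]]:
    """Group the stream into batches of at most batch_size events.

    An event waits until its batch is complete, which adds latency,
    but each batch can be handled (and retried) as one unit.
    """
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield batch
```

With per-event processing the handler sees each event immediately. With micro-batching the first event of a batch is only processed once the batch is full or the stream ends.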
24. Event / Stream Processing Architecture - Keep raw event data
[Diagram labels: Data Sources (Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine); Data Ingestion; Channel; (Analytical) Real-Time Data Processing (Stream/Event Processing, Messaging, Result Store); (Analytical) Batch Data Processing (Raw Data (Reservoir), Batch compute); Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
25. Big Data Reference Architectures - Lambda Architecture for Big Data
26. “Lambda Architecture” for Big Data
[Diagram labels: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine); Data Ingestion (Channel, Pulling Ingestion); (Analytical) Batch Data Processing (Raw Data (Reservoir), Batch compute, Result Store, Computed Information); (Analytical) Real-Time Data Processing (Stream/Event Processing, Batch compute, Messaging, Result Store); Query Engine; Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
27. Lambda Architecture for Big Data
Combines (Big) Data at Rest with (Fast) Data in Motion
Closes the latency gap left by high-latency batch processing
Keeps the raw information forever
Makes it possible to rerun analytics operations on the whole data set if necessary
=> because the old run had an error or
=> because we have found a better algorithm we want to apply
Functionality has to be implemented twice
• Once for batch
• Once for real-time streaming
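A minimal Python sketch of the idea, assuming a simple "sum per key" aggregation: the batch layer recomputes its view from all raw data, the real-time layer aggregates only events that arrived since the last batch run, and a query merges both views. All names (`compute_batch_view`, `SpeedLayer`, `query`) and the event shape are hypothetical. Note that the same aggregation logic appears twice, once per layer.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

Event = Tuple[str, int]  # hypothetical (key, amount) pair


def compute_batch_view(raw_events: Iterable[Event]) -> Dict[str, int]:
    """Batch layer: recompute the totals from the complete raw data set."""
    view: Counter = Counter()
    for key, amount in raw_events:
        view[key] += amount
    return dict(view)


class SpeedLayer:
    """Real-time layer: incremental totals for events not yet in a batch view."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        key, amount = event
        self.view[key] += amount  # same logic as the batch layer, implemented again


def query(key: str, batch_view: Mapping[str, int],
          speed_view: Mapping[str, int]) -> int:
    """Serving side: merge the batch result with the real-time result."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)
```

Because the raw events are kept, `compute_batch_view` can be rerun over the full history whenever the logic changes, and the real-time view can be reset once the new batch view covers the same events.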
28. Big Data Reference Architectures - “Kappa” Architecture
29. “Kappa Architecture” for Big Data
[Diagram labels: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine); Data Ingestion; Channel; “Raw Data Reservoir” (Raw Data (Reservoir), Batch compute, Computed Information); (Analytical) Real-Time Data Processing (Stream/Event Processing, Messaging, Queryable State, Result Store); Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
30. Organizing NoSQL Data Stores – Different Types
Key Value Store: K1 → V1, K2 → V2, K3 → V3
Document store: { k1: v1, k2: v2, k3: [v1, v2, v3] }
Wide-column store (Rowkey → column key: value):
RK1 → CK1: V1, CK2: V2, CK3: V3, CK4: V4, …
RK2 → CK1: V1, CK4: V4, CK6: V6, …
… → CK3: V3, …
Graph store
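As a rough illustration, the four data models can be mimicked with plain Python data structures. This is only a sketch of how the data is shaped, not of how any product stores it. The graph example (node names, the `knows` relation) and the helper functions are assumptions added for illustration.

```python
# Key-value store: an opaque value looked up by its key.
key_value = {"K1": "V1", "K2": "V2", "K3": "V3"}

# Document store: a nested, self-describing structure per document.
document = {"k1": "v1", "k2": "v2", "k3": ["v1", "v2", "v3"]}

# Wide-column store: row key -> column key -> value.
# Each row may carry a different set of columns.
wide_column = {
    "RK1": {"CK1": "V1", "CK2": "V2", "CK3": "V3", "CK4": "V4"},
    "RK2": {"CK1": "V1", "CK4": "V4", "CK6": "V6"},
}

# Graph store: nodes plus labeled edges (hypothetical example data).
graph = {
    "nodes": {"n1", "n2", "n3"},
    "edges": [("n1", "knows", "n2"), ("n2", "knows", "n3")],
}


def columns_of(row_key: str) -> list:
    """Return the column keys that are present for one row."""
    return sorted(wide_column.get(row_key, {}))


def neighbours(node: str) -> list:
    """Return the nodes reachable from node over one outgoing edge."""
    return [dst for src, _label, dst in graph["edges"] if src == node]
```

The helper `columns_of` shows the sparse nature of a wide-column row, while `neighbours` shows the kind of traversal a graph store is built for.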
31. Organizing NoSQL Data Stores – and the Products
Key Value Store
Wide-column store
Document store
Graph store
32. Big Data Reference Architectures - “Unified” Architecture
33. “Unified Architecture” for Big Data
[Diagram labels: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine); Data Ingestion; Channel; (Analytical) Batch Data Processing (Calculate Models of incoming data) (Raw Data (Reservoir), Batch compute, Result Store, Computed Information, Prediction Models); (Analytical) Real-Time Data Processing (Stream/Event Processing, Batch compute, Messaging, Queryable State, Result Store); Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
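The interplay of the two sides can be sketched as follows: a batch step calculates a model from the raw data at rest, and the stream side applies that model to every incoming event. The "model" here is only a mean/standard-deviation outlier threshold chosen for brevity; it stands in for whatever prediction model a real solution would train. The names `Model`, `train_model`, `score_stream` and the `threshold` parameter are assumptions.

```python
from statistics import mean, pstdev
from typing import Iterable, Iterator, NamedTuple, Tuple


class Model(NamedTuple):
    """Hypothetical model: average and spread of the historical values."""
    avg: float
    stdev: float


def train_model(raw_values: Iterable[float]) -> Model:
    """Batch side: calculate the model from the stored raw data."""
    values = list(raw_values)
    return Model(avg=mean(values), stdev=pstdev(values))


def score_stream(values: Iterable[float], model: Model,
                 threshold: float = 3.0) -> Iterator[Tuple[float, bool]]:
    """Stream side: flag each incoming value that deviates from the model
    by more than threshold standard deviations."""
    for value in values:
        is_outlier = abs(value - model.avg) > threshold * model.stdev
        yield value, is_outlier
```

The model can be recalculated periodically by the batch side and handed to the stream side without changing the streaming code.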
35. Event-Driven (Micro-) Services Architecture
[Diagram labels: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine); Data Ingestion; Channel; “Raw Data Reservoir” (Raw Data (Reservoir), Batch compute, Computed Information); Microservice 1 (Service, State, API); Microservice 2 (Service, State); further Microservices; Batch compute; Result Store; Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
36. Big Data Ecosystem – many choices sorted!
37. Building Blocks for (Big) Data Processing
Data Acquisition
Format
File System
Stream Processing
Batch SQL
Graph DBMS
Document DBMS
Relational DBMS
Visualization
IoT
Messaging
Analytics
OLAP DBMS
Query Federation
Table-Style DBMS
Key Value DBMS
Batch Processing
In-Memory
38. Big Data Ecosystem – many choices sorted!