The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark. This video is a replay of a Live Webinar hosted on 03/19/2020.
Join us for a timely 45-minute webinar to see our take on the future of Data Integration. As the global industry shift towards the “Fourth Industrial Revolution” continues, outmoded styles of centralized batch processing and ETL tooling are being replaced by realtime, streaming, microservices and distributed data architecture patterns.
This webinar will start with a brief look at the macro-trends happening around distributed data management and how that affects Data Integration. Next, we’ll discuss the event-driven integrations provided by GoldenGate Big Data, and continue with a deep-dive into some essential patterns we see when replicating Database change events into Apache Kafka. In this deep-dive we will explain how to effectively deal with issues like Transaction Consistency, Table/Topic Mappings, managing the DB Change Stream, and various Deployment Topologies to consider. Finally, we’ll wrap up with a brief look into how Stream Processing will help to empower modern Data Integration by supplying realtime data transformations, time-series analytics, and embedded Machine Learning from within data pipelines.
GoldenGate: https://www.oracle.com/middleware/tec...
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
The Future of Real-Time Data Integration
1. The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark
Oracle Development, Mar-2020
17. GG for Big Data – What is Included?
Sources
• Java Message Service (JMS): typically covers any JMS-compliant source technology
• Cassandra
• Roadmap sources to include Apache Kafka (Connect frameworks) and MongoDB
• GoldenGate trails from all OGG sources, such as Oracle DB (MA & Classic), Microsoft SQL Server, MySQL, IBM DB2, HPE NonStop, etc. Note: the source side for relational databases is separately licensed
Targets
• Hadoop (HDFS / Hive, HBase)
• Kafka: Pub/Sub, Kafka Connect, REST
• Elasticsearch
• NoSQL (MongoDB, Cassandra, Oracle NoSQL)
• Oracle Cloud (Object Storage)
• AWS (Redshift, Kinesis, S3)
• Google Cloud (BigQuery)
• Microsoft Azure Data Lake (v1 & v2), Blob Storage
• Flat files (Netezza, Greenplum)
• JMS
• JDBC
Streaming
• Stream Processing for any GG data feed is included (other data sources require full use license)
• Low-code development
• ETL (Filter, Aggregate, Merge, Transform, Load Data)
• Correlate/Enrich
• Alerts, Thresholds, Anomalies
• Business Rules, Data Policies
• Time Series Analysis
• Spatial Analytics, Geo-fence
• Classification, Clustering
• Statistical Inference, Machine Learning, Regression Models
19. OGG for Big Data – Supported Formats
• Native formats of targets
• HDFS: Sequence File
• Delimited Text (both Row & Operation modes)
• JSON (both Row & Operation modes)
• Avro (both Row & Operation modes)
• XML
• Parquet
• ORC
[Diagram label: Deep Storage Lakes such as Object Store, HDFS or Elastic]
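The difference between Operation and Row modes can be illustrated with a small sketch. It assumes an operation-mode record carries separate "before" and "after" images, while a row-mode record carries a single flattened image of the row. The field names (`table`, `op_type`, `before`, `after`) and the helper `to_row_image` are illustrative assumptions, not a definitive rendering of the formatter output.

```python
from typing import Any, Dict


def to_row_image(op_event: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an operation-style change record into a row-style record.

    Assumed (hypothetical) input shape:
      {"table": ..., "op_type": "I" | "U" | "D", "before": {...}, "after": {...}}
    """
    op_type = op_event["op_type"]
    # A delete has no after image, so the row image is taken from "before".
    image = op_event["before"] if op_type == "D" else op_event["after"]
    # Keep the metadata fields, drop the nested images.
    meta = {k: v for k, v in op_event.items() if k not in ("before", "after")}
    # Column values are promoted to the top level of the record.
    return {**meta, **image}
```

For an update, the operation-style record keeps both the old and new column values, while the flattened record keeps only the values as they stand after the change. Row-style output suits targets that just want the latest state; operation-style output suits consumers that need to see what changed.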
20. Tap into Existing HA Deployments
[Diagram: Existing GG Deployments; Add GG for Big Data Deployments]
• Use existing Extract, no performance penalty on Source DB
[Diagram labels: Existing Applications; Data Services; Analytics; Deep Storage Lakes such as Object Store, HDFS or Elastic]
21. GoldenGate is Transaction Safe
GoldenGate semantics are fully ACID / transaction-safe with strong HA
Disaster Recovery
• GG may optionally be used as a recovery point for big data, and GG can supply metadata to downstream Big Data environments about transaction/commit boundaries.
[Diagram labels: ACID Compliant GoldenGate Pipelines; Deep Storage Lakes such as Object Store, HDFS or Elastic]
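One way a downstream consumer might use transaction/commit boundary metadata is to release change events only once a whole transaction has arrived. The sketch below assumes each event carries a transaction id and a begin/middle/end/whole indicator. The field names `xid` and `txind`, the indicator values, and the helper `complete_transactions` are assumptions made for illustration, not a documented message layout.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, str]


def complete_transactions(events: Iterable[Event]) -> Iterator[List[Event]]:
    """Yield lists of events, one list per fully received transaction.

    Assumed (hypothetical) fields on every event:
      "xid"   - transaction identifier
      "txind" - "B" begin, "M" middle, "E" end, "W" whole (single-op txn)
    """
    pending: Dict[str, List[Event]] = {}
    for event in events:
        xid = event["xid"]
        pending.setdefault(xid, []).append(event)
        # Only an end or whole marker closes the transaction.
        if event["txind"] in ("E", "W"):
            yield pending.pop(xid)
    # Transactions still in `pending` never saw their end marker
    # and are deliberately not emitted.
```

A consumer built this way never applies half of a transaction to the target, at the cost of buffering events until the boundary arrives.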
22. GoldenGate Coordinated Replicat
[Diagram labels: GG Delivery; Single PRM; Thread 1, Thread 2, Thread ..n; GG Big Data Targets; Deep Storage Lakes such as Object Store, HDFS or Elastic]
• Unified Parameter File, which is read by each process thread and determines the operational configuration of each thread.
• Each apply thread is independent of the other apply threads. Each thread opens the OGG Trail for shared read operations and has a unique entry in the OGG Checkpoint Table.
• Although each thread functions independently, an unrecoverable error condition on any thread will cause all threads to terminate in the ABEND state.
• Full barrier coordination is not performed on foreign keys. Parent and child tables must be processed by the same apply thread.
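The rule that parent and child tables must share an apply thread can be sketched as a grouping problem: tables linked by foreign keys form one group, and each group is pinned to a single thread. This is a minimal planning sketch, not how GoldenGate itself assigns threads. The function `assign_threads` and its inputs are hypothetical, and every table named in a foreign key is assumed to also appear in `tables`.

```python
from typing import Dict, Iterable, List, Tuple


def assign_threads(
    tables: List[str],
    foreign_keys: Iterable[Tuple[str, str]],  # (child_table, parent_table)
    num_threads: int,
) -> Dict[str, int]:
    """Map each table to a 1-based thread number, keeping FK-related
    tables on the same thread."""
    parent = {t: t for t in tables}

    def find(t: str) -> str:
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge the groups of every child/parent pair.
    for child, par in foreign_keys:
        parent[find(child)] = find(par)

    groups: Dict[str, List[str]] = {}
    for t in tables:
        groups.setdefault(find(t), []).append(t)

    # Spread whole groups over the threads round-robin.
    assignment: Dict[str, int] = {}
    for i, root in enumerate(sorted(groups)):
        thread = i % num_threads + 1
        for t in groups[root]:
            assignment[t] = thread
    return assignment
```

With tables `orders`, `order_items`, `customers` and `audit`, where `order_items` references `orders` and `orders` references `customers`, the first three land on one thread and `audit` is free to go to another. Round-robin ignores table volume, so a real plan would also weigh how busy each group is.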
23. GoldenGate Big Data with High Availability
• Use preferred HA mode, depending on GG Extract architecture
[Diagram labels: GG Capture & Distribution; GG Big Data Replicat 001; GG Big Data Replicat 002; Clustering Technology such as Oracle Clusterware, Veritas, RedHat etc.; Shared-Disk / Durable Storage (DBFS, ACFS, etc.); Trail Files; .properties; schema files; checkpoints (.cpj); /dirsta; GG Big Data Targets; Deep Storage Lakes such as Object Store, HDFS or Elastic]
38. Example Mid-Tier Deployment Topology
[Diagram. Labels recovered from the slide:]
• OGG Extract; Database Client Libraries; Secure SQL*Net Connections; DB2/z
• GG Mid-Tier 001, GG Mid-Tier 002, GG Mid-Tier xxx; GG Trail Files; SAN/RAID Storage
• K8S/Docker Container (optional): GG4BD Replicat, Administration Service, Metrics Service, Service Manager, Reverse Proxy & Certs
• Kafka Host; Kafka Segments; Topic : Table
• HTTPS / TLS 1.2; Data Services (for Apps)
• Data Lake; Data Warehouse
• OGG Replicat <push – data is staged when events arrive>; <data transformation – e.g. Stored Procs>
• DBAs (aligned to business unit); GG Ops (aligned to shared services)
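The "Topic : Table" label suggests one Kafka topic per source table. A minimal sketch of such a mapping is shown here, resolving a topic name from a table name through a template. The function `topic_for_table` and the placeholder names `schemaName`, `tableName` and `fullyQualifiedTableName` are assumptions chosen for illustration. The sketch uses Python's standard `string.Template` and does not call any GoldenGate API.

```python
from string import Template


def topic_for_table(
    fully_qualified_table: str,
    template: str = "${schemaName}.${tableName}",
) -> str:
    """Resolve a Kafka topic name for a source table (hypothetical helper).

    `fully_qualified_table` is expected as "SCHEMA.TABLE".
    """
    # Split on the last dot so "CATALOG.SCHEMA.TABLE" keeps "CATALOG.SCHEMA"
    # as the schema part.
    schema, _, table = fully_qualified_table.rpartition(".")
    return Template(template).substitute(
        schemaName=schema,
        tableName=table,
        fullyQualifiedTableName=fully_qualified_table,
    )
```

Calling `topic_for_table("SALES.ORDERS")` resolves to `SALES.ORDERS`, and a template such as `raw.${tableName}` places every table under a common prefix. A one-to-one mapping keeps each table's change stream separately consumable, at the cost of a larger number of topics.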
39. Example Topology with Stream Processing
[Diagram. Labels recovered from the slide:]
• OGG Extract; Database Client Libraries; Secure SQL*Net Connections
• GG Mid-Tier 001, GG Mid-Tier 002, GG Mid-Tier xxx; GG Trail Files; SAN/RAID Storage
• K8S/Docker Container (optional): GG4BD Replicat, Administration Service, Metrics Service, Service Manager, Reverse Proxy & Certs
• Kafka Raw Topics (Topic : Table); Kafka Segments
• Spark ETL Nodes; OSA Mid-Tier; OSA Spark Application; OSA Web Application; Cache Store
• Kafka Prepared Topics (Topic : Table); Kafka Segments
• Data Pipes for Real-time ETL; Direct Load to Databases
• HTTPS / TLS 1.2; Data Services (for Apps)
• Data Lake; Database
• DBAs (aligned to business unit); GG Ops (aligned to shared services); Data Engineer (DW / Data Lake organization)
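The flow from raw topics through ETL nodes to prepared topics can be pictured as a filter-and-enrich step over change events. The sketch below uses plain Python in place of Spark, purely to show the shape of such a step. The event fields (`op_type`, `after`, `customer_id`), the lookup table, and the choice to skip deletes are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator


def prepare(
    raw_events: Iterable[Dict[str, Any]],
    region_by_customer: Dict[int, str],
) -> Iterator[Dict[str, Any]]:
    """Turn raw change events into prepared rows (hypothetical pipeline step)."""
    for event in raw_events:
        # Filter: this sketch forwards inserts and updates only.
        if event["op_type"] == "D":
            continue
        row = dict(event["after"])
        # Enrich: add a region looked up from reference data.
        row["region"] = region_by_customer.get(row["customer_id"], "UNKNOWN")
        yield row
```

In a deployment like the one pictured, the input would be read from a raw topic and the output written to a prepared topic; here both are ordinary Python iterables so the step can be run on its own.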
51. What have we learned?
• Global Shift in Technology (Industry 4.0)
• Data Integration Shift to Data Mesh
• GoldenGate for Big Data
• Event-Driven, Distributed Data Integration
• *Industrial Strength & Enterprise Class