The Future of Real-Time Data Integration

The Future of Data Integration:
Data Mesh, and a Special Deep Dive into Stream Processing with
GoldenGate, Apache Kafka and Apache Spark
Oracle Development, Mar-2020

Copyright © 2020, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.

Today’s Agenda
1. Strategic IT Forces
2. GoldenGate Big Data in Brief
3. Database Change Streams to Apache Kafka
4. Stream Processing with Apache Spark

During 2020 / 2021 the world continues to go through a Paradigm Shift into a future where “Cyber-Physical Systems” are the new normal.
“Digital Transformation” requires a mindset shift:
1. Sharing data is more effective than accumulating it
2. Decentralizing, distributing, and copying is more powerful than stockpiling
3. Connectivity and flow of data is the starting point for innovation and socializing.

The Problems of Monolithic Data Architecture
People
• Business units have few incentives to work across org boundaries
• Hyper-specialization in tech teams narrows the focus to technology rather than outcomes or solutions
• Pressure on stakeholders to produce value, but IT orgs are still mostly built like they were 30 years ago
Process
• 30-year-old conception of data flows
• IT-led and technology constrained
• The monolithic data lake is big and slow by design, not by accident
• “Ingest -> Process -> Serve” design is the same as old “ETL” data flows; it institutionalizes the wrong goals
Technology
• Storage-centric conception of data, but data is Dynamic, not Static
• “the Lake” is conceived as a physical place where we pile up data (in Hadoop or on the Cloud)
• Cheaper storage than an EDW but much worse Governance
• Does nothing to modernize data architecture itself
Images: https://martinfowler.com/articles/data-monolith-to-mesh.html

Evolution towards Real-Time Data Mesh
Industry 3.0: Hub and Spoke
This data pattern, popularized by Ralph Kimball and Bill Inmon, has been the foundation for enterprise data management since 1993. It is transaction consistent, scales up nicely for most use cases, and is based on SQL, the lingua franca for most tools.
Transitional: Kappa Hub
By 2010, the Lambda (big data) pattern was common. In 2014, Jay Kreps (of LinkedIn) questioned the Lambda Architecture and spawned Kappa. The Kappa principles treat batch processing as a special case of stream processing: one historized event log serves both real-time and batch workloads.
Mature: Distributed Kappa
By 2020, IT infrastructure has changed dramatically: networking, containers, cloud, compute, IoT etc. have all pushed data to the edge. A mature Kappa architecture is not a single-instance “hub” but rather a distributed mesh of data logs, stream data processing, change events, and time series data.
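The Kappa principle described above (batch as a special case of streaming over one historized event log) can be illustrated with a minimal sketch. `EventLog` and `running_total` are hypothetical names invented for this illustration, not part of any product: the same processing function serves a full replay ("batch") and an incremental read from a saved offset ("streaming").

```python
from typing import Iterable, List, Tuple


class EventLog:
    """Hypothetical append-only, historized event log (illustration only)."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> int:
        """Append an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, from_offset: int = 0) -> Iterable[Tuple[int, dict]]:
        """Yield (offset, event) pairs starting at from_offset."""
        for offset in range(from_offset, len(self._events)):
            yield offset, self._events[offset]


def running_total(log: EventLog, from_offset: int = 0, total: float = 0.0) -> Tuple[float, int]:
    """One processing function for both modes: returns (total, next_offset)."""
    next_offset = from_offset
    for offset, event in log.read(from_offset):
        total += event["amount"]
        next_offset = offset + 1
    return total, next_offset


log = EventLog()
for amount in (10.0, 5.0, 2.5):
    log.append({"amount": amount})

# "Batch": replay the whole log from offset 0.
batch_total, offset = running_total(log)

# "Streaming": a new event arrives; continue from the saved offset and state.
log.append({"amount": 1.5})
stream_total, offset = running_total(log, from_offset=offset, total=batch_total)

# A fresh full replay gives the same answer as the incremental path.
replay_total, _ = running_total(log)
```

Because the log is never overwritten, reprocessing with new logic is just another replay from offset 0.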
Kappa: https://www.oreilly.com/radar/questioning-the-lambda-architecture/
https://en.wikipedia.org/wiki/Dimensional_modeling
Lambda: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

Real-Time Data Mesh for Industry 4.0
…from Industry 3.0 -> …to Industry 4.0
Batch Centric, Schedulers -> Event Centric, Streams
Mostly Relational Data (aka Views) -> Polyglot Data (via Logs)
Size for Peak Workloads -> Elastic, Scale on Demand
Kimball / Inmon Architecture -> Distributed Kappa
Vendor Specific -> Open Source Enabled
Simplex Processing is Standard -> Massively Parallel is Standard
Hubs (EDW, Hadoop, Data Lake) -> Mesh (Edge, Hybrid, Multi-Cloud)
Governance is “Bolt On” -> Governance is Embedded

Data Mesh Conceptual View – Data Domains
[Diagram labels]
Enterprise Data Producers: ERP Apps, DBs, Middleware etc.
IoT Data Producers: Devices & Things
Data Mesh (distributed Kappa, microservices, cloud agnostic): Data Domain A, Data Domain B, Data Domain C; f(x)
Data stages: Raw Data, Prepared Data, Canonical Data
Data Domain Consumers
People owners of “Data Products”, collections of data sets in various stages of curation
Domain-Specific Views of Data
Raw Event Consumers: Automated Devices, Edge Nodes (5G), Scheduled Routines (e.g. ETL)
Data Product-Specific Storage Choices:
• RDBMS
• Data Lake
• Object Store
• Graph, etc.

Consumer-Driven, Event-Centric Data Mesh
Data Mesh puts the consumer needs first – they require data at different latency, fidelity, trust levels and views
[Diagram labels]
Enterprise Data Producers: User Action, App, DB, committed!, Object Model, Data Model, System Of Record (SoR), App APIs and system log events
CDC Replication: Detect Event, Logical Change Records (LCRs), LCR/TFs
Data tiers: Raw Data, Prepared Data, Canonical Data
Tier qualities: Speed & Fidelity, Trusted Views, Ease of Consumption
Topics: Raw Data (LCR), Schema Events (DDL), Prepared Data Topics, “Master” Data Topics
Formats: JSON, XML, Avro, Parquet, CSV
Flows: Raw data, Time Series & Alerting events are pushed; Prepared data events are pushed; Canonical data events; Direct to Database (high fidelity transaction semantics fully preserved)
Delivered as: Data Objects, Table Data, Raw Data / Alerts, SQL Consumers
Data Domain Consumers: Applications, Data Services, Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP

Distributed by Design, Microservices Based
[Diagram labels]
Data Domain Producers: User Action, App, DB, committed!, Object Model, Data Model, System Of Record (SoR)
CDC Replication: Detect Event, Logical Change Records (LCRs)
Microservices: Edge Compute or Cloud for Raw Data Events; Prepare Technical Data Views; Business Data Views
Event stores: LCRs; Stream Process Events (persisted); Canonical data Events (ephemeral or persisted)
Flows: Raw data, Time Series & Alerting events are pushed; Direct to Database (high fidelity transaction semantics fully preserved)
Delivered as: Data Objects, Table Data, Raw Data / Alerts, SQL Consumers
Data Domain Consumers: Applications, Data Services, Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP

We are creating this today, a Data Integration Platform for Industry 4.0 – a Real-Time Data Mesh solution for everyone:
✓ Data Consumer Driven (low code, browser-based)
✓ Distributed by Design (multi-cloud, microservices)
✓ Event-Centric Pipelines (CDC, replication and streaming)
✓ Immutable Ledger Based (fully aware of the event history of SoRs)
✓ Polyglot Capable (works with all data types)
✓ ACID Capable (preserves DB transaction semantics)
✓ Governed (multi-level security, metadata driven)
✓ Enterprise Class (best of Open Source and Commercial S/W together)

Single Pane of Glass for Real-Time Data Mesh
Data Consumer Driven, Event Centric Pipelines
Deploys in a Mesh Across Containers, Public Clouds and 5G Edge Devices
[Diagram labels]
connect; DB2/z
Real-Time Stream Data Processing; Raw Data
Users: DBAs & Data Engineers; Data Owners & Data Products
Delivered as: Data Objects, Table Data, Raw Data / Alerts, SQL Consumers
Consumers: Applications, Data Services, Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP

How it Works Today: GoldenGate for Big Data
[Diagram labels]
Roles: DBA/GG Ops, Data Engineer, Data Analyst
Stages: Capture, Pipeline, Analyze, Deliver, Ingest
Components: GoldenGate Microservices Applications; Stream Analytics Application
BYOM (Bring Your Own Messaging)
BYOS (Bring Your Own Spark)
All Data Events & Transactions
Delivered as: Data Objects, Table Data, Raw Data / Alerts, SQL Consumers
Data Domain Consumers: Applications, Data Services, Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP
* distributed, may run on any combination of containers and clouds

GoldenGate Overall: for the Enterprise
[Diagram labels]
Replication of Real-time Data Transactions & Events
GoldenGate Stream Analytics: ETL & ML
Relational and Non-Relational endpoints: DB2/z, DBMS, Cloud, Big Data, NoSQL, Streams, Object Storage, Apps
https://www.oracle.com/middleware/technologies/goldengate.html

GoldenGate for Big Data
Compare to:
• Open Source tools like Sqoop or Kafka Connect
• ETL Tools, commercial or open source
• Change Data Capture Tools in niche areas
GoldenGate is:
• Simpler to use via microservices, cloud etc.
• Better Performance on most DBs, esp. Oracle
• More Reliable (in high availability and disaster recovery situations)
Real-time Stream of Data Transactions & Data Store Events
Kafka | Object Store | ElasticSearch | HDFS | etc.
Data Lake
Lowest overhead
High fidelity events
Fastest data visibility
No more batch windows
DML, DDL and Procedures
Consistent recovery point
DB2/z

GG for Big Data – What is Included?
Sources
• Java Message Service (JMS): typically covers any JMS compliant source technology
• Cassandra
• Roadmap sources to include Apache Kafka (Connect frameworks) and MongoDB
• GoldenGate trails from all OGG Sources (Oracle DB (MA & Classic), Microsoft SQL Server, MySQL, IBM DB2, HPE NonStop etc.). Note: the Source side for relational databases is separately licensed
Targets
• Hadoop (HDFS / Hive, HBase)
• Kafka: Pub/Sub, Kafka Connect, REST
• Elasticsearch
• NoSQL (MongoDB, Cassandra, Oracle NoSQL)
• Oracle Cloud (Object Storage)
• AWS (Redshift, Kinesis, S3)
• Google Cloud (BigQuery)
• Microsoft Azure Data Lake (v1 & v2), Blob Storage
• Flat files (Netezza, Greenplum)
• JMS
• JDBC
Streaming
• Stream Processing for any GG data feed is included (other data sources require full use license)
• Low-code development
• ETL (Filter, Aggregate, Merge, Transform, Load Data)
• Correlate/Enrich
• Alerts, Thresholds, Anomalies
• Business Rules, Data Policies
• Time Series Analysis
• Spatial Analytics, Geo-fence
• Classification, Clustering
• Statistical Inference, Machine Learning, Regression Models

Capabilities Included with GG for Big Data
Foundation Patterns:
Database Replication: Unidirectional, Bi-Directional, Peer-to-Peer, Broadcast, Consolidation, Distribution
Replication in/out for Non-Relational: Data Lake Ingest, Streaming Ingest, Cloud Ingest, Messaging Replication, NoSQL Replication, SaaS Replication
Stream Processing: Data Pipelines, Data Transformation, GoldenGate Integrations, Time Series Analysis, Geo-Fencing, Predictive Analytics

OGG for Big Data – Supported Formats
• Native formats of targets
• HDFS - Sequence File
• Delimited Text (both Row & Operation modes)
• JSON (both Row & Operation modes)
• Avro (both Row & Operation modes)
• XML
• Parquet
• ORC
Deep Storage Lakes such as Object Store, HDFS or Elastic

Tap into Existing HA Deployments
Existing GG Deployments | Add GG for Big Data Deployments
Use existing Extract, no performance penalty on Source DB
Deep Storage Lakes such as Object Store, HDFS or Elastic
Existing Applications, Data Services, Analytics

GoldenGate is Transaction Safe
GoldenGate semantics are fully ACID / transaction-safe with strong HA
Disaster Recovery: GG may optionally be used as a recovery point for big data, and GG can supply metadata to downstream Big Data environments about transaction/commit boundaries.
ACID Compliant GoldenGate Pipelines
Deep Storage Lakes such as Object Store, HDFS or Elastic

GoldenGate Coordinated Replicat
[Diagram labels] GG Delivery; Single PRM; Thread 1, Thread 2, Thread ..n; GG Big Data Targets: Deep Storage Lakes such as Object Store, HDFS or Elastic
• Unified Parameter File, which is read by each process thread and determines the operational configuration of each thread.
• Each apply thread is independent of the other apply threads. Each thread opens the OGG Trail for shared read operations and has a unique entry in the OGG Checkpoint Table.
• Although each thread functions independently, an unrecoverable error condition on any thread will cause all threads to terminate in the ABEND state.
• Full barrier coordination is not performed on foreign keys. Parent and child tables must be processed by the same apply thread.
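The foreign-key note on this slide implies that related parent and child tables need a deterministic route to a single apply thread. The following is a minimal sketch of that routing idea only, not GoldenGate's implementation; `TABLE_GROUPS`, the table names and `thread_for` are hypothetical.

```python
import zlib

# Hypothetical mapping of child tables to the parent table they reference.
TABLE_GROUPS = {
    "crm.orders": "crm.customer",
    "crm.order_items": "crm.customer",
}


def thread_for(table: str, num_threads: int) -> int:
    """Route a table to an apply thread (1-based); children follow their parent."""
    group = TABLE_GROUPS.get(table, table)
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(group.encode("utf-8")) % num_threads + 1


assignments = {
    table: thread_for(table, 4)
    for table in ("crm.customer", "crm.orders", "crm.order_items", "hr.employees")
}
```

Routing by group rather than by table name keeps every table in one foreign-key family on the same thread, at the cost of less even load across threads.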

GoldenGate Big Data with High Availability
[Diagram labels]
GG Capture & Distribution: use preferred HA mode, depending on GG Extract architecture
GG Big Data Replicat 001, GG Big Data Replicat 002
Clustering Technology such as Oracle Clusterware, Veritas, RedHat etc.
Shared-Disk / Durable Storage (DBFS, ACFS, etc.): Trail Files, .properties, schema files, checkpoints (.cpj), /dirsta
GG Big Data Targets: Deep Storage Lakes such as Object Store, HDFS or Elastic

For Apache Kafka
Lowest overhead
High fidelity events
Fastest data visibility
No more batch windows
DML, DDL and Procedures
Consistent recovery point
DB2/z
Oracle Streaming Service (OSS)
Some Stats:
• GoldenGate is moving ~4 Petabytes of data into Kafka every day
• ~300 customers (G2000) use GoldenGate with Kafka
• Real-world performance in the tens of thousands of transactions per second into Kafka

General Problem to Solve
[Diagram labels]
System of Record “Data Producer”: BUSINESS APPLICATION with tables A, B, C
Data Sync & Stream Processing: SQL, DB Log Events, Messaging Events, User Events
System to Serve “Data Consumers”: tables A, B, C for Applications, Data Services, Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications

Some Patterns to Consider
Transaction Consistency
• DB to DB (no Kafka or Big Data)
• Mongo or Apache Hive for some basic ACID properties
• Kafka, using Exactly Once, Transactions and GoldenGate SCN & CSN metadata
Table / Topic Mappings
• One Kafka Topic per DB Table [default setting in GoldenGate]
• Handling Schema Change (AKA: Data Drift)
• One Kafka Topic for all Tables
• Group source data records into different Kafka Partitions
The Change Stream
• Full supplemental GG replication
• Partial supplemental, using DB (Standby) to re-create full records inside Kafka
• Partial supplemental, use Kafka + Cache for Full Records
Deployment Topology
• Mid-tier deployment of GoldenGate Big Data
• Combined deployment for Big Data and Database Targets (from single host)
• Layered Topic Types for Raw Data, Full Data, and Canonical Data

Strongest Transaction Consistency -> Use a DBMS
[Diagram labels] OLTP: tables A, B, C; ODS: tables A, B, C; ETL; OLAP: fact, XY, AC, AB, BC

Transaction Consistency -> With Hive or Mongo
[Diagram labels] OLTP: tables A, B, C; targets with tables A, B, C
Apache Hive (with ACID Merge): https://community.cloudera.com/t5/Community-Articles/Hive-ACID-Merge-by-Example/ta-p/245402
MongoDB (with ACID Tx): https://www.mongodb.com/blog/post/mongodb-multi-document-acid-transactions-general-availability

Transactions -> Use GG to Decorate Kafka Msgs
SCN (System Change Number) is the Oracle DB clock: every time a transaction commits, the clock increments. The SCN marks a consistent point in time in the database.
CSN (Commit Sequence Number) is the GoldenGate clock: GG uses the CSN during apply to identify the point in time at which the transaction is committed, for maintaining transaction consistency and data integrity. A CSN is available for all Source DB transactions captured via GoldenGate:
https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/commit-sequence-number.html
[Diagram labels] OLTP tables A and B; Kafka Single Partition
A: { "customer_id": "1", "first_name": "Debra", "last_name": "Burks", "phone": "", "email": "debra.burks@yahoo.com", "SCN": "130", "CSN": "130" }
B: { "customer_id": "1", "street": "9273 Thome Ave.", "city": "Orchard Park", "state": "NY", "zip_code": "14127", "SCN": "130", "CSN": "130" }
Data Consumer is responsible for maintaining transaction boundaries
Updates and Deletes both show up in Kafka as new messages; Consumers must interpret the flags correctly
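Because the consumer is responsible for transaction boundaries, one way to rebuild them is to group messages that carry the same CSN. This is a minimal sketch under stated assumptions: the `table` field and the second customer are hypothetical, and the messages of one transaction are assumed to be contiguous and in commit order, which is only plausible when they all land in a single partition.

```python
from itertools import groupby
from typing import Dict, Iterable, List

# Messages as they might arrive on a single partition (CSN field as on the slide;
# the "table" field and the second customer are made up for illustration).
messages = [
    {"table": "A", "customer_id": "1", "first_name": "Debra", "CSN": "130"},
    {"table": "B", "customer_id": "1", "city": "Orchard Park", "CSN": "130"},
    {"table": "A", "customer_id": "2", "first_name": "Kim", "CSN": "131"},
]


def transactions(stream: Iterable[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Group consecutive messages that share a CSN into one transaction."""
    return [list(group) for _, group in groupby(stream, key=lambda m: m["CSN"])]


txns = transactions(messages)
```

A live consumer cannot tell that a transaction is complete until it sees a message with a later CSN, so the final group should be treated as still open.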

Typical Data Mapping Pattern (Default)
[Diagram labels] OLTP tables A, B, C; Kafka topics A, B, C; Partition 1, Partition 2, Partition 3
auto.create.topics.enable property to true. This is the default setting.
https://docs.oracle.com/en/middleware/goldengate/big-data/19.1/gadbd/using-kafka-handler.html#GUID-FAD2E590-361E-46CC-B7F4-3BB97E19680E

Handle Schema Change (AKA: Data Drift)
[Diagram labels] OLTP table A; DDL: Alter Table {add column}; new table version A¹; DDL Event on A.DdlEvents; Partition 1, Partition 2; application/vnd.schemaregistry.v1+json; Any Event Consumer
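A consumer that reads both data events and DDL events has to keep its view of the table schema current. The sketch below shows that idea only; the event shapes (`type`, `op`, `column`, `row`) are hypothetical and do not reflect GoldenGate's actual message format.

```python
from typing import Dict, List

# Hypothetical event shapes: "ddl" events announce a column change, "dml" events carry rows.
events = [
    {"type": "dml", "table": "A", "row": {"id": 1, "name": "Debra"}},
    {"type": "ddl", "table": "A", "op": "add_column", "column": "email"},
    {"type": "dml", "table": "A", "row": {"id": 2, "name": "Kim", "email": "kim@example.com"}},
]


def consume(stream: List[dict]) -> Dict[str, object]:
    """Track the current column list per table and shape each row to it."""
    schemas: Dict[str, List[str]] = {"A": ["id", "name"]}
    rows: List[dict] = []
    for event in stream:
        columns = schemas[event["table"]]
        if event["type"] == "ddl" and event["op"] == "add_column":
            if event["column"] not in columns:
                columns.append(event["column"])
        elif event["type"] == "dml":
            # Rows seen before the DDL keep the shape they arrived with.
            rows.append({c: event["row"].get(c) for c in columns})
    return {"schemas": schemas, "rows": rows}


result = consume(events)
```

Processing DDL and data events in stream order is what keeps each row matched to the schema version it was written under.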

Map All Tables to Single Topic
[Diagram labels] OLTP tables A, B, C ... N; one Kafka topic; Partition 1, Partition 2, Partition 3

Natural Keys to Kafka Partitions
[Diagram labels] tables A, B, C; key values X, Z; Partition 1, Partition 2
E.g. partition by Vendor or Client Codes
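The mapping choices on the last few slides (one topic per table, one topic for all tables, partitioning by a natural key) can be sketched as two small functions. `topic_for`, `partition_for` and the sample records are hypothetical, and `crc32` is only a stand-in: Kafka's default partitioner for keyed records uses a murmur2 hash.

```python
import zlib


def topic_for(table: str, single_topic: str = "") -> str:
    """Default: one topic per table. Pass single_topic to map all tables to one topic."""
    return single_topic or table


def partition_for(natural_key: str, num_partitions: int) -> int:
    """Pick a partition from a natural key such as a vendor or client code."""
    return zlib.crc32(natural_key.encode("utf-8")) % num_partitions


records = [
    {"table": "crm.customer", "vendor_code": "X"},
    {"table": "crm.orders", "vendor_code": "X"},
    {"table": "crm.orders", "vendor_code": "Z"},
]
placed = [(topic_for(r["table"]), partition_for(r["vendor_code"], 2)) for r in records]
```

Records that share a natural key always land in the same partition number, which preserves their relative order for a consumer of that partition.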

Full Supplemental Logging (Preferred)
Use LOGALLSUPCOLS to get the full records, all supplemental columns
EXTRACT crm
USERIDALIAS ogg
LOGALLSUPCOLS
UPDATERECORDFORMAT COMPACT
EXTTRAIL /gghome/ogg/dirdat/hr
SOURCECATALOG orcl
TABLE crm.customer;
Kafka: Full Records
A: { "customer_id": "1", "first_name": "Debra", "last_name": "Burks", "phone": "", "email": "debra.burks@yahoo.com", "street": "9273 Thome Ave.", "city": "Orchard Park", "state": "NY", "zip_code": "14127" }
Periodically run Topic compaction
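Topic compaction keeps the most recent record per key, so a topic of full records converges on the current state of each row. The sketch below shows that semantic only; real Kafka compaction runs in the background on the broker and keeps delete markers for a retention period. The sample data is made up.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[dict]]  # (key, value); a None value marks a delete


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest value per key, ordered by each key's last write."""
    latest: Dict[str, Optional[dict]] = {}
    for key, value in log:
        latest.pop(key, None)   # re-insert so the key moves to its newest position
        latest[key] = value
    # Drop keys whose latest value is a delete marker.
    return [(k, v) for k, v in latest.items() if v is not None]


log = [
    ("1", {"first_name": "Debra", "city": "Orchard Park"}),
    ("2", {"first_name": "Kim", "city": "Buffalo"}),
    ("1", {"first_name": "Debra", "city": "Albany"}),
    ("2", None),
]
compacted = compact(log)
```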

Partial Supplemental Logging; Join Back to DB
[Diagram labels]
Application Domain (SoR): Primary; DB / Block Replication; Read Standby
Kafka (Raw), topic Raw-A: changed columns only (Change data only)
Kafka (Full), topic Full-A: all columns included (Full records in sync with SoR)
Join Key to full records from Stream Processor
Periodically run Topic compaction

Partial Supplemental Logging; Self Join to Kafka
[Diagram labels]
Application Domain (SoR): Primary
Kafka (Raw), topic Raw-A: changed columns only (Change data only)
Kafka (Full), topic Full-A: Full records in sync with SoR
Join to previous full records using Stream Processor: previous full record, new full record
Cache: Key:Offset
Periodically run Topic compaction
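The self-join pattern merges each change record (changed columns only) over the previous full record for the same key. A minimal sketch follows, with one simplification stated up front: the slide's cache maps a key to an offset in the full topic, whereas the hypothetical `cache` here holds the full record itself.

```python
from typing import Dict, List

cache: Dict[str, dict] = {}   # key -> last known full record (stand-in for the cache)


def to_full_record(key: str, changed_columns: dict) -> dict:
    """Merge changed columns over the previous full record and cache the result."""
    full = {**cache.get(key, {}), **changed_columns}
    cache[key] = full
    return full


raw_topic = [
    # Insert: all columns present.
    ("1", {"customer_id": "1", "first_name": "Debra", "city": "Orchard Park"}),
    # Update: changed column only.
    ("1", {"city": "Albany"}),
]
full_topic: List[dict] = [to_full_record(key, cols) for key, cols in raw_topic]
```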

Example Mid-Tier Deployment Topology
[Diagram labels]
Source: DB2/z; OGG Extract; Database Client Libraries; Secure SQL*Net Connections
GG Mid-Tier 001, GG Mid-Tier 002, GG Mid-Tier xxx; K8S/Docker Container (optional); SAN/RAID Storage
Per mid-tier: Administration Service, Metrics Service, Service Manager, Reverse Proxy & Certs, GG4BD Replicat, GG Trail Files
Kafka Host: Kafka Segments; Topic : Table
OGG Replicat <push – data is staged when events arrive>
<data transformation – e.g. Stored Procs>
Targets: Data Lake, Data Warehouse
HTTPS / TLS 1.2
Roles: Data Services (for Apps); DBAs (aligned to business unit); GG Ops (aligned to shared services)

Example Topology with Stream Processing
[Diagram labels]
Source: OGG Extract; Database Client Libraries; Secure SQL*Net Connections
GG Mid-Tier 001, GG Mid-Tier 002, GG Mid-Tier xxx; K8S/Docker Container (optional); SAN/RAID Storage
Per mid-tier: Administration Service, Metrics Service, Service Manager, Reverse Proxy & Certs, GG4BD Replicat, GG Trail Files
Kafka Raw Topics: Topic : Table; Kafka Segments
OSA Mid-Tier: OSA Web Application, OSA Spark Application, Spark ETL Nodes, Cache Store
Kafka Prepared Topics: Topic : Table; Kafka Segments
Data Pipes for Real-time ETL; Direct Load to Databases
Targets: Data Lake, Database
HTTPS / TLS 1.2
Roles: Data Services (for Apps); DBAs (aligned to business unit); GG Ops (aligned to shared services); Data Engineer (DW / Data Lake organization)

Pattern: Canonical Objects as Real-Time Events
Tradeoff between “Data Fidelity” vs. “Data Latency”
ETL is bounded by Time Window; lookups can happen from memory, cache or via SQL
[Diagram labels]
Enterprise Data Producers: User Action, App, DB, committed!, Object Model, Data Model, System Of Record (SoR)
CDC: Detect Event, Logical Change Records (LCRs)
Data tiers: Raw Data, Prepared Data, Canonical Data
Topics: Raw Data (LCR), Schema Events (DDL), Prepared Data Topics, “Master” Data Topics
Formats: JSON, XML, Avro, Parquet, CSV
Flows: Raw data & Alerting events are pushed; Canonical data events; Direct to Database (relational semantics fully preserved)
Delivered as: Data Objects, Table Data, Raw Data / Alerts, SQL Consumers
Data Consumers: Applications, Data Services, ODS (Data Store), Data Marts & Warehouses, IoT Apps, Data Science

Data In Motion | Stream Processing
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Oracle OpenWorld 2018
Ingest Database Events: Any GoldenGate event is included free; Kafka native events require a full-use license
Select Processing Patterns: Rich set of pre-built patterns can dramatically improve developer efficiency and time-to-value
Build Event Pipelines: Tool can easily leverage geo-fencing, machine-learning, and other lookup data within the data stream
Serve Data Downstream: Data can be delivered out to Kafka, databases, or easily staged for downstream ETL jobs
[Diagram labels] connect; Data Owners & Data Products

Significant Intellectual Property
More than 70 patents on stream processing
Mature tech stack for Event Processing
Over 10 years of IP investment
Releases: 11g, 12.2, 12.3, 18c, 19c

Interactive Browser-based Designer
Accessible to Non-Technical Users
• Empower data analysts to enhance data with no coding skills required
• Intuitive, always-on data view shows results of transformations as they are defined
• Filter and correlate streams, apply rules, aggregate, calculate fields etc.
Function extensibility via Java
• Allow data engineers to provide custom stages and functions to be used by all team members
Integrated Visualizations
• Explore your business data live through various tables, charts and geospatial maps

Big Upgrade from Apache Spark
Feature | Oracle Stream Analytics | Apache Spark (only)
Programming | Graphical UX, with ability to cut-paste Scala directly into Pipelines | Java/Scala low level programming only
Checkpointing | Automatic, part of OSA Pipeline implementation | Developer must be aware of the semantics and logic
Record-by-Record | Automatic Timestamps from OSA CQL Engine | Spark Streaming treats all records within a batch the same
Out of Order Events | Automatic, via CQL Timestamps and also via GG SCN | Not possible to reliably handle
Progression of Time | Automatic, CQL engine progresses time (e.g. A not followed by B) | When no new Events, there is no native progression of time
Windowing Functions | Will handle windowing based on number of Events, Dynamic attributes, other intervals | Micro batch only
Fault Tolerance | Automatic, part of OSA application native behavior | Developer must code
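To make the "Out of Order Events" row concrete, here is a generic sketch of reordering by a sequence number such as an SCN or timestamp, using a bounded-lateness buffer. It illustrates the general technique only and is not a description of how OSA or Spark implement it; `reorder`, `max_delay` and the sample events are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str]  # (sequence number such as an SCN or timestamp, payload)


def reorder(events: Iterable[Event], max_delay: int) -> Iterator[Event]:
    """Emit events in sequence order, tolerating lateness up to max_delay.

    An event is released once the highest sequence number seen so far is at
    least max_delay ahead of it. Events arriving later than that are still
    emitted, but may then be out of order.
    """
    buffer: List[Event] = []
    highest_seen = None
    for event in events:
        heapq.heappush(buffer, event)
        highest_seen = event[0] if highest_seen is None else max(highest_seen, event[0])
        while buffer and buffer[0][0] <= highest_seen - max_delay:
            yield heapq.heappop(buffer)
    while buffer:  # end of input: flush what is left
        yield heapq.heappop(buffer)


arrived = [(130, "a"), (132, "c"), (131, "b"), (135, "e"), (133, "d")]
ordered = list(reorder(arrived, max_delay=3))
```

The `max_delay` value is the tradeoff knob: a larger value tolerates later arrivals but holds events back longer before they are emitted.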

Rich Set of Streaming Patterns
Simplify Access to Complex Algorithms
• Easy-to-use modules with user assistance in the designer
• Pre-defined visualizations to provide immediate feedback
• Accessible to data analysts
Comprehensive Library of Patterns
• Covers diverse areas such as anomaly detection, stream correlation, trend analysis, spatial functions
• Duplicate, out-of-order, and missing event detection
• Functions for financial, statistical, and log analytic operations
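Duplicate, out-of-order and missing event detection can be sketched in one pass over sequence numbers. This is a generic illustration, not the OSA pattern implementation; it assumes every event carries a gap-free, increasing sequence number at the source, and `classify` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def classify(sequence_numbers: Iterable[int]) -> Dict[str, List[int]]:
    """Flag duplicate, out-of-order and missing sequence numbers."""
    seen = set()
    highest = None
    found: Dict[str, List[int]] = {"duplicate": [], "out_of_order": [], "missing": []}
    for n in sequence_numbers:
        if n in seen:
            found["duplicate"].append(n)
            continue
        if highest is not None and n < highest:
            found["out_of_order"].append(n)
        seen.add(n)
        highest = n if highest is None else max(highest, n)
    if seen:
        # Anything absent between the lowest and highest number seen is missing.
        found["missing"] = [n for n in range(min(seen), max(seen) + 1) if n not in seen]
    return found


report = classify([1, 2, 2, 5, 4, 7])
```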

Location and Geo-Spatial Capabilities
Interactive Spatial Design and Visualization
• Show live location data on maps as events are processed
• Track individual objects and highlight them based on different conditions, e.g. Red for violation
Rich Geospatial Pattern Set
• Correlate multiple objects through their spatial interaction
• Detect speed and proximity
• Obtain address and city information from location and vice versa through Geocoding
Scalable Definition of Areas and Geo-Fences
• Define polygons through drawing borders on a map
• Manage large amounts of shapes through spatial types in Oracle database.
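A geo-fence check against a drawn polygon reduces to a point-in-polygon test. The sketch below uses the standard ray-casting method on planar coordinates, which is only reasonable for small areas; it is not the product's spatial engine, and the fence and track data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) for small areas


def inside(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle on each polygon edge a horizontal ray crosses."""
    x, y = point
    is_inside = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y level
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_at_y:
                is_inside = not is_inside
    return is_inside


fence = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # hypothetical rectangular geo-fence
track = [(1.0, 1.0), (3.9, 2.9), (5.0, 1.0)]               # positions of one tracked object
violations = [p for p in track if not inside(p, fence)]
```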

Time Series Analytics
Anatomy of a Time Series Pattern
Built in Patterns for Anomalies
• Pre-defined visualizations to provide immediate feedback
• Accessible to data analysts
GoldenGate Supplies High Fidelity Events
• Every database commit, logical change record, schema and procedure event is visible in the event stream
• Combine with application logs for full picture
Examples
• Banking, credit card transactions, trades…
• Sales and Marketing Data (eCommerce)
• IoT, Telemetry, Devices, Smart Home
• Monitoring data, data centers, networks etc.
• Science/medicine, EEG, ECG, DNA
• Social networks, likes, classification, trends
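One simple way to flag anomalies in a time series is to compare each value with the mean and standard deviation of a trailing window. This is a generic sketch, not one of the built-in patterns; the window size, threshold and readings are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def anomalies(values: Iterable[float], window: int = 5, threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Return (index, value) pairs that deviate strongly from the trailing window."""
    recent: deque = deque(maxlen=window)
    flagged: List[Tuple[int, float]] = []
    for i, value in enumerate(values):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append((i, value))
        recent.append(value)
    return flagged


readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8]
flagged = anomalies(readings)
```

In this sketch a flagged value still enters the window, so it inflates the deviation for the next few readings; excluding flagged values from the window is a common refinement.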

Predictive Analysis and Machine Learning
Real-time Scoring and Decision Making
• Use Machine Learning models to make business decisions in real-time
• Predict future outcomes such as equipment failures, customer behavior, fraud and security breaches
• Re-import refined models for improved predictions
Put Data Science in Production
• Import Predictive Models created by data scientists and engineers in their own environment.
• Import of PMML models for a variety of algorithms such as support vector machines, association rules, Naive Bayes classifier, clustering models, text models, decision trees, and different regression models.
• Hide model complexity for use by data analysts
• Custom stages for access to external scoring systems
[Diagram labels] Data Scientist: Oracle R Enterprise, Notebooks (Jupyter, Zeppelin, etc); Data Analyst / Data Engineer

Built-in Dashboards
Visualizations Built-In
• Not intended to be a replacement for purpose-built Data Visualization tools
• OSA includes some visualizations to support building graphs on data that is streaming in-memory (before writing to Data Store), including:
• Bar Charts
• Line Charts
• Geo-Spatial (Google Maps)
• Area Charts
• Pie Charts
• Scatter Charts
• Bubble Charts
• Thematic Maps

What have we learned?
Global Shift in Technology (Industry 4.0)
Data Integration Shift to Data Mesh
GoldenGate for Big Data
Event-Driven, Distributed Data Integration
*Industrial Strength & Enterprise Class

This is not a Metamorphosis, it is a Paradigm Shift
The Success Paradox
Data success factors that did well in Industry 3.0 will not be the factors that create success in Industry 4.0
Next Gen Data Architecture
ETL Vendors
1990 – 2010’s Gen1:
• Replication
• Messaging
• Streaming
• Pipelines
Next-Gen has new DNA not tied to ETL tools
It is impossible to evolve older Batch Processing tools into a modern Event-Centric Stream Processing solution; the underlying paradigms are fundamentally different

